fast-csv for CSV files

Chris Muir

I recently needed to pre-process a CSV file with NodeJS+Typescript before ingesting it into a system.

The CSV file in question presents a number of challenges:

  1. The CSV file is large, at ~125k rows
  2. It includes a header row, but individual headers need to be renamed
  3. There are redundant columns to remove
  4. There may also be additional columns we don't know about that need to be dropped
  5. The columns need reordering
  6. Blank lines must be skipped

Via a quick Google I found fast-csv.

An initial and superficial look at fast-csv highlights a few qualities that make it attractive enough to explore further:

  • It is still actively being developed (at the time of this post), giving some assurance around bug fixes
  • Uses the friendly MIT open source license
  • Has no runtime dependencies, minimizing any downstream license issues

In looking at the feature set, fast-csv is comprised of 'parse' and 'format' routines for ingesting and transforming CSV files. It also supports streams for fast processing of large files. The following describes how I made use of fast-csv features to meet the above requirements.
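
Before diving into the full solution, here's what the parse side looks like on its own, a minimal sketch following the typical fast-csv usage pattern (the file name is assumed for illustration):

    import * as fs from 'fs';
    import * as csv from 'fast-csv';

    fs.createReadStream('input.csv')
      .pipe(csv.parse({ headers: true }))        // treat the first row as headers
      .on('error', (error) => console.error(error))
      .on('data', (row) => console.log(row))     // each row arrives as an object
      .on('end', (rowCount: number) => console.log(`Parsed ${rowCount} rows`));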

To start with, here's the initial CSV file we will ingest:

    beta,alpha,redundant,charlie,delta

    betaRow1,alphaRow1,redundantRow1,charlieRow1,deltaRow1
    betaRow2,alphaRow2,redundantRow2,charlieRow2,deltaRow2
    betaRow3,alphaRow3,redundantRow3,charlieRow3,deltaRow3


Our goal is to rename and reorder the columns, drop the blank line, drop the 'redundant' column, and our program should also be able to drop the 'delta' column, which it won't know about at all. The final output should look like:

    NewAlpha,NewBeta,NewCharlie
    alphaRow1,betaRow1,charlieRow1
    alphaRow2,betaRow2,charlieRow2
    alphaRow3,betaRow3,charlieRow3


The following code shows the solution:

    import * as fs from 'fs';
    import * as csv from 'fast-csv';

    const inputFile = __dirname + '/../sample-data/input.csv';
    const outputFile = __dirname + '/../sample-data/output.csv';

    (async function () {
      const writeStream = fs.createWriteStream(outputFile);

      const parse = csv.parse({
        ignoreEmpty: true,
        discardUnmappedColumns: true,
        headers: ['beta', 'alpha', 'redundant', 'charlie'],
      });

      const transform = csv
        .format({ headers: true })
        .transform((row) => ({
          NewAlpha: row.alpha, // reordered
          NewBeta: row.beta,
          NewCharlie: row.charlie,
          // redundant is dropped
          // delta is not loaded by parse() above
        }));

      const stream = fs
        .createReadStream(inputFile)
        .pipe(parse)
        .pipe(transform)
        .pipe(writeStream);
    })();


In explaining the solution:

parse() options

  • ignoreEmpty takes care of skipping the blank line(s)
  • discardUnmappedColumns will drop any columns we don't specify in the following headers option, taking care of dropping the 'delta' column
  • headers maps the columns we are loading. Note how I've used discardUnmappedColumns to drop 'delta' but I'm still loading 'redundant'. The 'redundant' column is dropped in the format() options described next. (These parse options are shown in isolation in the sketch after this list.)
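
To see ignoreEmpty and discardUnmappedColumns in isolation, here's a small sketch using fast-csv's parseString helper (the inline sample data is invented for illustration):

    import * as csv from 'fast-csv';

    // Two data rows separated by a blank line, with a 5th column
    // that isn't listed in headers below.
    const data = 'b1,a1,r1,c1,d1\n\nb2,a2,r2,c2,d2';

    csv
      .parseString(data, {
        ignoreEmpty: true,            // the blank middle line is skipped
        discardUnmappedColumns: true, // the unmapped 5th column is dropped
        headers: ['beta', 'alpha', 'redundant', 'charlie'],
      })
      .on('data', (row) => console.log(row));
    // => { beta: 'b1', alpha: 'a1', redundant: 'r1', charlie: 'c1' }
    // => { beta: 'b2', alpha: 'a2', redundant: 'r2', charlie: 'c2' }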

format() options

  • headers directs the output to include the header row
  • The transform() row post-processor allows us to reorder the columns, rename the columns, and also drop the 'redundant' column (see the sketch after this list)
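
The format side can also be exercised in isolation by writing rows into a format stream by hand, a minimal sketch with invented sample rows:

    import * as csv from 'fast-csv';

    const stream = csv
      .format({ headers: true })   // emit a header row from the object keys
      .transform((row: any) => ({
        NewAlpha: row.alpha,
        NewBeta: row.beta,
        NewCharlie: row.charlie,   // 'redundant' simply isn't copied across
      }));

    stream.pipe(process.stdout);
    stream.write({ alpha: 'a1', beta: 'b1', charlie: 'c1', redundant: 'r1' });
    stream.write({ alpha: 'a2', beta: 'b2', charlie: 'c2', redundant: 'r2' });
    stream.end();
    // NewAlpha,NewBeta,NewCharlie
    // a1,b1,c1
    // a2,b2,c2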

With a larger CSV file in hand, testing shows the above routine can process ~125k rows with 126 columns, from a file of approximately 135MB in size, in ~19 seconds on my 3.2GHz i7 MBP.
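
Your numbers will vary, but if you want to take a similar measurement yourself, one simple approach (a sketch, not part of the original solution) is to time the pipeline until the write stream finishes:

    import * as fs from 'fs';
    import * as csv from 'fast-csv';

    console.time('csv-pipeline');

    fs.createReadStream('input.csv')
      .pipe(csv.parse({ headers: true, ignoreEmpty: true }))
      .pipe(csv.format({ headers: true }))
      .pipe(fs.createWriteStream('output.csv'))
      .on('finish', () => console.timeEnd('csv-pipeline')); // fires once all output is flushed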

fast-csv indeed.


Source: https://dev.to/chriscmuir/fast-csv-for-csv-files-21a1
