csv-reader

A CSV stream reader, with many features and the ability to work with very large datasets.

Included features (each can be turned on and off):

  • Support for Excel-style multiline cells wrapped in quotes
  • Choosing a different delimiter instead of the comma
  • Automatic skipping of empty lines
  • Automatic skipping of the first header row
  • Automatic parsing of numbers and booleans
  • Automatic trimming
  • Being a stream transformer, you can .pause() if you need some time to process a row and .resume() when you are ready to receive and process more rows.
  • Consumes and emits rows one by one, allowing you to process datasets of any size imaginable.
  • Automatically strips the BOM if it exists (not handled automatically by Node.js stream readers)

Installation:

npm install --save csv-reader

The options you can pass are:

delimiter (String, default ","): The character that separates cells
multiline (Boolean, default true): Allow multiline cells, where the cell is wrapped in quotes ("...\n...")
allowQuotes (Boolean, default true): Whether quotes should be treated as a special character that wraps cells etc.
skipEmptyLines (Boolean, default false): Whether empty lines should be automatically skipped
skipHeader (Boolean, default false): Whether the first header row should be skipped (deprecated, please use skipLines)
skipLines (Number, default 0): Number of lines to skip (if skipHeader is true, this gets +1; counted after the header line if headerLine is set)
headerLine (Number, default 0): Line number of the header (skipLines counts lines skipped after the header line)
asObject (Boolean, default false): If true, each row is automatically converted to an object based on the header. This adds 1 to skipLines.
parseNumbers (Boolean, default false): Whether numbers should be automatically parsed. This parses any format supported by parseFloat, including scientific notation, Infinity and NaN.
parseBooleans (Boolean, default false): Automatically parse booleans (strictly lowercase true and false)
ltrim (Boolean, default false): Automatically left-trim columns
rtrim (Boolean, default false): Automatically right-trim columns
trim (Boolean, default false): If true, both ltrim and rtrim are set to true

Events:

A 'data' event is emitted for each row, either as an array ((string|number|boolean)[]) or as an object (Object<string, (string|number|boolean)>), depending on the asObject option.
A preliminary 'header' event is emitted with the first row, always as an array and without any conversion to other types (string[]).
The usual stream events, end and error, are emitted as well.

Usage example:

const Fs = require('fs');
const CsvReadableStream = require('csv-reader');

let inputStream = Fs.createReadStream('my_data.csv', 'utf8');

inputStream
	.pipe(new CsvReadableStream({ parseNumbers: true, parseBooleans: true, trim: true }))
	.on('data', function (row) {
	    console.log('A row arrived: ', row);
	})
	.on('end', function () {
	    console.log('No more rows!');
	});

A common issue with CSV files is that Microsoft Excel, for some reason, does not save UTF8 files. Microsoft never liked standards. In order to automagically handle the possibility of such files with ANSI encodings arriving from user input, you can use the autodetect-decoder-stream like this:

const Fs = require('fs');
const CsvReadableStream = require('csv-reader');
const AutoDetectDecoderStream = require('autodetect-decoder-stream');

let inputStream = Fs.createReadStream('my_data.csv')
	.pipe(new AutoDetectDecoderStream({ defaultEncoding: '1255' })); // If failed to guess encoding, default to 1255

// The AutoDetectDecoderStream will know if the stream is UTF8, windows-1255, windows-1252 etc.
// It will pass properly decoded data to the CsvReader.
 
inputStream
	.pipe(new CsvReadableStream({ parseNumbers: true, parseBooleans: true, trim: true }))
	.on('data', function (row) {
	    console.log('A row arrived: ', row);
	}).on('end', function () {
	    console.log('No more rows!');
	});
	

Contributing

If you have anything to contribute, or functionality that you lack - you are more than welcome to participate in this! If anyone wishes to contribute unit tests - that also would be great :-)

Me

  • Hi! I am Daniel Cohen Gindi. Or in short- Daniel.
  • [email protected] is my email address.
  • That's all you need to know.

Help

If you want to buy me a beer, you are very welcome to donate. Thanks :-)

License

All the code here is under MIT license. Which means you could do virtually anything with the code. I will appreciate it very much if you keep an attribution where appropriate.

The MIT License (MIT)

Copyright (c) 2013 Daniel Cohen Gindi ([email protected])

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

Contributors

asterion, camilomanrique, cinderblock, danielgindi, dependabot[bot], lordgameleo, tgbv, tlhunter

node-csv-reader's Issues

Error in the parser. Issue handling "value \" with quote"

file.csv

test,ok
123,"T-250 148\" Med Rf 9000 GVWR Sliding RH Dr"

If you read the file

const Fs = require('fs');
const CsvReadableStream = require('csv-reader');

let inputStream = Fs.createReadStream('./file.csv', 'utf8');

inputStream
  .pipe(new CsvReadableStream({ parseNumbers: true, parseBooleans: true, asObject: true }))
  .on('data', function (row) {
    console.log('A row arrived: ', row);
  })
  .on('end', function () {
    console.log('No more rows!');
  });
The output is:

{ test: 123, ok: 'T-250 148\\ Med Rf 9000 GVWR Sliding RH Dr"' }

The current returned value of the ok field is (double backslash and a quote at the end):

T-250 148\\ Med Rf 9000 GVWR Sliding RH Dr"

But it should be:

T-250 148\" Med Rf 9000 GVWR Sliding RH Dr

Using PHP I tried to write and read that specific value using internal csv functions:

$file = fopen(__DIR__ . '/file.csv', 'w');

$row = array('test' => '123', 'ok' => 'T-250 148\" Med Rf 9000 GVWR Sliding RH Dr');

fputcsv($file, array_keys($row));
fputcsv($file, $row);

fclose($file);

$file = fopen(__DIR__ . '/file.csv', 'r');
print_r(fgetcsv($file));
print_r(fgetcsv($file));
fclose($file);

The output file is
file.csv

test,ok
123,"T-250 148\" Med Rf 9000 GVWR Sliding RH Dr"

And the value I retrieve is:

T-250 148\" Med Rf 9000 GVWR Sliding RH Dr

Possible solution

As far as I know, there are different ways to encode quotes in a CSV value.
One is using an escape character \

Then encoding

T-250 148\" Med Rf 9000 GVWR Sliding RH Dr

The resulting CSV value then looks like this:

"T-250 148\" Med Rf 9000 GVWR Sliding RH Dr"

If we don't use an escape character, quotes are doubled by default:

"T-250 148\"" Med Rf 9000 GVWR Sliding RH Dr"

Maybe you can add an option escape: '\'

  .pipe(new CsvReadableStream({ parseNumbers: true, parseBooleans: true, escape: '\\', asObject: true }))

Error event is never emitted

If there is an error while processing the CSV, the error event is never emitted. Take this for example:

let inputStream = Fs.createReadStream(Path.join(__dirname, 'test-auto-parse.csv'), 'utf8');
let output = [];

inputStream
	.pipe(new CsvReadableStream({
		parseNumbers: true,
		parseBooleans: true,
		trim: true,
		asObject: false,
		skipHeader: true,
	}))
	.on('data', row => {
		// Imagine an error is thrown here for any reason
	})
	.on('end', () => {
		resolve(output);
	})
	.on('error', err => {
		reject(err); // This is never executed
	});

Instead the error is uncaught.

Typo in npm readme

Hi! I noticed you have typo in example in your npm readme.
https://www.npmjs.com/package/csv-reader#usage-example

var fs = require('fs');
var CsvReadableStream = require('csv-reader');
var AutoDetectDecoderStream = require('autodetect-decoder-stream');
 
var inputStream = fs.createReadStream('my_data.csv')
    .pipe(new AutoDetectDecoderStream({ defaultEncoding: '1255' })); // If failed to guess encoding, default to 1255 
 
// The AutoDetectDecoderStream will know if the stream is UTF8, windows-1255, windows-1252 etc. 
// It will pass a properly decoded data to the CsvReader. 
 
inputStream
    .pipe(CsvReader({ parseNumbers: true, parseBooleans: true, trim: true }))
    .on('data', function (row) {
        console.log('A row arrived: ', row);
    }).on('end', function (data) {
        console.log('No more rows!');
    });

Guess it should look like this (CsvReader ->CsvReadableStream, like in your github readme):

var fs = require('fs');
var CsvReadableStream = require('csv-reader');
var AutoDetectDecoderStream = require('autodetect-decoder-stream');
 
var inputStream = fs.createReadStream('my_data.csv')
    .pipe(new AutoDetectDecoderStream({ defaultEncoding: '1255' })); // If failed to guess encoding, default to 1255 
 
// The AutoDetectDecoderStream will know if the stream is UTF8, windows-1255, windows-1252 etc. 
// It will pass a properly decoded data to the CsvReader. 
 
inputStream
    .pipe(CsvReadableStream({ parseNumbers: true, parseBooleans: true, trim: true }))
    .on('data', function (row) {
        console.log('A row arrived: ', row);
    }).on('end', function (data) {
        console.log('No more rows!');
    });

[question] Does async reading work? For await in a sub-function.

Thanks for this great work.
Stream reading is pretty good!

Does async reading work?

If I must do this:

import fs from 'fs';
import CsvReadableStream from 'csv-reader'

let inputStream = fs.createReadStream('my_data.csv', 'utf8');

inputStream
	.pipe(new CsvReadableStream({ parseNumbers: true, parseBooleans: true, trim: true }))
	.on('data', async (row) => {
	    console.log('A row arrived: ', row);

	    await makeMyDreamReality(row) // does await work ?
	})
	.on('end', function () {
	    console.log('No more rows!');
	});

Feature Request: Typings

With the popularity of TypeScript and VS Code, have you considered adding Type information to the published version of this package?

TypeError: Cannot read properties of undefined (reading 'charCodeAt') - when reading empty file

const readStream = fs.createReadStream(inputCsv, 'utf-8')
await readStream
  .pipe(
    new CsvReadableStream({parseNumbers: false, trim: true, skipHeader: false, skipEmptyLines: true})
  )
  .on('data', () => {
    logger.info('new data')
  })
  .on('end', () => {
    logger.info('end')
  });

When supplied with an empty input file, the following error occurs:

~/project/node_modules/csv-reader/index.js:111
            if (newData.charCodeAt(0) === 0xfeff) {
                        ^
TypeError: Cannot read properties of undefined (reading 'charCodeAt')
    at CsvReadableStream._processChunk (~/project/node_modules/csv-reader/index.js:111:25)
    at CsvReadableStream._flush (~/project/node_modules/csv-reader/index.js:287:14)

Feature request - skip header line

If the data file has a header in the first row then have an option to skip it

For example:
NAME, AGE
John Smith, 45

Row 1 should not be processed if skipHeader: true

Negative numbers are not parsed

Negative numbers are not parsed even when the parseNumbers option is true. There is something wrong with the regex for numbers.

Export other types

It would be useful if other types from package are exported as well, eg: Line, DataTypes, etc.

Do you consider such idea?

Possible to read from CSV in a buffer?

Hi! I wanted to know if it's possible to use node-csv-reader from a variable. I have a framework that processes xlsx files, and I wanted to add CSV support, in such a way that the CSV is turned into a JSON file. Is it possible?

TypeError: newData.charCodeAt is not a function

I was using below snippet.

const UTIL = require('util');
if (process.argv.length !== 3) {
  UTIL.error("Invalid file. Please pass a valid file path");
  process.exit(1);
}

const CSVREADER = require('csv-reader');
const FS = require('fs');


const READ_STREAM = FS.createReadStream(process.argv[2]);

READ_STREAM
  .pipe(CSVREADER({parseNumbers: true, skipEmptyLines: true, skipHeader: true}))
  .on('data', function(row) {
    UTIL.log('Row arrived', row);
  })
  .on('end', function() {
    UTIL.log('File parsing is done');
  })
  .on('error', function(error) {
    UTIL.error('CSV read error:', error);
  });

While running this with node fixPCUEntries.js ./updatablePCU.csv getting below error

/home/ankurkumar/workspace/self-learning/csv-parsing/node_modules/csv-reader/index.js:105
            if (newData.charCodeAt(0) === 0xfeff) {
                        ^

TypeError: newData.charCodeAt is not a function
    at CsvReadableStream._processChunk (/home/ankurkumar/workspace/self-learning/csv-parsing/node_modules/csv-reader/index.js:105:25)
    at CsvReadableStream._transform (/home/ankurkumar/workspace/self-learning/csv-parsing/node_modules/csv-reader/index.js:249:10)
    at CsvReadableStream.Transform._read (_stream_transform.js:186:10)
    at CsvReadableStream.Transform._write (_stream_transform.js:174:12)
    at doWrite (_stream_writable.js:397:12)
    at writeOrBuffer (_stream_writable.js:383:5)
    at CsvReadableStream.Writable.write (_stream_writable.js:290:11)
    at ReadStream.ondata (_stream_readable.js:639:20)
    at emitOne (events.js:116:13)
    at ReadStream.emit (events.js:211:7)

Parsing bug

test.csv

"A", "1"
"2", "B"
"C", "D"

parse.js

let inputStream = fs.createReadStream(__dirname + '/test.csv', 'utf8');
inputStream
    .pipe(new CsvReadableStream())
    .on('data', function (row) {
        console.log("DATA", row);
    })
    .on('end', function () {
        console.log("END");
    })
    .on('error', function (e) {
        reject(e);
    });

Output

DATA [ 'A', ' "1"' ]
DATA [ '2', ' "B"' ]
DATA [ 'C', ' "D"' ]
END

Your parser doesn't like whitespace between the columns.

[Feature Request] Skip lines

Hey

Thanks for your awesome library!

I just need another option to skip n lines from above. Something like skipHeader but in number (not boolean)

If we want it to be a non-breaking change, then we can add something like skipLines.

Can not use fs anymore.

When I tried the example first

const Fs = require('fs');
const CsvReadableStream = require('csv-reader');
 
let inputStream = Fs.createReadStream('my_data.csv', 'utf8');
 
inputStream
    .pipe(new CsvReadableStream({ parseNumbers: true, parseBooleans: true, trim: true }))
    .on('data', function (row) {
        console.log('A row arrived: ', row);
    })
    .on('end', function (data) {
        console.log('No more rows!');
    });

An error occurred. I found the problem is with the fs package. The link shows that:

This package name is not currently in use, but was formerly occupied by another package. To avoid malicious use, npm is hanging on to the package name, but loosely, and we'll probably give it to you if you want it.

I can't use or test this package further because of this.

Can anyone update the documentation or tell me how to solve it?

Error in reusing CsvReadableStream

Hey!

Thanks for your awesome library!

I just faced a small bug:

  • Instantiate a CsvReadableStream:
const csvReader = new CsvReadableStream({
  skipEmptyLines: true,
  asObject: true,
  trim: true
});
  • Use csvReader in a function parseCSV()
  • Try to call the function again

Code

import CsvReadableStream from 'csv-reader';
import type { Readable } from 'stream';
import type { CSV, Customer } from './types';

const csvReader = new CsvReadableStream({
  skipEmptyLines: true,
  asObject: true,
  trim: true
});

export const parseCSV = (raw: Readable): Promise<Customer[]> => {
  const data: Customer[] = [];

  return new Promise((resolve, reject) => {
    raw
      .pipe(csvReader)
      .on('error', error => reject(`CSV: Error: "${error}"`))
      .on('data', (row: CSV) =>
        data.push({
          x: row.x
        })
      )
      .on('end', () => resolve(data));
  });
};

But if I move:

const csvReader = new CsvReadableStream({
  skipEmptyLines: true,
  asObject: true,
  trim: true
});

inside the parseCSV(), it will work!

Clarity on how quotes are handled when used as text qualifier

Hi - thank you for the library!

I want to use it to parse a CSV that has unescaped newline chars in it; fields that contain them are wrapped in quotes.

How are quotes treated if I specify that fields containing newlines are wrapped in quotes? For example, this line from a CSV:

field1,field2,"this is
a field with a newline char", "This is "another field"
with a newline char",field5

Will quotes be ignored if they are not placed next to a delimiter (I would like this to be the case)?
