shystruk / csv-file-validator Goto Github PK

View Code? Open in Web Editor NEW

91.0 5.0 42.0 1.06 MB

🔧🔦 Validation of CSV file against user defined schema (returns back object with data and invalid messages)

Home Page: https://www.npmjs.com/package/csv-file-validator

License: MIT License

JavaScript 76.14% HTML 3.23% TypeScript 20.63%

javascript csv-parser validator csv-files csv csv-reader

csv-file-validator's People

Contributors

Stargazers

Watchers

Forkers

driss7 0xflotus yurirusakovich ganeshpatro321 walterrojas monodigital phamthibaoyen3d1 hantulautt krukmats4g cjmaranas curiousinc gaxon-lab iguatemi365 kimbugp platinumitpl thangnm-ibl enase nikitaa42 dbdavidbacker soutikcbnits alvaromashiro dhirajjadhavrao gabbytam javierlinked rebirthlab igors-levkovs codeplay87 sajankjohn grupo-palermo sheldonfrith ankurstyli pulls007 dev-seahouse hirenchauhan2 reecebedding lulzash rachelfrenkel sheikh566 yourbuddyconner in-tech-gration sackle-yumura isabelchm

csv-file-validator's Issues

Handling of validated vs. error rows should be optional for each header

In case the CSV file needs to be used for something after validation, it would be good to have a way of getting all the validated rows separately from the error rows. That way, the code can continue running for the validated rows, while the non-validated rows can be handled separately without having to stop the entire code.

For each header, there needs to be an option that will say if an error in validation for that header will cause the row to be removed from the validated rows.

For example:
{
name: 'Email',
inputName: 'email',
removeIfNotValid: true,
unique: true,
uniqueError: function (headerName) {
return ${headerName} is not unique
},
validate: function(email) {
return isEmailValid(email)
},
validateError: function (headerName, rowNumber, columnNumber) {
return ${headerName} is not valid in the ${rowNumber} row / ${columnNumber} column
}
}

The removeIfNotValid will indicate whether or not a row should be removed from the Valid object.

The "csvData" object will then contain 3 objects instead of 2: inValidMessages, data and validData, with the last containing ONLY the rows that have been validated correctly.
Another option is to have yet another object: invalidData, which will contain the original rows that have not validated correctly.

Question: Can I use this package to validate a TSV file?

No error detected when there is mismatch between fields and headers

Hi,
Given the following CSV:

Header1;
Value1;Value2;

Config:

{
headers: [
{name: 'Header1', inputName: 'header1', required:true},
{name: 'Header2', inputName: 'header2', required:true}
],
isHeaderNameOptional: false
}

I would expect a parse error since there is mismatch between headers and fields. But I don't get any error in the validator response. Is this feature not implemented or am I missing something ?
Thanks.

Not able to shuffle header names

By any chance can we reorder the header name in config

"FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory" when adding new validator headers

I have encountered a strange issue.
The code worked fine with my headers.
Then I added two new headers with the exact same structure, and suddenly I get an error message that crashes my app:

FATAL ERROR: Reached heap limit Allocation failed - JavaScript heap out of memory
 1: 00007FF6CDA697DF node_api_throw_syntax_error+174159
 2: 00007FF6CD9F8136 v8::internal::wasm::WasmCode::safepoint_table_offset+67430
 3: 00007FF6CD9F900D node::OnFatalError+301
 4: 00007FF6CE472E3E v8::Isolate::ReportExternalAllocationLimitReached+94
 5: 00007FF6CE45D1B2 v8::Isolate::Exit+674
 6: 00007FF6CE2E2E5C v8::internal::EmbedderStackStateScope::ExplicitScopeForTesting+124
 7: 00007FF6CE2E02C1 v8::internal::Heap::CollectGarbage+3585
 8: 00007FF6CE2F551D v8::internal::HeapAllocator::AllocateRawWithLightRetrySlowPath+1517
 9: 00007FF6CE2F5BB3 v8::internal::HeapAllocator::AllocateRawWithRetryOrFailSlowPath+83
10: 00007FF6CE2FE19F v8::internal::Factory::AllocateRaw+543
11: 00007FF6CE31456A v8::internal::FactoryBase<v8::internal::Factory>::NewFixedArrayWithFiller+90
12: 00007FF6CE314853 v8::internal::FactoryBase<v8::internal::Factory>::NewFixedArrayWithMap+35
13: 00007FF6CE0E7FF6 v8::internal::HashTable<v8::internal::NameDictionary,v8::internal::NameDictionaryShape>::EnsureCapacity<v8::internal::Isolate>+246
14: 00007FF6CE0EEEFD v8::internal::BaseNameDictionary<v8::internal::NameDictionary,v8::internal::NameDictionaryShape>::Add+109
15: 00007FF6CDFF9738 v8::internal::Runtime::GetObjectProperty+936
16: 00007FF6CE512B41 v8::internal::SetupIsolateDelegate::SetupHeap+566513
17: 00007FF64E6D80BE

Not sure what could cause such an issue.
If I remove the new headers, it works fine again.

required column bug

hello
i'm testing this package to valid CSV ( obviously)

i maybe found a bug

for example i have 4 columns, the 2 first are required

const CSVConfig = {
  headers: [
    {name: 'id', inputName: 'id', required: true, requiredError, unique: true, uniqueError},
    {name: 'content', inputName: 'content', required: true, requiredError},
    {name: 'exa', inputName: 'exa', required: false},
    {name: 'exb', inputName: 'exb', required: false},
  ]
}

the following CSV don't throw error even if last row have only ONE column ,But it should throw error because missing the second column :

"id","content","exa","exb"
"www.agent_de_fabrication.com15"

this throw an error :

"id","content","exa","exb"
"www.agent_de_fabrication.com15",

I'm missing an option ?

thanks for your hard work

Number of fields check for the first row

I think this check needs to be done also for the first row of CSV file.

So, instead of this:

			// fields are mismatch
			if (rowIndex !== 0 && row.length !== config.headers.length) {

should be this:

			// fields are mismatch
			if (row.length !== config.headers.length) {

In uniqueError function, getting the rowNumber as undefined

Hi, I am trying to get the rowNumber for the unique error, but i am getting it as undefined.
{
name: 'Employee ID (ZID)',
inputName: 'EmployeeID ',
required: false,
unique: true,
uniqueError: function (headerName, rowNumber) {
return ${headerName} is not unique at the ${rowNumber} row
},
},

is it possible to add dropdowns in exported csv

Allow parsing options to be given when validating a CSV

My use case is that users often provide CSV files with a trailing newline and this library considers that to be an invalid file if any of the headers is required. I'd like to simply skip these lines and PapaParse has an option for this in skipEmptyLines: true, however the current structure has no way for me to pass this option to the internal PapaParse instance.

I've currently forked this repo and added a commit to fix above issue by allowing a third "options" argument to be given to the validator. If you like this approach then I can create a pull request for it.

Unique fails for email field

I'm getting multiple failed uniqueness tests for an email column, while there is only one duplicate.
Rows 3,4,5,6,8,9 get flagged instead of 8,9 only
Note: required and email validation work correctly

Data:

Email

[email protected]
XXX
[email protected]
@domain.io
[email protected]
[email protected]
[email protected]

Code:

const CSVConfig = {
	headers: [
    ...
		{ name: 'Email'              ,inputName: 'email', required: true, requiredError, unique: true, uniqueError, validate: isEmailValidStringent, validateError}.
		...
		]
}

Custom error message for number of fields mismatch

Hi.
I think we should be able to override this message, so we can show it in different languages.

Thank you.

[Bug] Cannot read property stream of null

Current behavior:

I tried to test my code with cypress and cypress-upload-file. But I have a parsing error (see below, cannot read stream of null) that I do not have outside of cypress.
During cypress test, I can console.log file (see code below) and cannot see a difference between in cypress or in a real browser...

I am not sure the issue is from csv-file-validator library, but I only have this issue during testing with cypress (I create an issue on the repo of cypress-file-upload). Maybe the library waiting for something that the test do not provide.

Desired behavior:

CSV-file-validator could parse the file.

Steps to reproduce: (app code and test code)

Cypress: cy.get('input[type=file]').attachFile('myfile.csv');
Front:

function onFileUploaded(event) {
    const file = event.target.files[0];
    CSVFileValidator(file, CSVConfig)
      .then(csvData => {
          /* Do stuff with csvData */
      })
      .catch(err => console.error(err) });
  };
}
<input type="file" accept=".csv" onChange={onFileUploaded} />

Trace error:

TypeError: Cannot read property 'stream' of null
    at Object.parse (papaparse.min.js:46)
    at csv-file-validator.js:17
    at new Promise (<anonymous>)
    at csv-file-validator.js:16
    at Cr (AccountantFileUpload.js:64)
    at Object.<anonymous> (react-dom.production.min.js:49)
    at d (react-dom.production.min.js:69)
    at react-dom.production.min.js:73
    at k (react-dom.production.min.js:140)
    at T (react-dom.production.min.js:169)

Versions

Yarn: 1.12.3
Cypress: 4.3.0
Cypress-file-upload: 4.0.4
Csv-file-validator: 1.8.0
React: 16.6.0
MacOS: 10.14.6

Issue when csv field contains quotation characters

It seems that if the csv field contains quotation characters at the beginning of the row. It causes the entire row to become a single field instead of splitting the fields based on the delimiter.

For example:
These three rows:
"Arbeit ist keine Ware" - 100 Jahre Internationale Arbeitsorganisation 978-3-658-25415-5 978-3-658-25416-2 https://link.springer.com/10.1007/978-3-658-25416-2 Senghaas-Knobloch 10.1007/978-3-658-25416-2 fulltext Springer Fachmedien Wiesbaden Monograph 2019 1 P

"Arbeit ist keine Ware" – 100 Jahre Internationale Arbeitsorganisation 978-3-658-35978-2 978-3-658-35979-9 https://link.springer.com/10.1007/978-3-658-35979-9 Senghaas-Knobloch 10.1007/978-3-658-35979-9 fulltext Springer Fachmedien Wiesbaden Monograph 2022 2 P

"Auf Stalin, Sieg und Vaterland!" 978-3-658-00821-5 978-3-658-00822-2 https://link.springer.com/10.1007/978-3-658-00822-2 Lutz-Auras 10.1007/978-3-658-00822-2 fulltext Springer Fachmedien Wiesbaden Monograph 2013 1 P

Will result in the following:
{ publication_title: 'Arbeit ist keine Ware" - 100 Jahre Internationale Arbeitsorganisation\t978-3-658-25415-5\t978-3-658-25416-2\t\t\t\t\t\t\thttps://link.springer.com/10.1007/978-3-658-25416-2\tSenghaas-Knobloch\t10.1007/978-3-658-25416-2\t\tfulltext\t\tSpringer Fachmedien Wiesbaden\tMonograph\t2019\t\t\t1\t\t\t\tP\n' + '"Arbeit ist keine Ware" – 100 Jahre Internationale Arbeitsorganisation\t978-3-658-35978-2\t978-3-658-35979-9\t\t\t\t\t\t\thttps://link.springer.com/10.1007/978-3-658-35979-9\tSenghaas-Knobloch\t10.1007/978-3-658-35979-9\t\tfulltext\t\tSpringer Fachmedien Wiesbaden\tMonograph\t2022\t\t\t2\t\t\t\tP\n' + '"Auf Stalin, Sieg und Vaterland!', print_identifier: '978-3-658-00821-5', online_identifier: '978-3-658-00822-2', date_first_issue_online: '', num_first_vol_online: '', num_first_issue_online: '', date_last_issue_online: '', num_last_vol_online: '', num_last_issue_online: '', title_url: 'https://link.springer.com/10.1007/978-3-658-00822-2', first_author: 'Lutz-Auras', title_id: '10.1007/978-3-658-00822-2', embargo_info: '', coverage_depth: 'fulltext', coverage_notes: '', publisher_name: 'Springer Fachmedien Wiesbaden', publication_type: 'Monograph', date_monograph_published_print: '2013', date_monograph_published_online: '', monograph_volume: '', monograph_edition: '1', first_editor: '', parent_publication_title_id: '', preceding_publication_title_id: '', access_type: 'P' }

Is it possible to require at least one (any) of a span of columns?

I have a CSV template that includes 20 headers. Five of those are required, and then there is a span of 12 that are all individually optional, but at least one must have value and pass validation.

Is this inherently possible with csv-file-validator? Would you have a recommended way to pull it off if not?

Ignore columns specified in config

Is it possible to ignore all other columns in the CSV that aren't specified in the config headers?

validation for dependent columns

I wish to validate two or more dependent columns.
One column value is dependent on the value of the other column.
for example country and state column are dependent where state column is depending on the value of the country column.

How do I validate the columns in this scenario ?

Optional column not supported correctly

Suppose you had the following configuration:

{
  "csvValidatorConfig": {
    "headers": [
      {
        "name": "First Name",
        "inputName": "firstName",
        "required": true
      },
      {
        "name": "Last Name",
        "inputName": "lastName",
        "required": true
      },
      {
        "name": "Age",
        "inputName": "age",
        "optional": true
      },
      {
        "name": "Employee ID",
        "inputName": "employeeId"
      },
      {
        "name": "Roles",
        "inputName": "roles",
        "isArray": true
      },
      {
        "name": "Comments",
        "inputName": "comments"
      }
    ]
  }
}

And your CSV input were like this, where Age is omitted:

First Name,Last Name,Employee ID,Roles,Comments
John,Public,112233,"cook,bottle washer","comments"
George,O'Jungle,112244,"spy","comments"
Tennessee,Tuxedo,112255,"teacher (""penguin""),cartoon","comments"
Tim,Cook,112266,"CEO","comments"

The code will not support the missing column correctly. It will instead process the Employee ID column as the Age Column.

I do not think this is an enhancement request. It is not working as expected.

How can I map the inputName field with the field given in the csv file if the fields in csv file are not in sequence ?

Skip rows from csv

Hi, im wondering if you can help me with this issue,

i want to skip some rows form a csv, how can i do this ?

Custom error message for dependentValidate error

When dependentValidate() fails, it sets the error message from validateError() function given. But instead it should have its own error handler dependentValidateError() to show more precisely what is wrong.

Indicate expected header name when missing or incorrect

Hi,
Currently, with isHeaderNameOptional: false if a header name is missing or incorrect, the error message is of the format 'Header name ' + columnValue + ' is not correct or missing'. When multiple input headers are empty/blank string, then this message is not very informative since the user does not know which headers were missed.
I would like to indicate to the user the expected header name also.

Will csvData.data contain only validated rows?

I was about to write a suggestion to add a "removeIfNotValid" option to the headers to remove rows that are not validated, but it suddenly occurred to me that csvData.data might already have only the validated rows?
Can you please confirm if that is the case?
If not, I will submit the enhancement suggestion.

Delimiter (,) in one of the fields

Hello I have the following issue, one of my columns is a free text and in this column there could be a comma.

However this doesn't mean additional value.

Could you add a support for checking it.

Not able to define optional columns

Is there a way to define optional columns @shystruk

`inValidMessages` needs to be more descriptive

Currently the inValidMessages is just a strings array with no addition info. It's gets pretty messy to put formatted strings to identify row with error if we want to do complex tasks.

Context
I'm working on CSV import tool, and need to validate CSV files on client side. This tool will show the parsed data in a table. I want to add the error message on the table's row and highlight it and allow user to edit the data in-place. To do that we need to have an index of that row/column to quickly mark it as an errored record in the table.

Proposed suggestion

We should have an error object with at least these properties to start with:

interface RowValidationError {
   rowIndex?: number; 
   columnIndex?: number;
   message: string; // this will be populated by the config validation functions like requiredError, uniqueError, uniqueError, etc.
}

interface ParsedResults<Row = any, Error = RowValidationError> {
   /** Array of parsed CSV entries */
   data: Row[];
   /** List of validation error messages */
   inValidMessages: Error[];
}

update

made rowIndex, columnIndex fields optional, as in some scenarios they may not have these details. e.g. top level file error, etc.

TypeError: (0 , csv_file_validator_1.default) is not a function

import CSVFileValidator from "csv-file-validator";

.
.
.
public async ValidateCsv(csv: string, validation) {
  const data = await CSVFileValidator(csv, validation);

  return { data };
}
.
.
.

I am using node v14.20.1 and npm 6.14.17

I don't know why I am getting this error: TypeError: (0 , csv_file_validator_1.default) is not a function

IE 11 issue

When I try to run my Angular 5 application I get a syntax error on your script:

/** * @param {File} csvFile * @param {Object} config */ function CSVFileValidator (csvFile, config) { return new Promise((resolve, reject) => {

It errors out at the EQUAL sign on the => {

I got UndetectableDelimiter as in csv file i am using , the delimiter is "," . So how can I add delimiter?

errors: [ { type: 'Delimiter', code: 'UndetectableDelimiter', message: 'Unable to auto-detect delimiting character; defaulted to \',\'', row: undefined } ], meta: { delimiter: ',', linebreak: '\n', aborted: false, truncated: false, cursor: 105 }

How can we validate headers, not values? and optional key is not working, showing same result for both boolean values.

Dynamic Typing in PapaParse config breaks the module

Hello,

I noticed an issue with this package where if you pass parserConfig: { dynamicTyping: true } in the config the code breaks if there are types other than strings in the CSV file (in my case there are numbers).

The module tries to call _clearValue which fails because the replace function does not exist on numbers.

The use case is that I want to be able to validate that values in certain columns are in fact numbers. The workaround for now would be to add a validation function to see if the string is numeric:

function isNumeric(value: any) {
  return !isNaN(parseFloat(value)) && isFinite(value);
}

However, it would be much preferred if we could rely on papaparse to perform the dynamicTyping correctly. Let me know if I can help resolve this. Thanks!

Header Name cannot be read

Hi, I am trying to get the read data from csv , but it always gives an error

'Header name public/uploads/tmp/data/file-1652695495283-job-template-csv.csv is not correct or missing in the 1 row / 1 column. The Header name should be First Name'

Feature request: Add optional header to config

sbj

Papaparse not working inside csv file validator?

I have tried for few csv files but the result is same
{ data: [ [ filename ] ], errors: [ { type: 'Delimiter', code: 'UndetectableDelimiter', message: 'Unable to auto-detect delimiting character; defaulted to \',\'', row: undefined } ], meta: { delimiter: ',', linebreak: '\n', aborted: false, truncated: false, cursor: 71 } }

Handling for when CSV and config headers do not match

I have different types of CSVs that I need to validate where the headers may vary, right now the config headers need to match the headers in the CSV, is there a possible way to have a list of headers that may or may not be in the CSV?
I tried using isHeaderNameOptional but this doesn't trigger the field validations. I also tried using the optional property on the extra headers in the config but that doesn't work either

[Bug] Using Papaparse config header = true

If we set the property header in the config for papaparse. CSV File Validator cannot parse file

Error: csv-file-validator.js:62 Uncaught TypeError: row.forEach is not a function at csv-file-validator.js:62

Validator is not identifying empty entry in the csv file therefore no error is returned while validation process

headers: [ { name: 'First Name', inputName: 'FirstName*', required: true, requiredError: function (headerName, rowNumber, columnNumber) { return '${headerName} is required in the ${rowNumber} row / ${columnNumber} column' } }

this is one of the configuration of the header

Letting you know about similar efforts

Hi there,

Not really an issue or criticism, but thought you might find the British National Archive CSV validation effort worth knowing about: https://github.com/digital-preservation/csv-validator and https://github.com/digital-preservation/csv-schema

Validate Function Should Be Async Function to check whether a user with this email and mobile already exist or not in dababase

Last required header is skipped by validation

Validation doesn't work correctly for the following case:

Steps to reproduce:

CSV file should contain the following headers: "Name,Surname,Age"
Prepare file with content "Name,Surname"
Validation CSV

Actual result:
No validation errors

Expected result:
Validation error that "Age" header is missing

Please note that the validation will work correctly for the headers with the comma: "Name,Surname,"

Dev notes:
It seems this is because of the row.forEach cycle. Without the comma, the array will contain 2 entries (headers config contains 3 entries), but because of the process is based on row entries instead of headers config, it will never check for the absence of the required headers.

The unique error map does not return the row and column number of the entry that is not unique but only returns the header name.

The unique error map does not return the row and column number of the entry that is not unique but only returns the header name. Any suggestions on how to get the row and column no ?

Scenario for "optional" column

Type definitions

Hello,

I'd needed to add type definitions for this package, as I am using it from TypeScript project.

Can I prepare a PR and include them in the package?

Add configuration option for converting CSV column number indices to letter indices for validation

Request the addition of a configuration option to allow for the conversion of CSV column number indices to letter indices for validation purposes. Currently, when validating CSV files, it is necessary to manually convert column number indices to letter indices in order to accurately identify and reference specific columns. This can be time-consuming and error-prone, especially when working with large CSV files.

Adding a configuration option to automatically convert column number indices to letter indices would greatly improve the efficiency and accuracy of CSV validation processes. It would also make it easier for users to understand and reference specific columns within a CSV file.

Currently, it is not possible to convert CSV column numbers to letters using the current configuration options. This can be inconvenient when working with large CSV files, as it is difficult to read and understand the data when the columns are represented by numbers.

I propose adding a new configuration option that allows users to specify whether or not to convert column numbers to letters.

I envision this configuration option being a boolean flag, with a default value of "false". When set to "true", the column numbers in the CSV file would be automatically converted to letters.