Comments (6)
It would be nice to have this (or an equivalent) to be able to read just the header row of a CSV file (maybe just to get the column names).
from kotlin-csv.
I think it would be good to have access to basic CSV parser functionality, without the field-count checks or header parsing logic, as a fallback in case you need to parse very unusual CSVs, or to parse malformed CSVs for error reporting or other purposes.
If a file contains blank rows, or rows with differing numbers of fields, it is still possible to load it into Excel. Spreadsheet or generic CSV parsing software does not care about the number of fields or about headers; it only cares about the most basic structure of a CSV. Even a malformed CSV with blank rows or rows with a different number of fields will be laid out in table form with row and column numbers. This type of software consistently puts cells in the same row and column numbers, regardless of the vendor. I have tested this with Excel, LibreOffice and Modern CSV, and they all give consistent row numbers for files with blank rows, differing numbers of fields and fields containing newlines.
Using `CsvFileReader.readAllAsSequence()`, it is not possible to replicate the lenient parsing behavior of those types of tools. It is not even possible to consistently determine which row numbers they assign to specific rows. However, using `CsvFileReader.readNext()`, these things are possible for any file.
`insufficientFieldsRowBehaviour`, `excessFieldsRowBehaviour` and `skipEmptyLine` allow for a bit of wiggle room in some use cases, but do not allow the checks to be bypassed entirely. For example, it is not possible to parse a file with blank rows unless `skipEmptyLine = true`. However, if this is enabled then `readAllAsSequence()` will not return those rows at all. This is an issue if you are trying to report the row number in the file correctly, since you cannot know whether a skipped line existed. Rows with differing numbers of fields have other issues. With `readNext()` you can always get the correct row number in the file, and the content, regardless of how many fields the row has or whether it is blank.
With `CsvFileReader.readNext()` it is possible to replicate the lenient parsing behavior of spreadsheet software, or to report errors with row numbers that are consistent with spreadsheet software. Without `readNext()` these things are no longer possible.
I do agree that the behavior of `readNext()` may be confusing. A user might assume that `readNext()` behaves the same as `readAllAsSequence().iterator().next()` and thus respects `insufficientFieldsRowBehaviour`, `excessFieldsRowBehaviour` and `skipEmptyLine`. It may be confusing and unexpected that it does not. As an alternative to `readNext()`, it might be better to add a function `readAllAsSequenceRaw()`, a separate class which provides this raw row read functionality, or a config option which bypasses the field-count checks entirely.
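To make the "raw row" idea concrete, here is a minimal standalone sketch of lenient parsing. It does not use kotlin-csv at all: `parseRawRows` is a hypothetical helper written for illustration, assuming a comma delimiter and double-quote escaping. It keeps blank rows and rows of any length, so row numbers stay aligned with what a spreadsheet would show.

```kotlin
// Hypothetical helper, NOT part of kotlin-csv: splits CSV text into raw rows
// with no field-count checks. Blank rows are kept as a row with one empty field.
fun parseRawRows(text: String, delimiter: Char = ',', quote: Char = '"'): List<List<String>> {
    val rows = mutableListOf<List<String>>()
    var fields = mutableListOf<String>()
    val field = StringBuilder()
    var inQuotes = false
    var i = 0
    while (i < text.length) {
        val c = text[i]
        when {
            // A doubled quote inside a quoted field is an escaped quote character.
            inQuotes && c == quote && i + 1 < text.length && text[i + 1] == quote -> {
                field.append(quote)
                i++
            }
            // Any other quote opens or closes a quoted section.
            c == quote -> inQuotes = !inQuotes
            !inQuotes && c == delimiter -> {
                fields.add(field.toString())
                field.setLength(0)
            }
            // A line terminator outside quotes ends the row (handles \n, \r and \r\n).
            !inQuotes && (c == '\n' || c == '\r') -> {
                if (c == '\r' && i + 1 < text.length && text[i + 1] == '\n') i++
                fields.add(field.toString())
                field.setLength(0)
                rows.add(fields)
                fields = mutableListOf()
            }
            else -> field.append(c)
        }
        i++
    }
    // Flush a final row that has no trailing line terminator.
    if (field.isNotEmpty() || fields.isNotEmpty()) {
        fields.add(field.toString())
        rows.add(fields)
    }
    return rows
}

fun main() {
    // Header, a blank row, a short row, and a long row with a newline inside a quoted field.
    val text = "a,b,c\n\n1,2\n\"x\ny\",3,4,5\n"
    parseRawRows(text).forEachIndexed { index, row ->
        println("row ${index + 1}: $row")
    }
}
```

Because every physical row is returned, the index of a row in the result can be used directly as its row number when reporting errors.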
from kotlin-csv.
Thank you for your very useful input. I completely agree with your opinion.
I'll consider the future direction of this feature.
from kotlin-csv.
I'm currently using `readNext()` in two ways:
- to read the headers only
- to read the first X rows to grab a preview of the CSV file, without needing any of the checks that happen when reading the whole file
If `readNext()` were to be removed, I'd love to have the option to call something like `readColumnNames()` or similar, which ideally also checks for duplicates. Even better, being able to call something like `CsvReader.readColumnNames(File)` would likely limit misuse and be prettier than having to `open(File)` first.
For my second use case, there are currently only utility methods to `readAll` in different ways, but being able to read the first X rows, including headers and with the full set of checks applied, would be amazing for providing previews! Maybe something like `CsvReader.readWithHeader(File, Int? = null)` where, if passed, the integer value would specify how many rows to read before returning :)
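As a rough illustration of the two requests above, here is a standalone sketch that works on any lazy `Sequence` of rows. The names `columnNames` and `previewWithHeader` are hypothetical and are not kotlin-csv functions; the duplicate check and the strict field-count check are assumptions about what "the full set of checks" might mean.

```kotlin
// Hypothetical helpers, NOT part of kotlin-csv. `rows` stands for any lazy source of raw rows.

// Returns the header row, failing if any column name appears more than once.
fun columnNames(rows: Sequence<List<String>>): List<String> {
    val header = rows.firstOrNull() ?: throw IllegalArgumentException("no header row")
    val duplicates = header.groupingBy { it }.eachCount().filterValues { it > 1 }.keys
    require(duplicates.isEmpty()) { "duplicate column names: $duplicates" }
    return header
}

// Reads the header plus at most `limit` data rows (all rows when `limit` is null),
// checking that every returned row has as many fields as the header.
fun previewWithHeader(rows: Sequence<List<String>>, limit: Int? = null): List<Map<String, String>> {
    val iterator = rows.iterator()
    if (!iterator.hasNext()) return emptyList()
    val header = iterator.next()
    val body = iterator.asSequence()
    val limited = if (limit != null) body.take(limit) else body
    return limited.map { row ->
        require(row.size == header.size) { "expected ${header.size} fields but got ${row.size}" }
        header.zip(row).toMap()
    }.toList()
}

fun main() {
    val rows = sequenceOf(
        listOf("id", "name"),
        listOf("1", "a"),
        listOf("2", "b"),
        listOf("3", "c"),
    )
    println(columnNames(rows))
    println(previewWithHeader(rows, limit = 2))
}
```

Since `take(limit)` is applied before the rows are materialized, rows beyond the preview are never validated or collected, so a malformed row later in the file would not affect the preview.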
from kotlin-csv.
It seems `readNext()` reads line by line and can work with very little memory, whereas `readAllAsSequence()` would force loading the whole CSV into memory first. What is the alternative if memory is constrained or we need to process very large CSVs?
from kotlin-csv.