Comments (3)
+1 @ben-strasser I seem to need the opposite of this. If set_header specifies more columns (say n) and the CSV file has only m columns, then the last n-m columns should get default values, as with ignore_missing_column
from fast-cpp-csv-parser.
Hi,
I originally considered adding a more general set_header, but decided against it.
If you do not know the number of columns in the file, how do you know which ones you need? You might say that you need the first x columns. But why not the last x, or every other column? When writing the parser you cannot know how someone will modify the file format. Where will he add his new column? Will he remove columns? All these questions can be handled transparently, and missing columns detected, when the CSV file has a header. If it does not, this is not possible. I therefore argue that if the CSV format changes and the file does not have a header, the programmer will in any case have to check manually whether the parsing code still works. Having a set_header with an ignore_policy that only reads the first x parameters is therefore a bug in the making.
If you know the CSV format and the number of columns but only want to read some columns, you can use dummy char* variables. These pointers point directly into the memory buffer, so there is nearly no overhead. You can argue that the interface is ugly for this use case, and you are right. However, I think this use case is sufficiently rare that we can live with the current interface, especially since I do not see how to design an interface that is both flexible and elegant. Using a complicated interface is no prettier than the current situation.
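To illustrate why dummy char* variables cost almost nothing, here is a minimal standalone sketch. It is not the library's code: `split_in_place`, `Row`, and `parse_row` are hypothetical names invented for this example, and the four-column layout is an assumption. The line is split inside its own buffer, every field is just a pointer into that buffer, and the unwanted columns are never copied or converted.

```cpp
#include <array>
#include <cstddef>
#include <cstdlib>
#include <string>

// Hypothetical helper, not part of fast-cpp-csv-parser: split one line in
// place at each ',' and store a pointer to the start of every field.
// Returns false if the line does not contain exactly N fields.
template <std::size_t N>
bool split_in_place(char* line, std::array<char*, N>& fields) {
    std::size_t count = 0;
    char* start = line;
    for (char* p = line;; ++p) {
        if (*p == ',' || *p == '\0') {
            if (count == N) return false;  // more fields than expected
            const bool last = (*p == '\0');
            *p = '\0';                     // terminate the field inside the buffer
            fields[count++] = start;
            if (last) break;
            start = p + 1;
        }
    }
    return count == N;
}

struct Row {
    std::string name;
    int score;
};

// Reads columns 0 and 3 of a headerless four-column line (assumed layout).
// Columns 1 and 2 only ever exist as dummy pointers into the buffer.
bool parse_row(char* line, Row& out) {
    std::array<char*, 4> f{};
    if (!split_in_place(line, f)) return false;
    char* dummy1 = f[1];  // ignored column: no copy, no conversion
    char* dummy2 = f[2];  // ignored column: no copy, no conversion
    (void)dummy1;
    (void)dummy2;
    out.name = f[0];
    out.score = std::atoi(f[3]);
    return true;
}
```

A line with the wrong number of fields is rejected rather than silently truncated, which matches the point made above: without a header, a change in column count should surface as an error.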
Furthermore, having an ugly interface for CSV files without a header has its use: it pushes people towards adding headers, which will help them down the line when the CSV file format is updated.
Best Regards
Ben Strasser
I get what you're saying. For argument's sake, consider a C++ function with default values. You can specify only the first k<n arguments and the rest get the defaults.
There is no syntactic option for using just the last k or interleaving.
I could argue the same here. If you want just the first k columns, it is a valid use case. Otherwise, use dummy variables.
I ended up using dummies too.
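The default-argument analogy can be sketched as follows. `Record` and `make_record` are hypothetical names used only for illustration. Trailing parameters may be omitted and take their defaults, but the language offers no way to skip a middle parameter or to supply only the last ones.

```cpp
#include <string>
#include <utility>

struct Record {
    int id;
    std::string name;
    double weight;
};

// Hypothetical function with n = 3 parameters; the last two have defaults.
Record make_record(int id, std::string name = "unknown", double weight = 1.0) {
    return Record{id, std::move(name), weight};
}

// make_record(7)            -> name and weight take their defaults
// make_record(7, "bob")     -> only weight takes its default
// make_record(7, , 2.5)     -> does not compile: a middle argument cannot be skipped
```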