Hi, Instead of explicitly typing out variables into which you'd pars

<a class="user-mention notranslate" data-hovercard-type="user" data-hovercard-url="/us

Passing a vector of columns to be populated instead of names about fast-cpp-csv-parser HOT 9 CLOSED

ben-strasser commented on July 21, 2024

Passing a vector of columns to be populated instead of names

from fast-cpp-csv-parser.

Comments (9)

helmingstay commented on July 21, 2024

This is (mostly) possible with a variadic template. First off, you must use
an array, since the size of a vector is not known at compile time. Second,
you need a helper template to take that array and "unroll" it into args.

It's a little tricky if you're new to TMP, but it's a common enough
construct.

Edit - I've pasted in some snippets of my working code. It's far from a MWE, but it will (hopefully) provide you with a starting place.

// http://stackoverflow.com/questions/16834851/passing-stdarray-as-arguments-of-template-variadic-function
// magic: generate integer sequence as function args, pass to variadic template
namespace indices_t {
template<int... Is>
struct seq { };
template<int N, int... Is>
struct gen_seq : gen_seq<N - 1, N - 1, Is...> { };
template<int... Is>
struct gen_seq<0, Is...> : seq<Is...> { };
}

// example calling function template
// the analogy to a "column name array" here is row_text
template<int... Is>
void _read_row(reader_t & in_csv, indices_t::seq<Is...>) {
// ...
while(
in_csv.read_row( row_text[Is]...)
) {
// do stuff
}
}

// call example - 
// the effect is that _read_row is indices_t generates one arg, a numeric index, for each element of the array 
    _read_row(in_csv, indices_t::gen_seq<Ndat>());

hth,
-xian

from fast-cpp-csv-parser.

rapoth commented on July 21, 2024

@helmingstay Thanks for your time Xian! This is a very interesting solution! However, wouldn't this require modifying the csv.h? In addition, I have two questions:

I am the new function will only read the csv values as text (and does not deal with casting into types). Is this correct?
Instead of this approach (which requires modifying csv.h), why not add one more signature like the following:

bool read_row(std::vector<std::string>& columns)

Would be interesting to see whether @ben-strasser has any plans of supporting this. If so, maybe @helmingstay can submit a pull request? I would highly prefer something that does not require me to maintain my own fork of the codebase.

from fast-cpp-csv-parser.

helmingstay commented on July 21, 2024

Just to reiterate, the code I added wasn't meant as a complete example, or as a solution to your particular issue. It was simply to A) let you know that your request was possible and B) point you in the right direction.

To answer your questions:

A. " However, wouldn't this require modifying the csv.h?"
I've simply added a few alias templates and helper classes that set policies, and mold the interface in csv.h to my liking. For example, I have a header with the following:

// convenience policy definition, trim spaces, comma sep, comments=#
template<size_t ncol>
using csv_t = io::CSVReader<ncol, io::trim_chars<' '>, io::no_quote_escape<','>, io::throw_on_overflow, io::single_line_comment<'#'>  >;

In my case, I'm reading everything as text and then manually parsing. As @ben-strasser notes in the README, "The builtin number parsers are pure convenience. If you need a slightly different syntax then use char* and do the parsing yourself."
It is NOT POSSIBLE to determine the dimension of a vector at runtime. Thus, this code will not (and cannot be made to) compile. You could do this with a std::array containing char* (though be aware that, for the exact same reason, an array of std::strings is not possible).

I suggest spending a few days reading up on TMP and variadic templates in cpp, and then working through a few examples. Some of the questions that you're asking will be quickly clarified with some practice :)

from fast-cpp-csv-parser.

ben-strasser commented on July 21, 2024

Is it possible to instead accept a vector containing columns with specific types to do this? Something like the following:
auto cols = new Columns{ ... three string cols ... };
_reader->read_row(cols);
will then populate cols[0], cols[1] and cols[2] as needed. Any suggestions on how this can be done?

You can do this:

vector<string>vec(3);
in.read_row(vec[0], vec[1], vec[2]);

This will get the data into a vector of strings without you having to declare a variable per column. Sounds to me like what you were asking for.

Instead of this approach (which requires modifying csv.h), why not add one more signature like the following:
bool read_row(std::vector<std::string>& columns)

CSVReader takes the number of columns as compile-time constant. A vector has a dynamic size. What should happen if the vector's size and the number of columns do not coincide? There are many options: assert fails, exception thrown, vector resized, segfault, stack trashing. No option is intuitive. On the other hand the simple

in.read_row(vec[0], vec[1], vec[2]);

makes it absolutely clear that it is the responsibility of the caller to make the vector the right size. Further, not having an additional method makes the interface thinner. Less code -> less bugs, Smaller interface -> easier to learn.

@helmingstay Be aware that the docs says to read as char* and not as std::string. From a performance point of view, there is a significant difference between those two.

from fast-cpp-csv-parser.

helmingstay commented on July 21, 2024

@ben-strasser "Be aware that the docs says to read as char* and not as std::string. From a performance point of view, there is a significant difference between those two."

Yes, thank you :)
In my case I actually do use std::string, mainly out of laziness, but I was careful to remove any mention of it in my examples (and explicitly point to your README comment).

from fast-cpp-csv-parser.

sshivaji commented on July 21, 2024

I respectfully think its quite useful to have dynamic fill for values in the in.read_row api. A common usecase is to process very different CSV files for analysis. How would one know in advance the number of columns and formulate that in the source code then? Is there another way to populate a variadic function?

from fast-cpp-csv-parser.

helmingstay commented on July 21, 2024

On Wed, Dec 14, 2016 at 6:32 PM, Shivkumar Shivaji ***@***.*** > wrote: How would one know in advance the number of columns and formulate that in the source code then? Is there another way to populate a variadic function?

Be careful to distinguish between a variadic *function* ( http://en.cppreference.com/w/cpp/utility/variadic) and a variadic function *template* (used here, see http://en.cppreference.com/w/cpp/language/parameter_pack). While I'm not Ben, the use of templates here is a fundamental design decision. And a template _must_ be fully specified at compile time. *Maybe* you could specify an upper bound of the number of columns, and then generate all possible instantiations. *Maybe* you could store these definitions in an std::array, and then use some sort of typedecl to select the right type at compile time? That doesn't solve the column name problem. All told, it seems like this particular library isn't a very good fit for your use-case? best, Christian

…

-- A man, a plan, a cat, a ham, a yak, a yam, a hat, a canal – Panama! http://www.x14n.org

from fast-cpp-csv-parser.

sshivaji commented on July 21, 2024

The speed of this library is very good. The limitation of the static column typing I think can be solved. If you have a branch or commit I can test, I will be more than happy. For now, I will look at your proposal and sample code above and see if I can fix it.

Thanks for pointing out the difference between a variadic template and a variadic function!

from fast-cpp-csv-parser.

ben-strasser commented on July 21, 2024

Hi, thanks for the feedback. :) This has been asked several times. There are several important features that currently exist, that would conflict in an, as far as I can tell, non-resolvable manner. Especially the implicit column reordering is problematic. Look at the older issues for a more detailed rational. If all you want is the speed, then you can also use LineReader and populate a vector<char*> by splitting the line at commas and replacing them by a 0-byte. For example: ``cpp LineReader in(file_name); while(char*line = in.next_line()){ vector<char*>row; row.push_back(line); while(*line != '\0'){ if(*line == ','){ *line = '\0'; row.push_back(line+1); } ++line; } // do something with row } `` Warning: untested code. Best Regards Ben Strasser

from fast-cpp-csv-parser.

Passing a vector of columns to be populated instead of names about fast-cpp-csv-parser HOT 9 CLOSED

Comments (9)

Related Issues (20)

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent