Comments (8)
Aye. I'll leave this issue open for now. I will likely (some day) revisit the Serde integration in this crate and see whether it can be improved holistically. At the very least, we'll get better docs.
from rust-csv.
(Aside: Perhaps it would also make sense to document which specification this crate intends to conform to?)
The docs could be better. The answer to this should probably be in the crate docs. But the answer is in other places:
- https://docs.rs/csv/latest/csv/struct.Reader.html#error-handling
- https://docs.rs/csv-core/latest/csv_core/struct.Reader.html#rfc-4180 (specifically mentions RFC 4180)
You are indeed correct that there really is no one agreed upon CSV specification that one can adhere to. As the links above elaborate on, the problem with RFC 4180 is that it's too strict. Essentially, the CSV world needs its version of HTML 5 (nothing is invalid). RFC 4180 is like XHTML (lots are invalid). For pretty much exactly the same reason: tons of real world CSV data is messy, and erroring on it just isn't useful (in most cases).
It certainly makes sense that flexible defaults to true
Eh? flexible defaults to false for both Reader and Writer.
but I'd prefer if there was an option to support nested containers without changing the number of fields.
I propose the following ideas (not sure how feasible they are though):
All of your proposals appear to already be possible today. All you should need to do is implement Serialize (or Deserialize) for your type. Serialize it to a string, which CSV knows how to encode. Same for deserialization. Have you tried this approach? If not, why not? And if so, what didn't work about it?
My understanding is that folks run into this nested container issue and expect it to be resolved automatically. It can't be. Not in any way that I can see without choosing a new serialization format to layer inside of CSV fields. It can't be resolved automatically because CSV doesn't support nested data. If you need to serialize nested data, then this crate can't really figure that out for you (aside from a few special cases). You either need to figure out how to convert it to a flattened representation or how to encode richer data inside of a single CSV field.
I suspect this is why all three of your proposals appear to have a common thread between them: they ask for the user to resolve the nested container issue manually by supplying some kind of transformation function. (Your third proposal does suggest providing some sensible defaults, but presumably the user would still need to make a choice.) As far as I know, resolving the nested container issue manually is not actually a problem because I believe it can already be done via Serde's framework. So there really isn't an issue there.
Instead, what I suspect people would like is something like, "hey if I have a nested container I don't want to hear about it, just JSON serialize it and stuff it into a single CSV field." But I'm pretty philosophically opposed to something like that personally, and I don't perceive it as a common enough problem to really worry about too much.
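As an illustration of the "encode richer data inside of a single CSV field" option mentioned above, here is a minimal sketch using only the standard library. The helper names `encode_list` and `decode_list` and the `;` separator are assumptions made for this example, not anything the csv crate provides; elements that could themselves contain `;` would need an escaping scheme.

```rust
use std::num::ParseIntError;

/// Hypothetical helper: pack a list into one field, separated by ';'.
fn encode_list(values: &[i32]) -> String {
    values
        .iter()
        .map(|v| v.to_string())
        .collect::<Vec<_>>()
        .join(";")
}

/// Hypothetical helper: unpack a ';'-separated field back into a list.
/// An empty field is treated as an empty list.
fn decode_list(field: &str) -> Result<Vec<i32>, ParseIntError> {
    if field.is_empty() {
        return Ok(Vec::new());
    }
    field.split(';').map(|part| part.parse::<i32>()).collect()
}

fn main() {
    let field = encode_list(&[1, 2, 3]);
    println!("{field}");
    println!("{:?}", decode_list(&field));
}
```

The resulting string is an ordinary CSV field, so the choice of inner encoding stays entirely with the caller.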
from rust-csv.
The docs could be better. The answer to this should probably be in the crate docs. But the answer is in other places:
Oh, thanks, I missed that 🙈. I looked in multiple places (but focused on the Writer) but should've used the search functionality for that.
Putting a (brief) version into the crate docs would be great IMO.
As the links above elaborate on, the problem with RFC 4180 is that it's too strict.
Yes, I was focused on the Writer and not the Reader (as nested containers are basically unsolvable for the Reader due to the lack of a strict standard that supports such data structures).
For the Writer it should make sense to be as strict as possible as many more guarantees can be made.
Or as RFC 4180 puts it (interoperability considerations):
Due to lack of a single specification, there are considerable
differences among implementations. Implementors should "be
conservative in what you do, be liberal in what you accept from
others" (RFC 793 [8]) when processing CSV files. An attempt at a
common definition can be found in Section 2.
(my interpretation/opinion would be conservative and strict)
Eh? flexible defaults to false for both Reader and Writer.
Oops, sorry - that's what I meant but I somehow wrote true instead of false.
All of your proposals appear to already be possible today. All you should need to do is implement Serialize (or Deserialize) for your type. Serialize it to a string, which CSV knows how to encode. Same for deserialization. Have you tried this approach? If not, why not? And if so, what didn't work about it?
Yes, that would be ideal. In my "exotic" use case (sorry that it was quite hidden and too brief at the bottom in the "PS:") this isn't easily possible (AFAIK) as the type gets generated during compile time and depends on the API specification. It should theoretically be possible to either generate the types in advance (but this becomes problematic when the API specification changes) or to use reflection to transform the data at run time (this is what we're currently exploring but makes it more complex).
(But when I think more about it I could probably solve my particular use case with a generic Serialize implementation for Vec<T> (or for a few common Ts if a generic implementation isn't possible). I didn't realize this before so thanks a lot for the suggestion!)
I guess another limitation should be that this approach becomes problematic when serializing to multiple different formats as there can only be a single serde::ser::Serialize implementation and one would probably only want to use flattening for CSV? (But I haven't looked into this so far / only glanced over the Serialize documentation)
It can't be resolved automatically because CSV doesn't support nested data.
Yes, agreed :)
I suspect this is why all three of your proposals appear to have a common thread between them: they ask for the user to resolve the nested container issue manually by supplying some kind of transformation function.
Right
(Your third proposal does suggest providing some sensible defaults, but presumably the user would still need to make a choice.)
Yes. That could make sense if we find a few good, generic, and "universal" solutions but I guess it'd be better to just support the custom transformation function and put the possible solutions as examples in the documentation.
Instead, what I suspect people would like is something like, "hey if I have a nested container I don't want to hear about it, just JSON serialize it and stuff it into a single CSV field."
Ideally :) But I agree that this is just not possible with CSV.
Anyway, that custom user function (more of a hack than a proper solution) could make sense if the two potential limitations of the serde::ser::Serialize approach, that I listed, make sense / are valid. If not, then Serialize should be sufficient.
from rust-csv.
this isn't easily possible (AFAIK) as the type gets generated during compile time and depends on the API specification
You would likely need to generate the Serialize impls for those types.
I guess another limitation should be that this approach becomes problematic when serializing to multiple different formats as there can only be a single serde::ser::Serialize implementation and one would probably only want to use flattening for CSV? (But I haven't looked into this so far / only glanced over the Serialize documentation)
Yeah you probably need to use newtypes or even build up the infrastructure yourself to call different serialization functions. (This might require embedding the functions on the data type? Not sure.)
Anyway, that custom user function (more of a hack than a proper solution) could make sense if the two potential limitations of the serde::ser::Serialize approach, that I listed, make sense / are valid. If not, then Serialize should be sufficient.
My suspicion is that there is an isomorphism here where anything the csv crate can do is possible with Serialize. I will say though that I haven't even thought about how a custom user function would even work in the serializer as it exists today. It's possible that it would be quite hokey.
The other hesitance you'll run into here is that the Serde integration in this crate is absolutely hideous. I would encourage you to look at it. And the vast majority of all issues/bugs/feature-requests on this repository are related to Serde integration. It is just overall simultaneously miserable to support (for CSV specifically) but also extremely convenient. What this means is that even if you're stuck, it's unlikely there's any reasonable path forward in this crate itself in any reasonable time span.
from rust-csv.
You would likely need to generate the Serialize impls for those types.
Yes, my initial hope was that I could override the Serialize implementation for Vec<T> but I completely forgot that Rust doesn't allow this (for good reasons but it would've been a handy hack here).
Having to replace all vectors with a custom vector type is unfortunately likely a dealbreaker (at least with my limited Rust knowledge) for my use case since the types are structures with quite a few fields and everything gets generated from the API specification. At that point it would probably be much easier to use a different approach/hack to handle the serialization.
or even build up the infrastructure yourself to call different serialization functions
That would also be handy for my use case. I'd like to call a custom serialization function for Vec<T> / change the serialization but I haven't found a (clean) way to do so and I'm not sure if it's possible (based on https://stackoverflow.com/questions/60008192/how-to-implement-a-custom-serialization-only-for-serde-json it also doesn't look good).
I'll look around a bit more but it doesn't look good for avoiding custom types.
In case it helps someone: Here's a PoC/hack how one could serialize vectors into a single CSV field using a custom type:
use anyhow::Result;
use csv::WriterBuilder;
use serde::Serialize;
use serde::Serializer;

struct MyVec<T>(Vec<T>);

impl<T> Serialize for MyVec<T>
where
    T: Serialize + std::fmt::Debug,
{
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        let s = format!("{:?}", self.0);
        serializer.serialize_str(&s)
    }
}

#[derive(serde::Serialize)]
struct Record {
    a: MyVec<i32>,
    b: MyVec<i32>,
}

fn main() -> Result<()> {
    let record = Record {
        a: MyVec(vec![1, 2, 3]),
        b: MyVec(vec![4, 5, 6]),
    };
    let mut wtr = WriterBuilder::new()
        .has_headers(false)
        .from_path("out.csv")?;
    wtr.serialize(record)?;
    wtr.flush()?;
    Ok(())
}
The result:
$ cat out.csv
"[1, 2, 3]","[4, 5, 6]"
from rust-csv.
I wasn't necessarily thinking about custom impls for vecs, but custom impls for your API types.
from rust-csv.
And serde_derive lets you specify custom serialization functions for individual fields without introducing your own newtype wrapper.
from rust-csv.
I wasn't necessarily thinking about custom impls for vecs, but custom impls for your API types.
Yes, but again, the problem is that the implementation/source-code of the API types gets auto-generated by a crate and a custom build script (plus I wouldn't really like to write a serializer for a large struct just to modify how a few fields should get serialized). But I'd call that an exotic use case and it's pretty broken tbh (I never liked the idea of it in the first place). That crate is also very limited and reportedly even unmaintained, so we'd better replace that part :)
And serde_derive lets you specify custom serialization functions for individual fields without introducing your own newtype wrapper.
Right, the #[serde(deserialize_with = "path")] attribute (https://serde.rs/variant-attrs.html#deserialize_with) seems quite useful for that purpose, thanks! :)
That should be ergonomic enough to solve such use cases with minimal effort (which should be the goal, at least IMO).
Anyway, we can close this issue if you want. I'm not that happy with the current default but, considering your responses, I also don't really see a significantly better solution anymore (and prohibiting nested containers by default for correctness also doesn't sound really good).
I guess the only potentially actionable TODOs would be to further improve the documentation a little bit but hopefully this issue will also be discoverable enough to help a bit in the meantime.
from rust-csv.