
Comments (8)

BurntSushi commented on June 3, 2024

Aye. I'll leave this issue open for now. I will likely (some day) revisit the Serde integration in this crate and see whether it can be improved holistically. At the very least, we'll get better docs.

from rust-csv.

BurntSushi commented on June 3, 2024

(Aside: Perhaps it would also make sense to document which specification this crate intends to conform to?)

The docs could be better. The answer to this should probably be in the crate docs. But the answer is in other places:

You are indeed correct that there really is no one agreed upon CSV specification that one can adhere to. As the links above elaborate on, the problem with RFC 4180 is that it's too strict. Essentially, the CSV world needs its version of HTML 5 (nothing is invalid). RFC 4180 is like XHTML (lots are invalid). For pretty much exactly the same reason: tons of real world CSV data is messy, and erroring on it just isn't useful (in most cases).

It certainly makes sense that flexible defaults to true

Eh? flexible defaults to false for both Reader and Writer.

but I'd prefer if there was an option to support nested containers without changing the number of fields.

I propose the following ideas (not sure how feasible they are though):

All of your proposals appear to already be possible today. All you should need to do is implement Serialize (or Deserialize) for your type. Serialize it to a string, which CSV knows how to encode. Same for deserialization. Have you tried this approach? If not, why not? And if so, what didn't work about it?

My understanding is that folks run into this nested container issue and expect it to be resolved automatically. It can't be. Not in any way that I can see without choosing a new serialization format to layer inside of CSV fields. It can't be resolved automatically because CSV doesn't support nested data. If you need to serialize nested data, then this crate can't really figure that out for you (aside from a few special cases). You either need to figure out how to convert it to a flattened representation or how to encode richer data inside of a single CSV field.
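
The two manual options named above (flatten, or encode inside one field) can be sketched with the standard library alone. Everything here is an assumption made for illustration: the `Order` type, the helper names, and the choice of `;` as an inner separator are not from the crate or the thread.

```rust
/// Hypothetical record type with a nested list.
struct Order {
    id: u32,
    tags: Vec<String>,
}

/// Option 1: flatten. Emit one row per (id, tag) pair so that every row
/// has the same two fields.
fn flatten(order: &Order) -> Vec<Vec<String>> {
    order
        .tags
        .iter()
        .map(|tag| vec![order.id.to_string(), tag.clone()])
        .collect()
}

/// Option 2: encode the whole list inside one field. The ';' separator is an
/// arbitrary choice and assumes that no tag contains ';'.
fn single_field(order: &Order) -> Vec<String> {
    vec![order.id.to_string(), order.tags.join(";")]
}

fn main() {
    let order = Order {
        id: 7,
        tags: vec!["red".to_string(), "blue".to_string()],
    };
    for row in flatten(&order) {
        println!("{}", row.join(","));
    }
    println!("{}", single_field(&order).join(","));
}
```

Either result is a flat list of strings per row, which is the shape a CSV writer can take without having to guess how the nesting should be represented.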

I suspect this is why all three of your proposals appear to have a common thread between them: they ask for the user to resolve the nested container issue manually by supplying some kind of transformation function. (Your third proposal does suggest providing some sensible defaults, but presumably the user would still need to make a choice.) As far as I know, resolving the nested container issue manually is not actually a problem because I believe it can already be done via Serde's framework. So there really isn't an issue there.

Instead, what I suspect people would like is something like, "hey if I have a nested container I don't want to hear about it, just JSON serialize it and stuff it into a single CSV field." But I'm pretty philosophically opposed to something like that personally, and I don't perceive it as a common enough problem to really worry about too much.

from rust-csv.

primeos-work commented on June 3, 2024

The docs could be better. The answer to this should probably be in the crate docs. But the answer is in other places:

Oh, thanks, I missed that 🙈. I looked in multiple places (but focused on the Writer) but should've used the search functionality for that.
Putting a (brief) version into the crate docs would be great IMO.

As the links above elaborate on, the problem with RFC 4180 is that it's too strict.

Yes, I was focused on the Writer and not the Reader (as nested containers are basically unsolvable for the Reader due to the lack of a strict standard that supports such data structures).

For the Writer it should make sense to be as strict as possible, as many more guarantees can be made.
Or as RFC 4180 puts it (interoperability considerations):

Due to lack of a single specification, there are considerable
differences among implementations. Implementors should "be
conservative in what you do, be liberal in what you accept from
others" (RFC 793 [8]) when processing CSV files. An attempt at a
common definition can be found in Section 2.
(my interpretation/opinion would be conservative and strict)

Eh? flexible defaults to false for both Reader and Writer.

Oops, sorry - that's what I meant but I somehow wrote true instead of false.

All of your proposals appear to already be possible today. All you should need to do is implement Serialize (or Deserialize) for your type. Serialize it to a string, which CSV knows how to encode. Same for deserialization. Have you tried this approach? If not, why not? And if so, what didn't work about it?

Yes, that would be ideal. In my "exotic" use case (sorry that it was quite hidden and too brief at the bottom in the "PS:") this isn't easily possible (AFAIK) as the type gets generated during compile time and depends on the API specification. It should theoretically be possible to either generate the types in advance (but this becomes problematic when the API specification changes) or to use reflection to transform the data at run time (this is what we're currently exploring but makes it more complex).
(But when I think more about it I could probably solve my particular use case with a generic Serialize implementation for Vec<T> (or for a few common Ts if a generic implementation isn't possible). I didn't realize this before so thanks a lot for the suggestion!)

I guess another limitation should be that this approach becomes problematic when serializing to multiple different formats as there can only be a single serde::ser::Serialize implementation and one would probably only want to use flattening for CSV? (But I haven't looked into this so far / only glanced over the Serialize documentation)

It can't be resolved automatically because CSV doesn't support nested data.

Yes, agreed :)

I suspect this is why all three of your proposals appear to have a common thread between them: they ask for the user to resolve the nested container issue manually by supplying some kind of transformation function.

Right

(Your third proposal does suggest providing some sensible defaults, but presumably the user would still need to make a choice.)

Yes. That could make sense if we find a few good, generic, and "universal" solutions but I guess it'd be better to just support the custom transformation function and put the possible solutions as examples in the documentation.

Instead, what I suspect people would like is something like, "hey if I have a nested container I don't want to hear about it, just JSON serialize it and stuff it into a single CSV field."

Ideally :) But I agree that this is just not possible with CSV.

Anyway, that custom user function (more of a hack than a proper solution) could make sense if the two potential limitations of the serde::ser::Serialize approach, that I listed, make sense / are valid. If not, then Serialize should be sufficient.

from rust-csv.

BurntSushi commented on June 3, 2024

this isn't easily possible (AFAIK) as the type gets generated during compile time and depends on the API specification

You would likely need to generate the Serialize impls for those types.

I guess another limitation should be that this approach becomes problematic when serializing to multiple different formats as there can only be a single serde::ser::Serialize implementation and one would probably only want to use flattening for CSV? (But I haven't looked into this so far / only glanced over the Serialize documentation)

Yeah you probably need to use newtypes or even build up the infrastructure yourself to call different serialization functions. (This might require embedding the functions on the data type? Not sure.)

Anyway, that custom user function (more of a hack than a proper solution) could make sense if the two potential limitations of the serde::ser::Serialize approach, that I listed, make sense / are valid. If not, then Serialize should be sufficient.

My suspicion is that there is an isomorphism here where anything the csv crate can do is possible with Serialize. I will say though that I haven't even thought about how a custom user function would even work in the serializer as it exists today. It's possible that it would be quite hokey.

The other hesitance you'll run into here is that the Serde integration in this crate is absolutely hideous. I would encourage you to look at it. And the vast majority of all issues/bugs/feature-requests on this repository are related to Serde integration. It is just overall simultaneously miserable to support (for CSV specifically) but also extremely convenient. What this means is that even if you're stuck, it's unlikely there's any reasonable path forward in this crate itself in any reasonable time span.

from rust-csv.

primeos-work commented on June 3, 2024

You would likely need to generate the Serialize impls for those types.

Yes, my initial hope was that I could override the Serialize implementation for Vec<T> but I completely forgot that Rust doesn't allow this (for good reasons but it would've been a handy hack here).

Having to replace all vectors with a custom vector type is unfortunately likely a dealbreaker (at least with my limited Rust knowledge) for my use case since the types are structures with quite a few fields and everything gets generated from the API specification. At that point it would probably be much easier to use a different approach/hack to handle the serialization.

or even build up the infrastructure yourself to call different serialization functions

That would also be handy for my use case. I'd like to call a custom serialization function for Vec<T> / change the serialization but I haven't found a (clean) way to do so and I'm not sure if it's possible (based on https://stackoverflow.com/questions/60008192/how-to-implement-a-custom-serialization-only-for-serde-json it also doesn't look good).

I'll look around a bit more but it doesn't look good for avoiding custom types.


In case it helps someone: Here's a PoC/hack showing how one could serialize vectors into a single CSV field using a custom type:

use anyhow::Result;
use csv::WriterBuilder;
use serde::Serialize;
use serde::Serializer;

struct MyVec<T>(Vec<T>);

impl<T> Serialize for MyVec<T>
where
    T: Serialize + std::fmt::Debug,
{
    fn serialize<S>(&self, serializer: S) -> Result<S::Ok, S::Error>
    where
        S: Serializer,
    {
        let s = format!("{:?}", self.0);
        serializer.serialize_str(&s)
    }
}

#[derive(serde::Serialize)]
struct Record {
    a: MyVec<i32>,
    b: MyVec<i32>,
}

fn main() -> Result<()> {
    let record = Record {
        a: MyVec(vec![1, 2, 3]),
        b: MyVec(vec![4, 5, 6]),
    };

    let mut wtr = WriterBuilder::new()
        .has_headers(false)
        .from_path("out.csv")?;
    wtr.serialize(record)?;
    wtr.flush()?;

    Ok(())
}

The result:

$ cat out.csv
"[1, 2, 3]","[4, 5, 6]"

from rust-csv.

BurntSushi commented on June 3, 2024

I wasn't necessarily thinking about custom impls for vecs, but custom impls for your API types.

from rust-csv.

BurntSushi commented on June 3, 2024

And serde_derive lets you specify custom serialization functions for individual fields without introducing your own newtype wrapper.

from rust-csv.

primeos-work commented on June 3, 2024

I wasn't necessarily thinking about custom impls for vecs, but custom impls for your API types.

Yes, but again, the problem is that the implementation/source code of the API types gets auto-generated by a crate and a custom build script (plus I wouldn't really like to write a serializer for a large struct just to modify how a few fields get serialized). But I'd call that an exotic use case, and it's pretty broken tbh (I never liked the idea of it in the first place). That crate is also very limited and appears to be unmaintained, so we'd better replace that part :)

And serde_derive lets you specify custom serialization functions for individual fields without introducing your own newtype wrapper.

Right, the #[serde(deserialize_with = "path")] attribute (https://serde.rs/variant-attrs.html#deserialize_with) seems quite useful for that purpose, thanks! :)
That should be ergonomic enough to solve such use cases with minimal effort (which should be the goal, at least IMO).

Anyway, we can close this issue if you want. I'm not that happy with the current default but, considering your responses, I also don't really see a significantly better solution anymore (and prohibiting nested containers by default for correctness also doesn't sound really good).
I guess the only potentially actionable TODOs would be to further improve the documentation a little bit but hopefully this issue will also be discoverable enough to help a bit in the meantime.

from rust-csv.
