Comments (5)

ovatsus commented on May 22, 2024

Related to this, I like the fact that we have a JsonValue that's useful on its own and then build a type provider on top of it to add type safety. This way we always have access to the underlying JsonValue for edge cases when needed. The XmlProvider is similar, giving access to the underlying XElement.
I think we should have the same pattern for CsvProvider. I'm prototyping something that replaces the base classes CsvRow and CsvFile under RuntimeImplementation and promotes them to first-class citizens, giving them more functionality and getting closer to R's DataFrame (this includes dynamic lookup as described in #64). Then CsvProvider can be built on top, adding type safety, but you can always escape to the underlying values and use the .AsXxx conversions as in the JSON provider.

from fsharp.data.

tpetricek commented on May 22, 2024
  • I think keeping CSV in the name is probably a good idea (I expect that people know the name and realize that this actually works for a wider range of tabular data sources, and I think nobody really expects that the comma in Comma-Separated-Values has to be a comma :-))
  • I think we do not need multiple providers. The reason why this is needed for WorldBank is that there are default values for all parameters and so one version is not parameterized (WorldBank.Countries....). For CSV (etc.) we always need at least the input.

But:

  • I really like the idea of changing CsvProvider and CsvRow to follow the same style as JsonValue and be standalone types that people can use for dynamic access (I think we can pretty much follow the same pattern and have a module that adds the dynamic operator and various AsXxx extensions).

    If you're happy to look into that, I'll leave it to you (if we do this, we'll need to add another *.fsx file with some documentation for the dynamic access).
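
For readers following the thread, the pattern discussed above (a standalone untyped row, a dynamic `?` operator, AsXxx conversions, and a typed layer on top) could look roughly like the sketch below. This is not the FSharp.Data implementation: `Row`, `StockRow`, and the column names are hypothetical, and only the standard library is used.

```fsharp
open System
open System.Globalization

/// Hypothetical untyped row: column names plus raw string cells.
type Row(headers: string[], cells: string[]) =
    /// Look up a cell by column name.
    member this.GetColumn(name: string) : string =
        match Array.tryFindIndex ((=) name) headers with
        | Some i -> cells.[i]
        | None -> failwithf "Column '%s' not found" name

/// Dynamic lookup: the compiler rewrites row?Name to (?) row "Name".
let (?) (row: Row) (name: string) : string = row.GetColumn name

/// AsXxx-style conversions on the raw string values (assumed names).
type System.String with
    member this.AsFloat() = Double.Parse(this, CultureInfo.InvariantCulture)
    member this.AsInteger() = Int32.Parse(this, CultureInfo.InvariantCulture)

/// Hypothetical typed view, standing in for what a provider might generate.
type StockRow(row: Row) =
    member this.Symbol : string = row?Symbol
    member this.Price : float = (row?Price).AsFloat()
    /// Escape hatch back to the untyped row.
    member this.Raw = row

let row = Row([| "Symbol"; "Price"; "Volume" |], [| "MSFT"; "28.5"; "1000" |])
let typed = StockRow(row)

// Typed access for the known columns.
printfn "%s %f" typed.Symbol typed.Price

// Dynamic access for a column the typed view does not expose.
let raw = typed.Raw
let volume = (raw?Volume).AsInteger()
printfn "%d" volume
```

The typed wrapper only delegates to the untyped row, which is what keeps the escape hatch cheap: anything the typed view omits is still reachable through `Raw`.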


ovatsus commented on May 22, 2024

I've been writing a bunch of R code lately, so I'll try converting some of it to use FSharp.Data instead, to get a feel for what would work better as a JsonValue-like API.


ovatsus commented on May 22, 2024

With the latest changes from #122, we already have a decent enough dynamic API. I did a comparison between using the type provider, using the dynamic API, and using R here: https://gist.github.com/ovatsus/5354187

One advantage of the dynamic version is that we can slice the columns directly (https://gist.github.com/ovatsus/5354187#file-csvfile-fsx-L45), though we could eventually do something like that with the typed version. In both cases, the average by column is not very easy to do unless we treat a CSV file as having operations similar to a matrix, and that's not easy to do unless all the columns are of the same type.
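
To make the column-type problem concrete, here is a minimal sketch of slicing columns and averaging by column over an untyped table. It is not taken from the linked gists or from FSharp.Data: `Table`, `sliceColumns`, `tryFloat`, and `averageByColumn` are hypothetical names, and columns that are not entirely numeric simply yield `None`.

```fsharp
open System
open System.Globalization

/// Hypothetical untyped table: headers plus rows of raw string cells.
type Table = { Headers: string[]; Rows: string[][] }

/// Keep only the named columns, in the order requested.
/// Raises an exception if a name is not among the headers.
let sliceColumns (names: string list) (table: Table) : Table =
    let indices =
        names
        |> List.map (fun name -> Array.findIndex ((=) name) table.Headers)
        |> List.toArray
    { Headers = indices |> Array.map (fun i -> table.Headers.[i])
      Rows = table.Rows |> Array.map (fun row -> indices |> Array.map (fun i -> row.[i])) }

/// Try to parse a cell as a float using the invariant culture.
let tryFloat (s: string) : float option =
    match Double.TryParse(s, NumberStyles.Float, CultureInfo.InvariantCulture) with
    | true, v -> Some v
    | _ -> None

/// Average every column whose cells all parse as floats; others yield None.
let averageByColumn (table: Table) : (string * float option)[] =
    table.Headers
    |> Array.mapi (fun i header ->
        let parsed = table.Rows |> Array.map (fun row -> tryFloat row.[i])
        if parsed.Length > 0 && Array.forall Option.isSome parsed then
            (header, Some (parsed |> Array.averageBy Option.get))
        else
            (header, None))

let table =
    { Headers = [| "City"; "Temp"; "Rain" |]
      Rows = [| [| "Oslo"; "4.0"; "10" |]
                [| "Rome"; "16.0"; "30" |] |] }

// Slice down to the numeric columns, then average each of them.
let numeric = sliceColumns [ "Temp"; "Rain" ] table
let averages = averageByColumn table
printfn "%A" averages
```

Returning an option per column is one way to sidestep the mixed-type issue; a matrix-style API would instead need every column to share a numeric type up front.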

The R code is still more concise when doing filtering and mapping on the datasets, so I think we have a lot of room for improvement here. FMat is able to get a Matlab/R-like syntax, so maybe we could get some of that too. A possible idea would be something like this: https://gist.github.com/ovatsus/5355630. I'm using the dynamic API and hardcoded a few things to make it look like the typed API. But even if we could make that work on the type provider version, I'm not very happy with it either. @tpetricek do you have any bright ideas?


ovatsus commented on May 22, 2024

I think the API is good enough for now, and the CSV name is not ideal but it's OK, so I'm closing this. Let's keep things minimal until we have more real-world feedback.
