Giter Site home page Giter Site logo

Comments (17)

Numpsy avatar Numpsy commented on June 15, 2024 1

I'm thinking it might be best to create a new provider instead of trying to fit these changes into the existing one.
What's your opinion about that?

fwiw, I've seen comments elsewhere about breaking the type providers for different data types into different packages so maybe that's a place to start for that, otherwise just creating something new might help to avoid any legacy stuff altogether?

from fsharp.data.

mlaily avatar mlaily commented on June 15, 2024 1

I went ahead and started a JsonProvider2 implementation.
https://github.com/fsprojects/FSharp.Data/compare/main...mlaily:FSharp.Data:jsonprovider2?expand=1

Here are some progress screenshots and side-by-side comparisons with the original JsonProvider...

An array is an array. It does not depend on the number of items it contains:
image

An empty string is still a string, not an optional value:
image

0, 1, or "yes" are not booleans:
image

from fsharp.data.

dlidstrom avatar dlidstrom commented on June 15, 2024 1

I just learnt about empty strings being inferred as null. Using this file as a sample:

{
  "Lines": [
    "             **",
    "              *",
    "               ",
    "               ",
    "               ",
    "               ",
    "              *",
    "              *",
    "    *****     *",
    "   *******     ",
    "    *****     *",
    "              *",
    "               ",
    "               ",
    "               ",
    "               ",
    "               "
  ]
}

Now reading this file using the provider results in an array for Lines being:

[|"             **"; "              *"; "              *"; "              *";
  "    *****     *"; "   *******     "; "    *****     *"; "              *"|]

Very surprising and interesting to me that I now find this newly created issue. Are the nuances of the JsonProvider documented somewhere? For example, I don't find any mention of null in the reference documentation.

from fsharp.data.

dlidstrom avatar dlidstrom commented on June 15, 2024 1

@goswinr It doesn't seem to help. Here is my test program and files:

#r "nuget: FSharp.Data"

open FSharp.Data

type Provider = JsonProvider<"sample.json", InferenceMode = InferenceMode.NoInference>

let sample = Provider.Load "test.json"

// show the Lines from json value and from generated type
printfn
  "%A\n%A"
  (sample.JsonValue.Properties() |> Array.find (fst >> ((=) "Lines")) |> snd)
  (sample.Lines)

sample.json:

{
  "Lines": [
    ""
  ]
}

test.json:

{
  "Lines": [
    "             **",
    "              *",
    "               ",
    "               ",
    "               ",
    "               ",
    "              *",
    "              *",
    "    *****     *",
    "   *******     ",
    "    *****     *",
    "              *",
    "               ",
    "               ",
    "               ",
    "               ",
    "               "
  ]
}

It looks like the provider reads all lines, as can be seen when inspecting the JsonValue, but "provides" only a subset.
Output:

PS C:\> dotnet fsi .\Scratch.fsx
[
  "             **",
  "              *",
  "               ",
  "               ",
  "               ",
  "               ",
  "              *",
  "              *",
  "    *****     *",
  "   *******     ",
  "    *****     *",
  "              *",
  "               ",
  "               ",
  "               ",
  "               ",
  "               "
]
[|"             **"; "              *"; "              *"; "              *";
  "    *****     *"; "   *******     "; "    *****     *"; "              *"|]

from fsharp.data.

goswinr avatar goswinr commented on June 15, 2024

@mlaily I love this work! Is your JsonProvider2 working ? I guess these changes are not merging here soon. Could you publish your own nuget?

from fsharp.data.

mlaily avatar mlaily commented on June 15, 2024

@goswinr thanks for your interest.

This new json provider is still a prototype: there are a lot of things that are still changing or are not tested properly and can crash.

My branch is available but it's way too early for a release.

Handling null or missing values properly is proving challenging, but I'm learning a lot and making progress.

Ideally, I would also like to use System.Text.Json, but for now all my changes are based on the existing JsonValue parser.

I'll worry about how to release it when it works well enough for production use :)

from fsharp.data.

goswinr avatar goswinr commented on June 15, 2024

@dlidstrom does inferenceMode = InferenceMode.NoInference help? in type MyJson = JsonProvider<myFilePath, InferenceMode = InferenceMode.NoInference>

from fsharp.data.

Thorium avatar Thorium commented on June 15, 2024

Yes it seems MS puts quite an effort for System.Text.Json (also Newtonsoft would probably be faster than custom implementation).

System.Text.Json source code is open (https://github.com/dotnet/runtime/tree/main/src/libraries/System.Text.Json/src/System/Text/Json) and one of the maintainers @eiriktsarpalis has been active in F# community. I don't know if there would be any quick wins without rewriting a lot, and do we even want dependencies like System.Memory to FSharp.Data (to use Spans).

from fsharp.data.

Thorium avatar Thorium commented on June 15, 2024

His advice was:

just use Utf8JsonReader under wraps. That way make sure it's compliant and also take advantage of all the vectorization code it's using under the hood.

This makes sense:
The current FSharp.Data parser JsonValue (see: https://github.com/fsprojects/FSharp.Data/blob/main/src/FSharp.Data.Json.Core/JsonValue.fs#L39)
would actually map the System.Text.Json Utf8JsonReader values (just testing here: https://gist.github.com/Thorium/09a67ce08adee4fcc02ec7f0048e6962).
So the parsing could be replaced without re-building the whole domain model.

And now that FSharp.Data is split to multiple NuGet packages, maybe a dependency of System.Text.Json to FSharp.Data.Json.Core would be acceptable?

from fsharp.data.

mlaily avatar mlaily commented on June 15, 2024

I haven't worked on it for a while but I actually have started to use System.Text.Json on my branch.

I opted to use the JsonNode api instead of the JsonDocument one. https://learn.microsoft.com/en-us/dotnet/standard/serialization/system-text-json/use-dom

It seemed the most fitting for a balance between (user)-convenience and performance for a general use library.
(Keep in mind we want to be able to serialize json, not only deserialize it, and IMO an F# immutable wrapper over a mutable DOM is perfect. It's also very close to what is done with the XmlProvider and I think the two of them having a similar feature set is a good idea.)


The parsing somewhat works (not for all cases yet I think), but serialization is currently non-existent.

One big problem I had that I decided to ignore for now is that I didn't manage to make the dependency on System.Text.Json work everywhere when using the compiled provider.
In my tests it works with Visual Studio Code without issue, but I can't figure out how to make it work with Visual Studio! (IIRC a project using the compiled provider referencing STJ fails to compile, complaining that it doesn't find STJ or something similar)

Another thing is my JsonWrapper object backing every provided type after erasure is currently a struct. My reasoning is that this JsonWrapper is only there because there are members on JsonNode I don't want to expose, and if I could I would use JsonNode directly as the backing type for provided types, so I want to make it have the smallest footprint possible. I'm not 100% sure it's the right call...

Anyway if you are curious I pushed my current version to my branch (still in a somewhat PoC-state):

(side note: the JsonNode design makes it hard to use properly, e.g. because it uses null to represent json null tokens, and plenty of other strange decisions, but it still seems to me like the most fitting api to use for our case)

from fsharp.data.

Thorium avatar Thorium commented on June 15, 2024

I'm trying to get this working. So far:

  • I updated System.Text.Json to version 8.0.0. There are more serialization options which may be helpful.
  • I merged main to this branch. No conflicts, but now the build wanted me to run some code formatting tool: dotnet fake build -t Format
  • "build.cmd" failed as I didn't have .NET 6.0.5. That happens in main branch too, which is Ironic as this provider is using .NET5. "dotnet build" works but I need to use "build.cmd -t pack" to generate packet. So I copied the latest C:\Program Files\dotnet\packs\Microsoft.NETCore.App.Ref\6.0.25 to required C:\Program Files\dotnet\packs\Microsoft.NETCore.App.Ref\6.0.5 and now I can run "build.cmd -t pack"
  • Packed. Then I unzipped \bin\FSharp.Data.6.3.0.nupkg and copied over my files in my Nuget cache at C:\Users\(myuser)\.nuget\packages\fsharp.data\6.3.0.
  • Then, to avoid missing STJ complaint, I copied netstandard2.0 dll versions of System.Text.Json.dll and Microsoft.Bcl.AsyncInterfaces.dll from my nuget-cache to a temp-folder and then to nuget-cache both folders FSharp.Data\6.3.0\typeproviders\fsharp41\netstandard2.0 and FSharp.Data\6.3.0\lib\netstandard2.0 (The temp is needed because NuGet-restores will remove them easily.) If those are needed in the package, there could be added some post-build deployment task.

Now I'm this far:

Error	FS3021	Unexpected exception from provided type 'FSharp.Data.JsonProvider2,Sample="{\"x\": 1}"' 
member 'GetMethods': The type provider 'ProviderImplementation.JsonProvider2' reported an error: 
The design-time type 'System.Text.Json.Nodes.JsonNode' utilized by a type provider was not found in the target reference assembly 
set '[tgt assembly FSharp.Core, Version=8.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a;
 ---<cut list of assemblies...but not System.Text.Json here>---
]'. You may be referencing a profile which contains fewer types than those needed by the type provider you are using.

from fsharp.data.

mlaily avatar mlaily commented on June 15, 2024

I forgot to mention another interesting thing about this: if you dotnet build a test project referencing FSharp.Data containing my version based on STJ, it works, and then VS is able to build it too! (at least until you clean your bin output folder...)

from fsharp.data.

Thorium avatar Thorium commented on June 15, 2024

As far as I know:

System.Text.Json is released on multiple frameworks:
net462 / netstandard2.0 / net6.0 / net7.0 / net8.0

TypeProvider uses the runtime-code on compile-time.
Different IDEs may use different versions (e.g. VS2022 may be running on net462, meanwhile VSCode can be running on net6 or whatever).

After compilation is the easy part, because if the customer referenced the System.Text.Json to their project (or other dependencies did it), then it probably picks the correct runtime-version anyway.

from fsharp.data.

Thorium avatar Thorium commented on June 15, 2024

I tried to play with this version but I think it goes very complex to try to expose the System.Text.Json types on design-time.

I think what I would rather prefer, is just to use the current "obsolete" JSON-model mapping and add 2 methods:

  • For input, current .Parse, to have sister method like ".ParseWithSystemTextJson"
  • For output, current .JsonValue to have sister method like ".JsonValueWithSystemTextJson" (or JsonSaveOptions overload but that's messy).

And so, not to expose System.Text.Json types to outside at all. Design-time would work as it is, no System.Text.Json, and runtime user would have option to use faster serialization if needed. Keep backward compatibility. Just map current Json types to Utf8JsonReader types with simple match clause. And, actually, if even more independency wanted, the System.Text.Json.dll could be loaded with reflection, so that if it's missing, the methods would just throw an error, that way there would not be dependency needed to Nuget-packet level at all to the actual library.

from fsharp.data.

mlaily avatar mlaily commented on June 15, 2024

I think exposing the underlying JsonNode is a desirable feature. The providers are not perfect, and it's very valuable as a user to be able to get full control and bypass the provider if you need it, and if the provider is based on STJ I would expect to get control of the STJ primitives, and not just some opaque wrapper. (Same as the XmlProvider)

I don't like the idea of choosing an implementation based on a method name but without actually seeing anything else change.
Also, throwing at runtime if STJ is missing seems worse than failing at build time...

I agree adding STJ as a dependency of FSharp.Data isn't that great, but I don't see a good path around that...

from fsharp.data.

Thorium avatar Thorium commented on June 15, 2024

hmm, maybe the ideas are both doable..

Maybe the "faster serializer" I was thinking (not as good as raw STJ as you pointed out) could be a separate nuget package depending both fsharp.data and system.text.json. That way current old provider can stay as it is, and old projects could benefit with minimal migration.

from fsharp.data.

mlaily avatar mlaily commented on June 15, 2024

That would probably be ideal if we could make the providers independent yeah.

Even though I'm aware some work has already been done to go in that direction, I'm not sure how easy it will be though.

from fsharp.data.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.