Decode CSV in the most boring way possible.
Other CSV libraries have exciting, innovative APIs... not this one! Pretend you're writing a JSON decoder, gimme your data, get on with your life.
```elm
import Csv.Decode as Decode exposing (Decoder)

decoder : Decoder ( Int, Int, Int )
decoder =
    Decode.map3 (\r g b -> ( r, g, b ))
        (Decode.column 0 Decode.int)
        (Decode.column 1 Decode.int)
        (Decode.column 2 Decode.int)

csv : String
csv =
    "0,128,128\r\n112,128,144"

Decode.decodeCsv Decode.NoFieldNames decoder csv
--> Ok
-->     [ ( 0, 128, 128 )
-->     , ( 112, 128, 144 )
-->     ]
```
However, in an effort to avoid a common problem with `elm/json` ("how do I decode records with more than 8 fields?"), this library also exposes a pipeline-style decoder for records:
```elm
import Csv.Decode as Decode exposing (Decoder)

type alias Pet =
    { id : Int
    , name : String
    , species : String
    , weight : Maybe Float
    }

decoder : Decoder Pet
decoder =
    Decode.pipeline Pet
        |> Decode.required (Decode.field "id" Decode.int)
        |> Decode.required (Decode.field "name" Decode.string)
        |> Decode.required (Decode.field "species" Decode.string)
        |> Decode.required (Decode.field "weight" (Decode.blank Decode.float))

csv : String
csv =
    "id,name,species,weight\r\n1,Atlas,cat,14.5\r\n2,Pippi,dog,"

Decode.decodeCsv Decode.FieldNamesFromFirstRow decoder csv
--> Ok
-->     [ { id = 1, name = "Atlas", species = "cat", weight = Just 14.5 }
-->     , { id = 2, name = "Pippi", species = "dog", weight = Nothing }
-->     ]
```
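Since decoding returns a `Result`, handling failures is ordinary Elm. Here's a sketch that reuses the `decoder` above; the `errorToString` helper for rendering errors is an assumption here, so check the package docs for the exact name:

```elm
-- Render a decode attempt as a short status string.
viewPets : String -> String
viewPets raw =
    case Decode.decodeCsv Decode.FieldNamesFromFirstRow decoder raw of
        Ok pets ->
            String.fromInt (List.length pets) ++ " pets loaded"

        Err problem ->
            -- errorToString is assumed; see the package docs
            Decode.errorToString problem
```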
Yep! Use `decodeCustom`. It takes field and row separator strings, which can be whatever you need.
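For example, here's a sketch of decoding tab-separated values. The record field names (`rowSeparator`, `fieldSeparator`) and argument order are assumptions based on the description above, so confirm them against the package docs:

```elm
import Csv.Decode as Decode exposing (Decoder)

decoder : Decoder ( String, Int )
decoder =
    Decode.map2 Tuple.pair
        (Decode.column 0 Decode.string)
        (Decode.column 1 Decode.int)

-- decode TSV by swapping in a tab field separator
Decode.decodeCustom
    { rowSeparator = "\n", fieldSeparator = "\t" }
    Decode.NoFieldNames
    decoder
    "Atlas\t14\nPippi\t7"
```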
Yes, there are! And while I appreciate the hard work that other people have put into those, there are a couple of problems:

First, you need to put together multiple libraries to successfully parse CSV. Usually you'll use something like `lovasoa/elm-csv` to parse into a `List (List String)`, and then something like `ericgj/elm-csv-decode` to convert from a grid of strings into the values you care about. Props to those authors for making their hard work available, of course, but this situation bugs me! I don't want to have to pick different libraries for parsing and converting. I just want it to work like `elm/json`, where I write a decoder, give the package a string, and handle a `Result`. This should not require so much thought!
The second thing, and the one that prompted me to finally do something about this, is that none of the available libraries implement `andThen`. Sure, you can use a `Result` to do whatever you like, but there's not a good way to make decoding decisions dependent on the fields you see.

That's something I'd like to add, probably even in the 1.x line if possible! The thing is, I'm writing this to fix a specific problem (decoding), and I want to release 1.0.0 and move on to actually using it.
It'd be a nice thing to contribute! But open an issue to talk about the API first, OK?
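To make the idea concrete, here's roughly what dependent decoding could look like. This is a hypothetical sketch modeled on `elm/json`'s API; `Decode.andThen` and `Decode.fail` are assumed names, not (yet) part of this library:

```elm
type Shape
    = Circle Float
    | Square Float

-- hypothetical: read the "type" column first, then choose how to
-- decode the rest of the row based on its value
decodeShape : Decoder Shape
decodeShape =
    Decode.field "type" Decode.string
        |> Decode.andThen
            (\kind ->
                case kind of
                    "circle" ->
                        Decode.map Circle (Decode.field "radius" Decode.float)

                    "square" ->
                        Decode.map Square (Decode.field "side" Decode.float)

                    other ->
                        Decode.fail ("unknown shape: " ++ other)
            )
```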
This project uses Nix to manage versions (you just need a `nix` installation, not NixOS, so this will work on macOS). Install that, then run `nix-shell` to get into a development environment.
Things I'd appreciate help with:

- Testing the parser on many kinds of CSV and TSV data. If you find that some software produces something that this library can't handle, please open an issue with a sample!
- Feedback on speed. For the data sizes I'm working with in my use of this library, speed is unlikely to be an issue. If you're parsing a lot of data, though, it may be for you. If you find that this library has become a bottleneck in your application, please open an issue.
- Feedback on decoders for things you find necessary (but please open an issue and talk through it instead of jumping straight to a PR!) Some things I've thought of: `parse : Parser.Parser a -> Decoder a` and `json : Json.Decode.Decoder a -> Decoder a`. The reason they're not in the library now is that (a) `fromResult` exists to make those easier and (b) I don't want to add the new dependencies on `elm/json` without a good reason. If you find yourself writing things like this constantly, though, let's talk about them!
Things I'd appreciate seeing PRs for, which we probably don't need to coordinate much on other than a heads-up that you're doing the work:

- Benchmarking and performance improvements. Internally, we just use `List` for everything. Some smart application of `Array` could potentially perform a lot better, but I have held off optimizing since I haven't measured!
- Docs. Always docs. Forever docs.
I want my open-source activities to support projects addressing the climate crisis (for example, projects in clean energy, public transit, reforestation, or sustainable agriculture). If you are working on such a project and find a bug or missing feature in any of my libraries, please let me know and I will treat your issue as high priority. I'd also be happy to support such projects in other ways. In particular, I've worked with Elm for a long time and would be happy to advise on your implementation.
`elm-csv` is licensed under the BSD 3-Clause license, located at LICENSE.