nimble_csv's People

Contributors

adrianomitre, am-kantox, behrendtio, caike, georgeguimaraes, josevalim, karlseguin, kianmeng, kzlsakal, lostkobrakai, mathieuprog, mikl, msaraiva, nirev, olleolleolle, popo63301, pragtob, rondy, sneako, trevoke, viniciusmuller, vishal-h, waiting-for-dev, whatyouhide, wojtekmach

nimble_csv's Issues

Escaping double quotes within text.

NimbleCSV.define(MyParser, separator: "|", escape: "\"")
MyParser.parse_string "name|age\njohn|27\nsay: \"Hello\"|32"

Results in:

** (NimbleCSV.ParseError) unexpected escape character " in "say: \"Hello\"|32"
    deps/nimble_csv/lib/nimble_csv.ex:348: MyParser.separator/5
    deps/nimble_csv/lib/nimble_csv.ex:281: anonymous fn/4 in MyParser.parse_enumerable/2

Whereas the expected result would be:
[["john", "27"], ["say: \"Hello\"", "32"]]
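For reference, RFC 4180-style escaping (which NimbleCSV parsers follow) represents a literal double quote inside a quoted field by doubling it, not by backslash-escaping it. A small sketch, assuming a fresh parser definition (the PipeParser name is made up here):

```elixir
# Sketch: inside an escaped field, a doubled quote character stands
# for one literal quote.
NimbleCSV.define(PipeParser, separator: "|", escape: "\"")

PipeParser.parse_string(~s(name|age\njohn|27\n"say: ""Hello"""|32))
#=> [["john", "27"], ["say: \"Hello\"", "32"]]
```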

parse_stream vs parse_string

I was comparing the normal version of the CSV parser:

airports_csv()
    |> File.read!()
    |> CSV.parse_string()
    |> Enum.map(fn row ->
      %{
        id: Enum.at(row, 0),
        type: Enum.at(row, 2),
        name: Enum.at(row, 3),
        country: Enum.at(row, 8)
      }
    end)
    |> Enum.reject(&(&1.type == "closed"))

with the stream version

airports_csv()
    |> File.stream!()
    |> CSV.parse_stream()
    |> Stream.map(fn row ->
      %{
        id: :binary.copy(Enum.at(row, 0)),
        type: :binary.copy(Enum.at(row, 2)),
        name: :binary.copy(Enum.at(row, 3)),
        country: :binary.copy(Enum.at(row, 8))
      }
    end)
    |> Stream.reject(&(&1.type == "closed"))
    |> Enum.to_list()

While measuring with :timer.tc/1, I noticed that the stream version is much slower: the first version takes around 3 seconds, while the stream version takes 44 seconds.
I was expecting the stream version to be faster (below 1 second). Am I doing something wrong here?

I'm using
:nimble_csv, "~> 1.2"

BTW, this example is taken from "Concurrent Data Processing with Elixir", where the stream version is 5 times faster.

Error handling guidance

Currently, parsing invalid CSV raises a ParseError. This behavior isn't documented in the function docs or typespecs, and it would seem to violate the widespread convention of reserving a ! suffix for functions that can raise.

I would propose adding some documentation around that (and possibly fixing the typespecs?).

As part of that, it would be nice to offer some guidance on how to approach parsing CSV that may be invalid. Is catching the error the best approach?
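As a hedged sketch of the catching approach (the SafeCSV module name is made up here), one can rescue the ParseError and return a tagged tuple:

```elixir
# Hypothetical wrapper: converts the raised NimbleCSV.ParseError into an
# {:error, reason} tuple so callers can pattern-match instead of rescuing.
defmodule SafeCSV do
  def parse(binary) do
    {:ok, NimbleCSV.RFC4180.parse_string(binary)}
  rescue
    e in NimbleCSV.ParseError -> {:error, Exception.message(e)}
  end
end
```

Whether catching is the best approach is exactly the open question here; this only shows the mechanics.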

Crash on Unexpected Escape Character

I am getting this error from NimbleCSV when it attempts to parse a CSV:

Elixir.NimbleCSV.ParseError: unexpected escape character " in <<80, 75, 3, 4, 20, 0, 6, 0, 8, 0, 0, 0, 33, 0, 124, 108, 152, 22, 108, 1, 0, 0, 160, 5, 0, 0, 19, 0, 8, 2, 91, 67, 111, 110, 116, 101, 110, 116, 95, 84, 121, 112, 101, 115, 93, 46, 120, 109, 108, 32, ...>>
  File "lib/nimble_csv.ex", line 271, in NimbleCSV.RFC4180.separator/5
  File "lib/nimble_csv.ex", line 229, in anonymous fn/4 in NimbleCSV.RFC4180.parse_enumerable/2
  File "lib/enum.ex", line 1025, in anonymous fn/3 in Enum.flat_map_reduce/3
  File "lib/stream.ex", line 1384, in Stream.do_unfold/4
  File "lib/enum.ex", line 1023, in Enum.flat_map_reduce/3
  File "lib/nimble_csv.ex", line 176, in NimbleCSV.RFC4180.parse_enumerable/2
  File "lib/models/submission.ex", line 237, in APITournament.Models.Submission.create_resolver/2
  File "lib/absinthe/resolution.ex", line 184, in Absinthe.Resolution.call/2

Transform stream to line based one before trying to parse

Sometimes streams are not yet line-based (e.g. when streaming over HTTP). I'm wondering if it makes sense for nimble_csv to optionally deal with this by doing something akin to the following before parsing the data:

chunk_fun = fn element, acc ->
  parts = String.split(element, "\n")

  case List.pop_at(parts, -1) do
      {nil, []} -> {[], acc}
      {new_acc, []} -> {[], new_acc}
      {new_acc, [h | t]} -> {[acc <> h | t], new_acc}
  end
end
after_fun = fn
  "" -> []
  acc -> [acc]
end

[
    "abc",
    "def\n",
    "abc\ndef",
    "abc\ndef\nmore",
    "\nok"
]
|> Stream.transform(fn -> "" end, chunk_fun, after_fun)
|> Enum.into([])
|> IO.inspect()
# ["abcdef", "abc", "defabc", "def", "more"]

NimbleCSV.RFC4180 and trailing CRLF

For the default parser, the trailing \r ends up kept in the last field:

iex> NimbleCSV.RFC4180.parse_string("name,last,year\r\njohn,doe,1986\r\n")
[["john", "doe", "1986\r"]]

I dealt with it by creating a parser with reversed order of newline markers:

iex> NimbleCSV.define(MyCSV, newlines: ["\r\n", "\n"])
iex> MyCSV.parse_string("name,last,year\r\njohn,doe,1986\r\n")
[["john", "doe", "1986"]]

but maybe it's worth addressing in RFC4180?

Dialyzer issue

:0:unknown_function
Function NimbleCSV.ParseError.exception/1 does not exist.

docker_deps/nimble_csv/lib/nimble_csv.ex:193:callback_info_missing
Callback info about the NimbleCSV behaviour is not available.

I am currently experiencing this issue in a project based on elixir 1.6.6.

Using fields from rows prevents the entire row from being freed

Because larger binaries are stored as refc binaries, the parsed fields are sub-binary references into them, which means that keeping even a single column from parse_stream or parse_string will prevent the entire original string from being freed. This means that either NimbleCSV should automatically copy the data, or the implications of holding on to a field from a row should be made very clear.

Relevant Erlang Docs

Perhaps define should create a parser which will copy by default (since this is a nasty side effect) but have an option to disable copying for power users.
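Until copying becomes the default, a minimal sketch of the manual workaround (assuming a file named data.csv) is to copy every field you keep:

```elixir
# :binary.copy/1 detaches each field from the large chunk binary it was
# matched out of, so the chunk can be garbage collected once processed.
"data.csv"
|> File.stream!()
|> NimbleCSV.RFC4180.parse_stream()
|> Stream.map(fn row -> Enum.map(row, &:binary.copy/1) end)
|> Enum.to_list()
```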

Defining a parser causes the file to always be recompiled

Is this expected? I'm pretty new to Elixir, but it seems like unwanted behavior. I would expect it to use the already-compiled file when running the project a second time.

Example tsv.ex:

NimbleCSV.define(TSV, separator: "\t")

defmodule MyTest do
  IO.puts "loading"
end

Running the program 2 times

matthewbender ~/codebender/tsv $ iex -S mix
Erlang/OTP 19 [erts-8.0.2] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]

Compiling 1 file (.ex)
loading
Generated tsv app
Interactive Elixir (1.3.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)> System.halt
matthewbender ~/codebender/tsv $ iex -S mix
Erlang/OTP 19 [erts-8.0.2] [source] [64-bit] [smp:4:4] [async-threads:10] [hipe] [kernel-poll:false] [dtrace]

Compiling 1 file (.ex)
loading
Interactive Elixir (1.3.2) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)>

I wouldn't expect the second "Compiling 1 file (.ex)", since no files were changed between runs.

current version defined in memory

I tried to call the API (in which I use NimbleCSV) a second time, but I got this error:

current version defined in memory

for line:

NimbleCSV.define(MyParser, separator: ",", escape: "\"")

So, should I remove MyParser when I finish? Or should I define it only once for all API usage? Where should that be?
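For what it's worth, a common pattern (a sketch, not official guidance): call define/2 exactly once, at compile time, in its own source file, rather than on every API call; redefining the module at runtime appears to be what triggers the "current version defined in memory" message:

```elixir
# e.g. lib/my_parser.ex (hypothetical file name) — defined once when the
# project compiles; MyParser.parse_string/2 can then be called anywhere.
NimbleCSV.define(MyParser, separator: ",", escape: "\"")
```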

Dialyzer error with NimbleCSV.RFC4180

I have an error with Dialyzer: using this dummy base_stream/1, I'm obtaining a "no local return" warning

defmodule MyMod do

  @spec base_stream(binary) :: Enumerable.t()
  def base_stream(srcfile) do
    srcfile
    |> File.stream!([:read, :compressed, :utf8, read_ahead: 10_000])
    |> NimbleCSV.RFC4180.parse_stream(skip_headers: false)
  end

end
Finding suitable PLTs
Checking PLT...
...
PLT is up to date!
No :ignore_warnings opt specified in mix.exs and default does not exist.

Starting Dialyzer
[
  check_plt: false,
  init_plt: ~c"/app/_build/dev/dialyxir_erlang-26.0.2_elixir-1.15.4_deps-dev.plt",
  files: [...],
  warnings: [:unknown]
]
Total errors: 9, Skipped: 0, Unnecessary Skips: 0
done in 0m5.13s

...
________________________________________________________________________________
lib/tasks/image_captioning/prepare_image.ex:166:no_return
Function base_stream/1 has no local return.
________________________________________________________________________________

...

________________________________________________________________________________
done (warnings were emitted)
Halting VM with exit status 2
zsh returned exit code 2

My real code is more or less:

defmodule Foo do

  def run(srcfile) do
    [["1", "foo"]]
    # srcfile
    # |> base_stream()
    |> Stream.map(fn [id, image_link] -> %{"id" => id, "image_link" => image_link} end)
    |> ...
  end

end

If I use the dummy example [["1", "foo"]], Dialyzer doesn't fail; if I use base_stream, it does. Any idea how to overcome this? I think it is a bug in the RFC4180 parser definition.

Fails to compile on elixir 1.8 latest master

I'm not sure whether this is an error in nimble_csv or a regression in Elixir.

== Compilation error in file lib/nimble_csv.ex ==
** (CompileError) lib/nimble_csv.ex:411: undefined variable "offset" in bitstring segment. If the size of the binary is a variable, the variable must be defined prior to its use in the binary/bitstring match itself, or outside the pattern match
    (elixir) src/elixir_bitstring.erl:197: :elixir_bitstring.expand_each_spec/5
    (elixir) src/elixir_bitstring.erl:168: :elixir_bitstring.expand_specs/6
    (elixir) src/elixir_bitstring.erl:41: :elixir_bitstring.expand/8
    (elixir) src/elixir_bitstring.erl:10: :elixir_bitstring.expand/4
    expanding macro: NimbleCSV.RFC4180.newlines_escape!/1

Can't Get Headers

I run CSV.parse_string(some_string, header: false) and get the same output as when I run CSV.parse_string(some_string, header: true).

some_string looks like:

"id,probability\nn00e1d5ebcf3d4d5,0.51228\nnb0b4cce48b78471,0.51814\nn...

so I'd expect to get ["id", "probability"] in the header: false case, but I don't get it. I always get the same result:

[["n00e1d5ebcf3d4d5", "0.51228"], ["nb0b4cce48b78471", "0.51814"], ... ]

I have nimble_csv 0.3.0. Am I misusing the header option?
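For comparison, in current nimble_csv releases the parse option is :skip_headers (older releases used :headers); with it set to false, the header row is returned as the first row. A sketch (the HeaderDemo parser name is made up):

```elixir
NimbleCSV.define(HeaderDemo, separator: ",", escape: "\"")

# With skip_headers: false, the first row is kept instead of discarded.
HeaderDemo.parse_string("id,probability\nn00e1d5ebcf3d4d5,0.51228", skip_headers: false)
#=> [["id", "probability"], ["n00e1d5ebcf3d4d5", "0.51228"]]
```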

Parsing slower after each parse_stream call

Hi! I'm experiencing an odd issue when parsing large CSV files (on the order of hundreds of megabytes): if I call f/0 repeatedly, each new call is slower than the previous one.

defmodule ElixirBug do
  def f do
    "data.csv"
    |> File.stream!()
    |> NimbleCSV.RFC4180.parse_stream()
    |> Enum.map(fn x -> Enum.at(x, 31) end) # column 31 is sometimes a quoted string
  end

  def test do
    Enum.map(1..5, fn n ->
      {t, x} = :timer.tc(&f/0)
      IO.puts("Round ##{n}: #{t/1_000_000}s")
      x
    end)
  end
end
Erlang/OTP 22 [erts-10.6.4] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]
                                                                                                                       
Compiling 1 file (.ex)                                                                                                 
Interactive Elixir (1.10.1) - press Ctrl+C to exit (type h() ENTER for help)                                           
iex(1)> ElixirBug.test                                                                                                 
Round #1: 4.212789s                                                                                                    
Round #2: 6.072119s                                                                                                    
Round #3: 8.055231s                                                                                                    
Round #4: 9.855778s                                                                                                    
Round #5: 12.035544s                                                                                                   
[                                                                                                                      
  ["RAILROAD R/W", ...

In addition, the IEx session also seems slower: if I call recompile, it takes a couple of seconds instead of a few milliseconds to finish (and since this library is relatively simple, it makes me wonder whether this could actually be a problem with the BEAM).

Dialyzer error in newlines_separator!

When running dialyzer in my project (which relies on nimble_csv), I get the following error:

deps/nimble_csv/lib/nimble_csv.ex:523:unmatched_return
The expression produces a value of type:

[integer(), ...]

but this value is unmatched.

By adding dialyzer to this repo with the following settings, I can confirm that the problem is in nimble_csv:

dialyzer: [
  flags: [:unmatched_returns, :error_handling, :extra_return, :missing_return, :underspecs],
  plt_file: {:no_warn, "priv/plts/nimble_csv.plt"},
  plt_core_path: "priv/plts/core.plt",
]

While the error points at this line, it's clear that the missing match occurs somewhere within newlines_separator!.

Support of IANA standard for TSV

From a short look at the documentation and the code, it seems it isn't possible to define a parser for the IANA TSV standard with NimbleCSV, since escaping is currently mandatory, whereas TSV works without any escaping by simply disallowing tabs in the values.

Am I missing something, or is defining a TSV parser already possible? If not, would it be hard to change the escaping behavior accordingly, and would you be willing to accept a PR?
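One hedged workaround (an assumption, not a documented feature): since IANA TSV forbids tabs and newlines inside values, choosing an escape byte that can never occur in the data effectively disables escaping:

```elixir
# NUL cannot appear in valid TSV data, so escape handling is never triggered.
NimbleCSV.define(IanaTSV, separator: "\t", escape: <<0>>)
```

The caveat is that a value that happened to start with the chosen byte would be misparsed, so this only works when the byte truly never occurs.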

Slow parsing with refc binaries?

I believe this is more likely an issue with OTP, but since I'm experiencing it while using NimbleCSV I thought it would be appropriate to ask/report it here first: when I try to parse a large CSV file (more than 100,000 lines) where almost every row contains at least one string field longer than 64 bytes, it takes a very long time to finish. In comparison, when every field is 64 bytes or shorter, parsing is almost immediate.

Here is what I'm doing to test this behavior:

Example

defmodule CsvTest do
  def parse(name) do
    Path.join(["priv", name])
    |> File.stream!
    |> NimbleCSV.RFC4180.parse_stream
    |> Enum.map(fn [_, x, _] -> :binary.copy(x) end)
  end

  # 1.csv contains 100000 lines of: 1,aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,3
  # 2.csv contains 100000 lines of: 1,aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa,3
  def test do
    Enum.each(~w(1.csv 2.csv), fn name ->
      {t, _} = :timer.tc(&parse/1, [name])
      IO.puts("#{name}: #{t/1_000_000}s")
    end)
  end
end

Version

Erlang/OTP 23 [erts-11.0] [source] [64-bit] [smp:4:4] [ds:4:4:10] [async-threads:1] [hipe]

IEx 1.10.3 (compiled with Erlang/OTP 22)

These are the results I get:

  • 1.csv: 0.239626s
  • 2.csv: 8.42937s

The first file, which I believe only needs heap binaries, is parsed 35x faster than the one that requires refc binaries. Interestingly, the rate of this slowdown is superlinear (e.g. if the files were 140 thousand lines long, the difference would be 50-fold). Last night, while playing with all this (and after reading this issue with :binary.split), I found that passing read_ahead: 1 (for instance) or encoding: :utf8 to File.stream! seems to fix the problem, but I'm not sure why.

Thanks!

Is there any easy way to keep the headers as the keys of each row of output, if the columns are not known in advance?

In the CSV library, one can simply specify headers: true to get a result like

iex> ["a;b","c;d", "e;f"]
iex> |> Stream.map(&(&1))
iex> |> CSV.decode!(separator: ?;, headers: true)
iex> |> Enum.take(2)
[
  %{"a" => "c", "b" => "d"},
  %{"a" => "e", "b" => "f"}
]

However, I haven't been able to figure out a way to easily do it with nimble_csv.

There is an example of

"name\tage\njohn\t27"
|> MyParser.parse_string
|> Enum.map(fn [name, age] ->
  %{name: name, age: String.to_integer(age)}
end)

However, this requires me to know in advance:

  1. How many columns the file will have
  2. The name of each column

My use case is to handle CSV files with potentially unknown columns. But I need to produce a map as in the first example.

Seems that the only way to do it with nimble_csv would be to

  1. specify headers: false
  2. take the head of the resulting list as the keys
  3. Perform a Enum.map on the tail of the list to add the keys one by one.

Which seems to be quite convoluted. Did I not understand the library correctly and there's an easy way to do it? Or is my use case not suited to nimble_csv here/I should rethink my approach?
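For what it's worth, the three-step approach above comes down to only a few lines; a sketch (the RowMaps module name is made up):

```elixir
defmodule RowMaps do
  # Keep the header row, then zip it with every data row into a map.
  def parse(binary) do
    [headers | rows] = NimbleCSV.RFC4180.parse_string(binary, skip_headers: false)
    Enum.map(rows, fn row -> headers |> Enum.zip(row) |> Map.new() end)
  end
end

RowMaps.parse("a,b\nc,d\ne,f")
#=> [%{"a" => "c", "b" => "d"}, %{"a" => "e", "b" => "f"}]
```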

Allow overriding separator in parse functions

Hello, and apologies if this has already been discussed.

We could update the init_parser/1 like this:

defp init_parser(opts) do
  state = if Keyword.get(opts, :skip_headers, true), do: :header, else: :line
  separator = :binary.compile_pattern(Keyword.get(opts, :separator, @separator))
  {state, separator, :binary.compile_pattern(@escape)}
end

That way the user can override the parser's separator (and maybe also the escape).

I understand that this contradicts the idea that each parser has a fixed configuration, but when you have to deal with different kinds of separated files (commas, semicolons, tabs, spaces, pipes, etc.), it would really help to avoid creating multiple modules "just in case", or having to rewrite, recompile, and redeploy when a new format arrives.

NimbleCSV.RFC4180.dump* is using `\n` instead of `\r\n`

The spec states that the line separator must be \r\n, but the dumper currently uses the default value of \n.

https://tools.ietf.org/html/rfc4180#section-2

  1. Each record is located on a separate line, delimited by a line
    break (CRLF).

CR = %x0D ;as per section 6.1 of RFC 2234
LF = %x0A ;as per section 6.1 of RFC 2234
CRLF = CR LF ;as per section 6.1 of RFC 2234

workaround (and fix):

  NimbleCSV.define(NimbleCSV.RFC4180fixed,
    separator: ",",
    escape: "\"",
    line_separator: "\r\n",
    moduledoc: """
    A CSV parser that uses comma as separator and double-quotes as escape according to RFC4180.
    """
  )

Excel can not open files .xlsx and .xls

'.xlsx' files generated by NimbleCSV have problems opening on a Mac via the Excel and Numbers apps. Both throw this error:

Excel cannot open the file '2021-04-01_invoices-2.xlsx' because the file format or file extension is not valid. Verify that the file has not been corrupted and that the file extension matches the format of the file.


But if I change the file extension to .xls, then both Excel and Numbers are able to open it, though they still show this warning/error message:

The file format and extension of '2021-04-01_invoices.xls' don't match. The file could be corrupted or unsafe. Unless you trust its source, don't open it. Do you want to open it anyway?

NimbleCSV.Spreadsheet.dump_to_iodata(rows)

where rows is:

[
  ["Wendy Ruecker", "Wendy Ruecker", "20210010000004321", ~D[2021-04-01],
   ~D[2021-04-30], "Stark Heights 2948", nil, "68204", "Lake Macie", "Plan",
   "Platinum Plan", nil, nil, 11900, 10000],
  ["Trever Ziemann", "Trever Ziemann", "2021001000000432100", ~D[2021-04-01],
   ~D[2021-04-30], "Nienow Mountains 1133", nil, "51534", "Lake Orval", "Plan",
   "Gold Plan", nil, nil, 9520, 8000],
  ["Domingo Ward", "Domingo Ward", "2021001000000432102", ~D[2021-04-01],
   ~D[2021-04-30], "Emmerich Greens 2", nil, "82765", "Kenneth", "Plan",
   "Silver Plan", nil, nil, 5950, 5000]
]

Compilation warning

The parser is working like a charm! However, it seems the compiler doesn't understand that my CSVParser module is defined through a macro; it gives me a warning:

function CSVParser.parse_stream/2 is undefined (module CSVParser is not available)

I'm using Elixir 1.3.0

Generating CSV

Is there a module for generating CSV from data?
Sorry, but I couldn't find it. Thanks.
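Yes: the parsers defined by NimbleCSV also expose dumping functions, dump_to_iodata/1 and dump_to_stream/1. A small sketch:

```elixir
# Dumps a list of rows to iodata; non-binary values such as integers
# are converted for you.
NimbleCSV.RFC4180.dump_to_iodata([["name", "age"], ["john", 27]])
|> IO.iodata_to_binary()
#=> "name,age\njohn,27\n"
```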

newlines failing only with Streams

I have found a strange CSV with \r as the newline separator in the wild that fails to parse when using parse_stream, but seems to work with parse_string.

I attempted a fix but I had some trouble following the logic/macros.

(aside)
While trying to narrow down a smaller CSV test case I struggled. I don't know if I'm misusing sed, but every time I did sed -i 's/\r/\n/g' /tmp/mprop.csv, then head /tmp/mprop.csv > /tmp/mprop-smol.csv, then sed -i 's/\n/\r/g' /tmp/mprop-smol.csv, it produced different line endings than the original. Same with Elixir and String.replace. I am probably very tired and doing something wrong, or maybe it's a special secret \r that I cannot reproduce.
(/aside)

Anyway, here is the test case. The data is public and the CSV is ~90 MB, so this should give you a case to work with.

Mix.install([
  {:req, "~> 0.3.6"},
  {:nimble_csv, "~> 1.2"},
])

mprop_file = "/tmp/mprop.csv"
unless File.exists?(mprop_file) do
  # public data feel free to run this
  Req.get!("https://data.milwaukee.gov/dataset/562ab824-48a5-42cd-b714-87e205e489ba/resource/0a2c7f31-cd15-4151-8222-09dd57d5f16d/download/mprop.csv", output: mprop_file)
end

NimbleCSV.define(CSV, newlines: ["\r"])

File.read!("/tmp/mprop.csv")
|> CSV.parse_string()
|> Enum.take(1) 

# Slow but succeeds.

File.stream!("/tmp/mprop.csv", read_ahead: 100_000)
|> CSV.parse_stream()
|> Enum.take(1) 

# Errors ** (NimbleCSV.ParseError) unexpected escape character " in "MAP_EXT\"\r\"0000005005\",\"\",\"2022\",\"....

Functions throwing exceptions in Stream

I have an issue that I guess comes from streams. The library uses streams, so in the end, when I start the stream with ... |> Stream.run, I get an exception about e.g. wrongly escaped chars.

What I want to do is skip these lines. BUT: the error is triggered by Stream.run, so I actually lose the context of the Stream.map loop. I guess streams spawn processes under the hood, so how do I deal with functions throwing exceptions in a stream?

Sample code: https://gist.github.com/asconix/27a032d255b47db4a4ee55d7aa1be0e0
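One way to keep the context (a hedged sketch, not something the library provides): parse each raw line individually with parse_string, so a ParseError is confined to that line and bad rows can simply be dropped. This assumes no quoted field spans multiple lines:

```elixir
# Each line is parsed on its own; lines that raise ParseError are skipped.
"data.csv"
|> File.stream!()
|> Stream.flat_map(fn line ->
  try do
    NimbleCSV.RFC4180.parse_string(line, skip_headers: false)
  rescue
    NimbleCSV.ParseError -> []
  end
end)
|> Stream.run()
```

Note the caveat: a quoted field containing an embedded newline arrives split across lines and would be (wrongly) rejected by this approach.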

Documentation on :headers option is confusing

I feel the documentation for the :headers option while parsing is confusing.

It states:

:headers - when false, no longer discard the first row. Defaults to true.

I had to test the option to see exactly how it works, and it seems to be the opposite of what I'd expect: setting headers: true skips the header row, so I'd expect it to be called e.g. skip_headers.

It's easy enough to update the documentation, but what are your thoughts about deprecating :headers and changing it to :skip_headers?

Problem with comma when get a CSV from some api's

The API used to load the CSV: https://github.com/GrandCru/GoogleSheets

Problem: when a field has a comma inside, like "description, is, this!", the Google Sheets API serves the field as ""description,is,this!"". Since " is the standard CSV escape character, this is a problem when loading and parsing via the Elixir sheets API, because the parser truncates the field at the commas inside it.

How to test: load a Google spreadsheet whose fields contain commas, using the google_sheets Elixir library together with nimble_csv.

This could be fixed by changing the escape character when downloading the CSV from the Google API, but I can't figure out how to change it in the API.

Link to download from api is like this: https://docs.google.com/spreadsheets/d//export?gid=0&format=csv

make it work with use.

I've started using NimbleCSV recently, and the define/2 function is very restrictive: the defined module can only do what NimbleCSV puts there, and I can't add my own functions to it.
That's why I think working with use NimbleCSV, opts would be better. If you agree, I can start working on supporting both.

Generated beam files contain absolute path for nimble_csv.ex

When a parser is defined in an umbrella app, the generated beam files contain an absolute reference to the nimble_csv.ex file inside the deps/nimble_csv folder. This causes that absolute file path to be picked up in test coverage and coveralls reports.

Elixir:

  NimbleCSV.define(MyChildApp.Parsers.TsvParser, separator: "\t", escape: "\"")

Disassembled beam:

//Function  Elixir.MyChildApp.Parsers.TsvParser:dump/2
label007:  func_info            Elixir.MyChildApp.Parsers.TsvParser dump 2 //line /Users/binns/git/my_umbrella_app/deps/nimble_csv/lib/nimble_csv.ex, 458

mix test --cover output:

COV    FILE                                        LINES RELEVANT   MISSED
 42.1% /Users/binns/git/my_umbrella_app/deps/nimb      494       38       22
  0.0% lib/my_child_app.ex                                18        0        0

Usage with csv files that contain escaped double quote entries

Hello, I am a little unclear from the documentation on how to use this library with a CSV file that contains an entry with already-escaped (backslash-escaped) double quotes:

testfile:

1,"Hello \"World\""

test.exs:

NimbleCSV.define(MyParser, separator: ",", escape: ~S'"')

stdio = IO.stream(:stdio, :line)

stdio
|> Stream.map(&String.trim/1)
|> MyParser.parse_stream()
|> Enum.to_list()

I would expect this to succeed with [["1", "Hello \"World\""]], but I encounter this error:

โžœ cat testfile | mix run test.exs
** (NimbleCSV.ParseError) unexpected escape character " in "Hello \\\"World\\\"\""
    deps/nimble_csv/lib/nimble_csv.ex:485: MyParser.escape/6
    deps/nimble_csv/lib/nimble_csv.ex:355: anonymous fn/4 in MyParser.parse_stream/2
    (elixir 1.10.4) lib/stream.ex:902: Stream.do_transform_user/6
    (elixir 1.10.4) lib/enum.ex:3383: Enum.reverse/1
    (elixir 1.10.4) lib/enum.ex:2984: Enum.to_list/1

writing a row with a comma in it doesn't appear to be escaped

I'd think a row with a comma in it would get quoted, but it doesn't seem to be. I might be misunderstanding things entirely, but I've taken the example from here: https://en.wikipedia.org/wiki/Comma-separated_values#Basic_rules

Behaviour:

iex> NimbleCSV.RFC4180.dump_to_iodata([[1997, "Ford", "E350", "Super, luxurious truck"]])
[["1997", 44, "Ford", 44, "E350", 44, "Super, luxurious truck", 10]]

What I'd expect

iex> NimbleCSV.RFC4180.dump_to_iodata([[1997, "Ford", "E350", "Super, luxurious truck"]])
[["1997", 44, "Ford", 44, "E350", 44, "\"Super, luxurious truck\"", 10]]

Does NimbleCSV expect commas in values to already be quoted? Or is this a bug?

Escape only strings with leading zeros

Hi,

I wanted to generate a CSV file where common spreadsheet tools interpret strings like "001" as strings, not numbers. One way to do that is to escape the cell. Adding 0 to the reserved characters does not meet my needs, as I don't want to escape other cells containing 0 that really are numbers.

Do you think there is a way to escape only cells with leading zeros?
