
csvfiles.jl's People

Contributors

davidanthoff, github-actions[bot], harryscholes, oxinabox, ralphas, rjplevin, wongjoel, xiaodaigh

csvfiles.jl's Issues

BUG: importing csv file when bottom right cell is empty

I have tracked down an error I initially hit while trying to use Query.jl. The root cause appears to be a bug when importing CSV files where the bottom-right cell is empty.

For example, take the simple test CSV file (attached test.csv), which looks like this:
[Screenshot: contents of test.csv]

When you import it, the empty bottom-right cell comes up as #undef rather than missing:
[Screenshot: imported table]

This then causes all kinds of downstream havoc when doing anything with this cell. For example:
[Screenshot]

EDIT: To clarify, this error happens when the bottom-right cell in the CSV file is empty. When it is explicitly coded as NA, the import works fine.

MethodError: no method matching iterate(::CSVFiles.CSVFile)

I believe the following code should work, given the information on the README.md.

julia> using CSVFiles

julia> load("boo.csv")
1x4 CSV file
a │ b │ c │ d
──┼───┼───┼──
1 │ 2 │ 3 │ 4

julia> using TypedTables

julia> load("boo.csv") |> Table
ERROR: MethodError: no method matching iterate(::CSVFiles.CSVFile)

julia> using IterableTables

julia> load("boo.csv") |> Table
ERROR: MethodError: no method matching iterate(::CSVFiles.CSVFile)

Am I doing something wrong here?

Problems testing with Julia Nightly

I have a dependency on CSVFiles.jl and a problem when Travis testing against Julia Nightly. It works with Julia 0.6.

I forked CSVFiles to https://github.com/RobBlackwell/CSVFiles.jl and updated the Travis script to try to investigate the problem.

I guess it's actually a problem with TextParse.jl but haven't managed to chase it down.

I can do some more investigation, but wanted to get your thoughts before spending more time please. Thanks!

ERROR: LoadError: LoadError: syntax: label "#26#done" referenced but not defined
Stacktrace:
 [1] include at ./boot.jl:317 [inlined]
 [2] include_relative(::Module, ::String) at ./loading.jl:1034
 [3] include at ./sysimg.jl:29 [inlined]
 [4] include(::String) at /home/travis/.julia/packages/TextParse/udwy/src/TextParse.jl:3
 [5] top-level scope at none:0
 [6] include at ./boot.jl:317 [inlined]
 [7] include_relative(::Module, ::String) at ./loading.jl:1034
 [8] include(::Module, ::String) at ./sysimg.jl:29
 [9] top-level scope at none:0
 [10] eval at ./boot.jl:319 [inlined]
 [11] eval(::Expr) at ./client.jl:402
 [12] top-level scope at ./none:3 [inlined]
 [13] top-level scope at ./<missing>:0
in expression starting at /home/travis/.julia/packages/TextParse/udwy/src/util.jl:44
in expression starting at /home/travis/.julia/packages/TextParse/udwy/src/TextParse.jl:9
ERROR: LoadError: LoadError: Failed to precompile TextParse to /home/travis/.julia/compiled/v0.7/TextParse/Ry2K.ji.
Stacktrace:
 [1] error(::String) at ./error.jl:33
 [2] macro expansion at ./logging.jl:298 [inlined]
 [3] compilecache(::Base.PkgId) at ./loading.jl:1173
 [4] _require(::Base.PkgId) at ./loading.jl:942
 [5] require(::Base.PkgId) at ./loading.jl:838
 [6] require(::Module, ::Symbol) at ./loading.jl:833
 [7] include at ./boot.jl:317 [inlined]
 [8] include_relative(::Module, ::String) at ./loading.jl:1034
 [9] macro expansion at ./logging.jl:312 [inlined]
 [10] _require(::Base.PkgId) at ./loading.jl:929
 [11] require(::Base.PkgId) at ./loading.jl:838
 [12] require(::Module, ::Symbol) at ./loading.jl:833
 [13] include at ./boot.jl:317 [inlined]
 [14] include_relative(::Module, ::String) at ./loading.jl:1034
 [15] include(::Module, ::String) at ./sysimg.jl:29
 [16] include(::String) at ./client.jl:401
 [17] top-level scope at none:0
in expression starting at /home/travis/build/RobBlackwell/CSVFiles.jl/src/CSVFiles.jl:3
in expression starting at /home/travis/build/RobBlackwell/CSVFiles.jl/test/runtests.jl:1
ERROR: Package CSVFiles errored during testing
Stacktrace:
 [1] cmderror(::String, ::Vararg{String,N} where N) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Pkg/src/Types.jl:120
 [2] macro expansion at ./logging.jl:301 [inlined]
 [3] #test#56(::Bool, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Pkg/src/Operations.jl:1218
 [4] #test at ./none:0 [inlined]
 [5] #test#44(::Bool, ::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::Pkg.Types.Context, ::Array{Pkg.Types.PackageSpec,1}) at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Pkg/src/API.jl:234
 [6] #test at ./none:0 [inlined]
 [7] #test#43 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Pkg/src/API.jl:217 [inlined]
 [8] #test at ./none:0 [inlined]
 [9] #test#42 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Pkg/src/API.jl:216 [inlined]
 [10] #test at ./none:0 [inlined]
 [11] #test#41 at /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v0.7/Pkg/src/API.jl:215 [inlined]
 [12] (::getfield(Pkg.API, Symbol("#kw##test")))(::NamedTuple{(:coverage,),Tuple{Bool}}, ::typeof(Pkg.API.test), ::String) at ./none:0
 [13] top-level scope at none:0

Add example of how to read CSV files and specify a delimiter

I can never figure out how to set the delim from reading the README.

It would be good to actually add an example instead of just having the signature.

I tried

FileIO.File(format"CSV", delim='|')

FileIO.File(format"CSV"(delim='|'))

neither worked.

How do I iterate through each column?

Say I have read in my files like this

using CSVFiles

@time a = load("c:/data/AirOnTimeCSV/airOT199302.csv", type_detect_rows = 2000)

Is there a way to iterate through the columns without converting it to DataFrame first (cos it's slow)?

E.g. if I did convert to DataFrame then I can do

adf = DataFrame(a)
for c in eachcol(adf)
   # do something to c, like serialize to disk.
end

Automatic loading of tab- and ;- delimited files

DataFrames support loading tab-delimited files and semicolon-delimited files automatically by file extension (e.g. .tsv). Is something like that possible?
The issue is that comma-delimited files, though apparently the default format in Julia, are more or less restricted to countries that use a decimal point, i.e. the English-speaking countries:
[Screenshot]
In countries where the comma is the decimal separator, semicolon-delimited values are saved automatically in programs like Excel when specifying the csv format. I can't believe the guys who came up with that thought that might be a good idea, but there you are.
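To illustrate why the delimiter and the decimal separator have to be handled together, here is a minimal sketch in plain Julia. The helper name `parse_decimal_comma_line` is hypothetical and is not part of CSVFiles.jl or DataFrames; it only shows that a semicolon-delimited line with decimal commas parses cleanly once both conventions are known.

```julia
# Hypothetical helper (not part of CSVFiles.jl): parse one line of a
# semicolon-delimited file whose numbers use a decimal comma.
function parse_decimal_comma_line(line::AbstractString; delim::Char=';')
    fields = split(line, delim)
    # Swap the decimal comma for a decimal point before parsing each number.
    return [parse(Float64, replace(f, ',' => '.')) for f in fields]
end

nums = parse_decimal_comma_line("1,5;2,25;3")
println(nums)
```

Splitting the same line on commas instead would yield four fragments that no longer correspond to the three original values, which is why such files cannot be read with the comma default.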

DataFrame and DataTable constructors require IterableTables

Maybe this is intended behavior, but the example from the readme does not work without a using IterableTables for me.

This doesn't work:

julia> using FileIO, CSVFiles, DataFrames
WARNING: Method definition ==(Base.Nullable{S}, Base.Nullable{T}) in module Base at nullable.jl:238 overwritten in module NullableArrays at /Users/tcovert/.julia/v0.6/NullableArrays/src/operators.jl:99.

julia> wi = DataFrame(load("WellIndex_20160811.csv", escapechar = '"', type_detect_rows = 30000));
ERROR: MethodError: Cannot `convert` an object of type CSVFiles.CSVFile to an object of type DataFrames.DataFrame
This may have arisen from a call to the constructor DataFrames.DataFrame(...),
since type constructors fall back to convert methods.
Stacktrace:
 [1] DataFrames.DataFrame(::CSVFiles.CSVFile) at ./sysimg.jl:24

but this does:

julia> using IterableTables

julia> wi = DataFrame(load("WellIndex_20160811.csv", escapechar = '"', type_detect_rows = 30000));

Here's my Pkg.status():

julia> Pkg.status()
7 required packages:
 - CSV                           0.1.4
 - CSVFiles                      0.3.0
 - DataTables                    0.0.3
 - FileIO                        0.5.1
 - IndexedTables                 0.3.0
 - JLD                           0.8.1
 - Query                         0.7.0
35 additional packages:
 - BinDeps                       0.7.0
 - Blosc                         0.3.0
 - CategoricalArrays             0.1.6
 - Compat                        0.30.0
 - DataArrays                    0.6.2
 - DataFrames                    0.10.1
 - DataStreams                   0.1.3
 - DataStructures                0.6.1
 - DataValues                    0.2.0
 - DocStringExtensions           0.4.0
 - Documenter                    0.11.2
 - GZip                          0.3.0
 - HDF5                          0.8.5
 - HTTP                          0.4.3
 - Homebrew                      0.5.8
 - IterableTables                0.5.0
 - JSON                          0.13.0
 - LegacyStrings                 0.2.2
 - MacroTools                    0.3.7
 - MbedTLS                       0.5.0
 - NamedTuples                   4.0.0
 - NullableArrays                0.1.2
 - Nulls                         0.0.5
 - PooledArrays                  0.1.1
 - Reexport                      0.0.3
 - Requires                      0.4.3
 - SHA                           0.5.1
 - SortingAlgorithms             0.1.1
 - SpecialFunctions              0.3.1
 - StatsBase                     0.18.0
 - TableTraits                   0.0.1
 - TableTraitsUtils              0.0.1
 - TextParse                     0.1.7+             master
 - URIParser                     0.2.0
 - WeakRefStrings                0.3.0

Errors in saving non-standard element types

Consider the following code:

julia> df = DataFrame(x = [',','\n', ','])
3×1 DataFrame
│ Row │ x    │
│     │ Char │
├─────┼──────┤
│ 1   │ ','  │
│ 2   │ '\n' │
│ 3   │ ','  │

julia> df |> save("test.csv")

julia> println(read("test.csv", String))
"x"
,


,


julia>

The saved file is broken because non-string values are written without quotes.

Here is an extreme example (not to say it happens in reality, but just shows that it could be handled better). The code is a continuation of the earlier code:

julia> DataFrame(d=[df, df]) |> save("test2.csv")

julia> println(read("test2.csv", String))
"d"
3×1 DataFrame
│ Row │ x    │
│     │ Char │
├─────┼──────┤
│ 1   │ ','  │
│ 2   │ '\n' │
│ 3   │ ','  │
3×1 DataFrame
│ Row │ x    │
│     │ Char │
├─────┼──────┤
│ 1   │ ','  │
│ 2   │ '\n' │
│ 3   │ ','  │

and it cannot be read back at all (even as a string) because, again, the value is not quoted.

Finally let us consider a more normal scenario, which is again broken because of non-quoting:

julia> df = DataFrame(a=Date("2000-10-10"), b=Date("2000-11-11"))
1×2 DataFrame
│ Row │ a          │ b          │
│     │ Date       │ Date       │
├─────┼────────────┼────────────┤
│ 1   │ 2000-10-10 │ 2000-11-11 │

julia> df |> save("test3.csv", delim="-")

julia> println(read("test3.csv", String))
"a"-"b"
2000-10-10-2000-11-11

@davidanthoff Not sure which of the issues above can be fixed but at least I wanted you to be aware of them.
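One way the quoting could be handled, sketched in plain Julia: convert every value to text first, then quote it whenever that text contains the delimiter, the quote character, or a line break. The function `csv_field` is a hypothetical helper written for this illustration; it is not how CSVFiles.jl is implemented.

```julia
using Dates

# Hypothetical helper, not the CSVFiles.jl implementation: render any value as
# a CSV field and quote it whenever the text could be mistaken for structure.
function csv_field(value; delim::Char=',', quotechar::Char='"')
    text = string(value)
    needs_quotes = any(c -> c == delim || c == quotechar || c == '\n' || c == '\r', text)
    needs_quotes || return text
    # Escape embedded quote characters by doubling them.
    escaped = replace(text, string(quotechar) => string(quotechar, quotechar))
    return string(quotechar, escaped, quotechar)
end

println(csv_field(','))                            # Char that equals the delimiter
println(csv_field(Date("2000-10-10"); delim='-'))  # Date with '-' as the delimiter
println(csv_field(42))                             # needs no quoting
```

Under this rule the Char and Date cases above would both be written quoted, so a reader that honours the quote character could recover them.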

NO applicable_loaders found for csv

using CSV
using DataFrames
a = DataFrame(a=[1,2], b=[3,4])
CSV.write("a.csv", a, delim='|', writeheader =false)

using CSVFiles, FileIO, TextParse
@time a = load(File(format"csv", "a.csv"), delim='|')

getting error

NO applicable_loaders found for csv

Windows 10 Pro
Julia 1.2
TextParse 0.9.1
CSVFiles 0.15.0
FileIO 1.0.7

Performance issue w/ getiterator(::CSVFile)

I noticed a weird performance issue when playing w/ CSVFile; note the following (this is all after having run the functions once):

julia> @time f = load("/Users/jacobquinn/Downloads/randoms.csv");
  0.000297 seconds (84 allocations: 5.234 KiB)

julia> @time f = IteratorInterfaceExtensions.getiterator(load("/Users/jacobquinn/Downloads/randoms.csv"));
  6.374569 seconds (36.76 M allocations: 810.608 MiB, 2.06% gc time)

julia> @time DataFrame(f);
  0.070821 seconds (532.63 k allocations: 34.434 MiB, 9.99% gc time)

The getiterator call takes that long each time it's called. I haven't had a chance to dig in further, but wanted to report.

Deprecation warning at TableTraitsUtils.jl:17

I get the following warning when loading a CSV file with Julia 1.2.

┌ Warning: `T` is deprecated, use `nonmissingtype` instead.
│   caller = create_tableiterator(::Array{Array{T,1} where T,1}, ::Array{Symbol,1}) at TableTraitsUtils.jl:17
└ @ TableTraitsUtils ~/.julia/packages/TableTraitsUtils/IAE6S/src/TableTraitsUtils.jl:17

Looks easy to fix (famous last words). If nobody else can get to it, I'd be willing to make the change with a little encouragement. 😉

CSV file load error - no method matching UInt8(::String)

Hello,

Am trying to load a CSV file from https://raw.githubusercontent.com/beoutbreakprepared/nCoV2019/master/latest_data/latestdata.tar.gz (https://github.com/beoutbreakprepared/nCoV2019/tree/master/latest_data).

Opening the file locally works without problems, but trying to do so via CSVFiles leads to the following error:

ERROR: LoadError: CSV parsing error in tar/latestdata.csv at line 14455 char 310:
.../world-asia-51345855; San Lazaro Hospital,True,"""thought to have had other pre-existing conditions""...
____________________________________________________^
column 20 is expected to be: TextParse.Field{String,TextParse.Quoted{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8},UInt8,UInt8}}(""<string>"", true, true, false)
Stacktrace:
 [1] parsefill!(::TextParse.VectorBackedUTF8String, ::TextParse.LocalOpts{UInt8,UInt8,UInt8}, ::TextParse.Record{Tuple{TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{Union{Missing, Float64},TextParse.NAToken{Union{Missing, Float64},TextParse.Numeric{Float64}}},TextParse.Field{Union{Missing, Float64},TextParse.NAToken{Union{Missing, Float64},TextParse.Numeric{Float64}}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.Quoted{String,Text
Parse.StringToken{String},UInt8,UInt8},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}},TextParse.Field{Union{Missing, Float64},TextParse.NAToken{Union{Missing, Float64},TextParse.Numeric{Float64}}},TextParse.Field{Missing,TextParse.NAToken{Missing,TextParse.Unknown}},TextParse.Field{Missing,TextParse.NAToken{Missing,TextParse.Unknown}}},Tuple{String,String,String,String,String,String,Union{Missing, Float64},Union{Missing, Float64},String,String,String,String,String,String,String,String,String,String,String,String,String,String,String,String,String,String,String,String,String,String,Union{Missing, Float64},Missing,Missing}}, ::Int64, ::Tuple{Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{Union{Missing, Float64},1},Array{Union{Missing, 
Float64},1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{Union{Missing, Float64},1},Array{Missing,1},Array{Missing,1}}, ::OrderedCollections.OrderedDict{Union{Int64, String},Union{Nothing, AbstractArray{T,1} where T}}, ::Int64, ::Int64, ::Int64, ::Int64, ::Nothing) at /Users/yoh/.julia/packages/TextParse/EETm2/src/csv.jl:604
 [2] _csvread_internal(::TextParse.VectorBackedUTF8String, ::Char; spacedelim::Bool, quotechar::Char, escapechar::Char, commentchar::Nothing, stringtype::Type{T} where T, stringarraytype::Type{T} where T, noresize::Bool, rowno::Int64, prevheaders::Nothing, pooledstrings::Nothing, skiplines_begin::Int64, samecols::Nothing, header_exists::Bool, nastrings::Array{String,1}, colnames::Array{String,1}, colspool::OrderedCollections.OrderedDict{Union{Int64, String},Union{Nothing, AbstractArray{T,1} where T}}, row_estimate::Int64, prev_parsers::Nothing, colparsers::Array{Any,1}, filename::String, type_detect_rows::Int64) at /Users/yoh/.julia/packages/TextParse/EETm2/src/csv.jl:338
 [3] (::TextParse.var"#34#36"{Base.Iterators.Pairs{Symbol,Any,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:stringarraytype, :quotechar, :escapechar),Tuple{UnionAll,Char,Char}}},String,Char})(::IOStream) at /Users/yoh/.julia/packages/TextParse/EETm2/src/csv.jl:117
 [4] open(::TextParse.var"#34#36"{Base.Iterators.Pairs{Symbol,Any,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:stringarraytype, :quotechar, :escapechar),Tuple{UnionAll,Char,Char}}},String,Char}, ::String, ::Vararg{String,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at ./io.jl:298
 [5] open at ./io.jl:296 [inlined]
 [6] #_csvread_f#32 at /Users/yoh/.julia/packages/TextParse/EETm2/src/csv.jl:114 [inlined]
 [7] csvread(::String, ::Char; kwargs::Base.Iterators.Pairs{Symbol,Any,Tuple{Symbol,Symbol,Symbol},NamedTuple{(:stringarraytype, :quotechar, :escapechar),Tuple{UnionAll,Char,Char}}}) at /Users/yoh/.julia/packages/TextParse/EETm2/src/csv.jl:80
 [8] _loaddata(::CSVFiles.CSVFile) at /Users/yoh/.julia/packages/CSVFiles/C68zw/src/CSVFiles.jl:103
 [9] get_columns_copy_using_missing(::CSVFiles.CSVFile) at /Users/yoh/.julia/packages/CSVFiles/C68zw/src/CSVFiles.jl:116
 [10] columns at /Users/yoh/.julia/packages/Tables/okt7x/src/fallbacks.jl:231 [inlined]
 [11] DataFrames.DataFrame(::CSVFiles.CSVFile; copycols::Bool) at /Users/yoh/.julia/packages/DataFrames/S3ZFo/src/other/tables.jl:40

Here below is code that I am using:

import CodecZlib
import CSVFiles
import DataFrames
import HTTP
import Tar

local_csv_filename::String = "tar/latestdata.csv"
local_downloaded_filename::String = "latestdata.tar.gz"
local_tar_dir = "./tar"

HTTP.download("https://raw.githubusercontent.com/beoutbreakprepared/nCoV2019/master/latest_data/latestdata.tar.gz", "./$local_downloaded_filename")

Tar.extract(
    CodecZlib.GzipDecompressorStream(
        open(local_downloaded_filename, "r")
    ),
    local_tar_dir
)
 
df::DataFrames.DataFrame = DataFrames.DataFrame(
    CSVFiles.load(
        local_csv_filename,
        delim=',',
        quotechar='"',
        escapechar='"'
    )
)

When opening file manually I did see some "" in string columns of data.
Not sure if this is causing problem.

Thank you!

Date parsing issue "Month out of range"

I've been running into a weird date parsing issue, and I can't sort out what the pattern is, though I've managed to nail down an MWE.

The linked csv has 4 rows of dates.

julia> load("parse_test.csv") |> DataFrame
ERROR: ArgumentError: Month: 27 out of range (1:12)
Stacktrace:
 [1] Date(::Int64, ::Int64, ::Int64) at ./dates/types.jl:204
 [2] tryparsenext(::TextParse.DateTimeToken{Date,DateFormat{Symbol("yyyy/mm/dd"),Tuple{Base.Dates.DatePart{'y'},Base.Dates.Delim{Char,1},Base.Dates.DatePart{'m'},Base.Dates.Delim{Char,1},Base.Dates.DatePart{'d'}}}}, ::String, ::Int64, ::Int64, ::TextParse.LocalOpts) at /Users/kev/.julia/v0.6/TextParse/src/field.jl:431
 [3] macro expansion at /Users/kev/.julia/v0.6/TextParse/src/util.jl:23 [inlined]
 [4] tryparsenext(::TextParse.Field{Date,TextParse.DateTimeToken{Date,DateFormat{Symbol("yyyy/mm/dd"),Tuple{Base.Dates.DatePart{'y'},Base.Dates.Delim{Char,1},Base.Dates.DatePart{'m'},Base.Dates.Delim{Char,1},Base.Dates.DatePart{'d'}}}}}, ::String, ::Int64, ::Int64, ::TextParse.LocalOpts) at /Users/kev/.julia/v0.6/TextParse/src/field.jl:569
#...

(the stack trace is super long, let me know if it would be useful to post the whole thing)

There are 3 27s, two in the second row, and one in the last row. If I remove just the last row, it works.

julia> load("parse_test.csv") |> DataFrame
3×6 DataFrames.DataFrame
│ Row │ c1         │ c2         │ c3         │ c4         │ c5         │ c6         │
├─────┼────────────┼────────────┼────────────┼────────────┼────────────┼────────────┤
│ 1   │ 0016-08-08 │ 0011-11-10 │ 0016-08-08 │ 0010-01-15 │ 0010-01-15 │ 0016-08-08 │
│ 2   │ 0016-05-27 │ 0010-12-13 │ 0016-05-27 │ 0012-01-15 │ 0012-01-15 │ 0016-01-01 │
│ 3   │ 0016-06-15 │ 0011-08-04 │ 0009-12-21 │ 0011-01-09 │ 0011-01-09 │ 0009-11-24 │

julia>

But if I leave the 4th row in and just change the 27 in the last row to a 2, I get the same ERROR: ArgumentError: Month: 27 out of range (1:12).

If I change all the 27s to 2s, I now get ERROR: ArgumentError: Month: 21 out of range (1:12), and again this error goes away if I delete the last row, even though there are no 21s in the last row.

There's not just something weird with that row - this is part of a much larger csv file, and removing only row 4 does not stop the error.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Error loading csv file with moderate length string

When trying to load a simple CSV file with the function "CSV.read(testfile, DataFrame)"

where testfile is the following trivial CSV:

A,B
name, 1.0
longname, 1.0

I receive the following error:

ERROR: MethodError: Cannot convert an object of type
Parsers.Result{String15} to an object of type
Parsers.Result{Any}

However, when 'longname' is changed to 'name', the CSV.read function works fine.

I am running Julia 1.8.2 and CSV v0.10.4 on an M1 Mac with MacOS 12.6

IndexedTable(load(...)) gives an NDSparse instead of a Table

Is there a way to request a specific formulation of the IndexedTable type when reading in a CSV? Right now it seems to default to the full NDSparse representation, in which the n-1 leftmost columns are keys and the rightmost column is a value. For my use cases, getting a plain IndexedTable with no primary key columns is more useful.

Note, this is not a show stopper. This, for example, works fine

t = IndexedTable(load("file.csv"))
t = table(t)

Problem with nastring for non-numeric columns

Because the default nastring is NA, there is the following problem:

  1. take a data structure that has e.g. String column with missing data in it;
  2. save it to disk using default parameters; missings get converted to NA on disk
  3. load it back and you have "NA" string where you earlier had missings

The same problem occurs with e.g. Char data.

While NA is a sensible default for numeric columns it is a bit confusing for non-numeric columns (and actually can lead to wrong results as it is fully possible to have NA string in data).

I think that it would be best to have an empty string for missings in non-numeric data.
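The round trip described in the steps above can be sketched in plain Julia. The functions `write_field` and `read_field_with_na` are hypothetical stand-ins that only mimic the described behaviour; they are not CSVFiles.jl functions.

```julia
# Hypothetical writer/reader pair that mimics the behaviour described above.
write_field(x; nastring="NA") = ismissing(x) ? nastring : string(x)
read_field_with_na(s; nastring="NA") = s == nastring ? missing : String(s)

column = ["alice", missing, "NA"]

# With the "NA" default, the missing value and the literal string "NA"
# end up as the same text on disk.
on_disk = [write_field(x) for x in column]
println(on_disk)

# With an empty nastring the two stay distinguishable and the column
# survives the round trip.
on_disk_empty = [write_field(x; nastring="") for x in column]
restored = [read_field_with_na(s; nastring="") for s in on_disk_empty]
println(restored)
```

An empty nastring shifts the ambiguity to genuinely empty strings, so a real implementation would presumably still need quoting to tell `""` from a missing value.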

Feature request: return data on save

I think it would be useful if the save method returned its given data. This change would allow variable assignment after chaining/continuations.

For example:

using FileIO, CSVFiles
using DataFrames
using Query
using Test

df = DataFrame([(a=1,b=2),(a=3,b=4)])
result = df |> @map({e = _.a^2}) |> DataFrame |> save("tmp.csv") 
@test result[2,1] == 9

Let me know what you think, or if you foresee any issues that I may have missed.

The streaming load/save example does not work

julia> using DataFrames, CSVFiles, FileIO

julia> df = DataFrame(a = [1,2,3], b = [4,5,6]);

julia> stream = IOBuffer();

julia> fileiostream = Stream(format"CSV", stream);

julia> save(fileiostream, df)

julia> load(fileiostream)
0x0 CSV file
Error showing value of type CSVFiles.CSVStream:
ERROR: MethodError: no method matching zero(::Type{Any})
Closest candidates are:
  zero(::Type{Union{Missing, T}}) where T at missing.jl:87
  zero(::Type{LibGit2.GitHash}) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/LibGit2/src/oid.jl:220
  zero(::Type{Pkg.Resolve.VersionWeights.VersionWeight}) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/Pkg/src/resolve/VersionWeights.jl:19
  ...
Stacktrace:
 [1] zero(::Type{Any}) at ./missing.jl:87
 [2] reduce_empty(::typeof(+), ::Type) at ./reduce.jl:227
 [3] reduce_empty(::typeof(Base.add_sum), ::Type) at ./reduce.jl:234
 [4] mapreduce_empty(::typeof(identity), ::Function, ::Type) at ./reduce.jl:251
 [5] _mapreduce(::typeof(identity), ::typeof(Base.add_sum), ::IndexLinear, ::Array{Any,1}) at ./reduce.jl:305
 [6] _mapreduce_dim at ./reducedim.jl:308 [inlined]
 [7] #mapreduce#548 at ./reducedim.jl:304 [inlined]
 [8] mapreduce at ./reducedim.jl:304 [inlined]
 [9] _sum at ./reducedim.jl:653 [inlined]
 [10] _sum at ./reducedim.jl:652 [inlined]
 [11] #sum#550 at ./reducedim.jl:648 [inlined]
 [12] sum(::Array{Any,1}) at ./reducedim.jl:648
 [13] #printtable#1(::Bool, ::Function, ::IOContext{REPL.Terminals.TTYTerminal}, ::TableTraitsUtils.TableIterator{NamedTuple{(),Tuple{}},Tuple{}}, ::String) at /Users/harry/.julia/packages/TableShowUtils/ImkA9/src/TableShowUtils.jl:43
 [14] printtable(::IOContext{REPL.Terminals.TTYTerminal}, ::TableTraitsUtils.TableIterator{NamedTuple{(),Tuple{}},Tuple{}}, ::String) at /Users/harry/.julia/packages/TableShowUtils/ImkA9/src/TableShowUtils.jl:7
 [15] show(::IOContext{REPL.Terminals.TTYTerminal}, ::CSVFiles.CSVStream) at /Users/harry/.julia/packages/CSVFiles/KysmQ/src/CSVFiles.jl:38
 [16] show(::IOContext{REPL.Terminals.TTYTerminal}, ::MIME{Symbol("text/plain")}, ::CSVFiles.CSVStream) at ./sysimg.jl:194
 [17] display(::REPL.REPLDisplay, ::MIME{Symbol("text/plain")}, ::Any) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:131
 [18] display(::REPL.REPLDisplay, ::Any) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:135
 [19] display(::Any) at ./multimedia.jl:287
 [20] #invokelatest#1 at ./essentials.jl:742 [inlined]
 [21] invokelatest at ./essentials.jl:741 [inlined]
 [22] print_response(::IO, ::Any, ::Any, ::Bool, ::Bool, ::Any) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:155
 [23] print_response(::REPL.AbstractREPL, ::Any, ::Any, ::Bool, ::Bool) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:140
 [24] (::getfield(REPL, Symbol("#do_respond#38")){Bool,getfield(REPL, Symbol("##48#57")){REPL.LineEditREPL,REPL.REPLHistoryProvider},REPL.LineEditREPL,REPL.LineEdit.Prompt})(::Any, ::Any, ::Any) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:714
 [25] #invokelatest#1 at ./essentials.jl:742 [inlined]
 [26] invokelatest at ./essentials.jl:741 [inlined]
 [27] run_interface(::REPL.Terminals.TextTerminal, ::REPL.LineEdit.ModalInterface, ::REPL.LineEdit.MIState) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/LineEdit.jl:2273
 [28] run_frontend(::REPL.LineEditREPL, ::REPL.REPLBackendRef) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:1035
 [29] run_repl(::REPL.AbstractREPL, ::Any) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:192
 [30] (::getfield(Base, Symbol("##734#736")){Bool,Bool,Bool,Bool})(::Module) at ./client.jl:362
 [31] #invokelatest#1 at ./essentials.jl:742 [inlined]
 [32] invokelatest at ./essentials.jl:741 [inlined]
 [33] run_main_repl(::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at ./client.jl:346
 [34] exec_options(::Base.JLOptions) at ./client.jl:284
 [35] _start() at ./client.jl:436

I took a look at the tests, which include mark and reset statements. When I add these statements, it works:

julia> stream = IOBuffer();

julia> fileiostream = Stream(format"CSV", stream);

julia> mark(stream)
0

julia> save(fileiostream, df)

julia> reset(stream)
0

julia> mark(stream)
0

julia> load(fileiostream)
3x2 CSV file
a │ b
──┼──
1 │ 4
2 │ 5
3 │ 6

Weirdly, that works in the REPL, but not in Atom, where I get the same stacktrace as above.
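A possible explanation for why mark and reset matter, shown with the standard library only: after writing to an IOBuffer the position sits at the end, so a subsequent read finds nothing until the stream is rewound. This sketch does not call CSVFiles.jl at all, and whether the package behaves this way internally is an assumption.

```julia
io = IOBuffer()
mark(io)                      # remember position 0 before writing
write(io, "a,b\n1,4\n")

at_end = read(io, String)     # position is at the end, so nothing is left to read
reset(io)                     # jump back to the marked position
rewound = read(io, String)    # now the written text is available

println(repr(at_end))
println(repr(rewound))
```

Calling `seekstart(io)` before reading would rewind the buffer in the same way without needing a mark.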

TypeError: non-boolean (Missing) used in boolean context

Hello,

I'm reading a GPX file (GPS track) https://gist.github.com/scls19fr/3048506102e37263902588f86b7e759f using Julia.

Here is my code

using Dates
using XMLDict


function read_gpx(fname)
    s = read(fname, String)
    xml = parse_xml(s)
    trkpts = xml["trk"]["trkseg"]["trkpt"]
    g = (
            (
                time=parse(DateTime, trkpt["time"], dateformat"yyyy-mm-dd\THH:MM:SSZ"), 
                elevation=parse(Float64, trkpt["ele"]),
                description=trkpt["desc"],
                latitude=parse(Float64, trkpt[:lat]),
                longitude=parse(Float64, trkpt[:lon])
            )
    for trkpt in trkpts)

    return g
end

fname = "sample.gpx"
# fname = "20191107_LFBI_0932_LFBI_1134.gpx"
g = read_gpx(fname)
positions = collect(g)
# println(positions)

using DataFrames
df = DataFrame(g)
println(df)

using CSV
CSV.write("sample.csv", df)

# using CSVFiles
# save("sample.csv", g)  # raises ERROR: LoadError: TypeError: non-boolean (Missing) used in boolean context

Exporting to a CSV file by converting to a DataFrame and using CSV.jl works fine, but I can't directly export my position generator to a CSV file using CSVFiles.jl.

It's raising ERROR: LoadError: TypeError: non-boolean (Missing) used in boolean context.
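The error message itself can be reproduced in plain Julia, which may help narrow down where it comes from. This sketch only shows how a comparison involving `missing` triggers the same TypeError; it makes no claim about which line in CSVFiles.jl is responsible.

```julia
x = missing
comparison = (x == 1)    # three-valued logic: the result is `missing`, not `false`

# Using that result as a condition throws the TypeError from the report.
err = try
    comparison ? "equal" : "different"
catch e
    e
end
println(typeof(err))

# `isequal` always returns a Bool, so it is safe in a condition.
safe = isequal(x, 1)
println(safe)
```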

I don't understand what is going on. I wonder if that's bug on CSVFiles or a misunderstanding from me how to use it.

Any idea?

Kind regards
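
For context on the error message itself: in Julia, comparing anything with missing returns missing rather than true or false, and using that result as an if condition throws exactly this TypeError. The sketch below shows this in plain Base Julia. Where such a comparison happens inside CSVFiles.jl, or whether one is involved here at all, is not established in this thread, so treat it as background rather than a diagnosis.

```julia
# A comparison involving `missing` propagates `missing` instead of a Bool.
cmp = (missing == 1)

# Using that value as a condition raises
# "TypeError: non-boolean (Missing) used in boolean context".
threw_type_error = try
    if missing == 1
        println("never reached")
    end
    false
catch err
    err isa TypeError
end

# Missing-safe alternatives always produce a plain Bool.
safe_equal = isequal(missing, 1)
safe_default = coalesce(missing == 1, false)
```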

Load first N rows of a file

Is it possible to load only the first N rows of a CSV file? For example, if I have a 1001 row file (one header row and 1000 rows of data), is it possible for me to load only the header row and the first 500 rows of data?
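
One workaround that does not depend on any CSVFiles.jl option is to cut the file down before parsing it. The sketch below uses only Base Julia; head_lines is a hypothetical helper name, not part of CSVFiles.jl, and it assumes that no quoted field contains an embedded newline (otherwise counting lines is not the same as counting rows).

```julia
# Hypothetical helper (not part of CSVFiles.jl): return the header line plus
# the first `n` data lines of a text file, joined into one string.
function head_lines(path::AbstractString, n::Integer)
    open(path) do io
        # `eachline` is lazy, so lines beyond the first n + 1 are never read.
        join(Iterators.take(eachline(io), n + 1), "\n")
    end
end

# Demonstration on a temporary file with one header row and four data rows.
path = tempname() * ".csv"
write(path, "a,b\n1,4\n2,5\n3,6\n4,7\n")
snippet = head_lines(path, 2)
```

The resulting text could then be wrapped in an IOBuffer and passed to load through a Stream(format"CSV", ...) object, as in the stream example earlier on this page; that last step is an assumption and has not been tested here.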

`load()` doesn't support URLs

The following does not work, although README.md says a URL can be used in place of a filename:

julia> url = "https://raw.githubusercontent.com/queryverse/CSVFiles.jl/master/test/data.csv"
julia> load(url)
ERROR: ArgumentError: No file exists at given path: https://raw.githubusercontent.com/queryverse/CSVFiles.jl/master/test/data.csv
Stacktrace:
 [1] load(::Formatted; options::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at /Users/rpz/.julia/packages/FileIO/wN5rD/src/loadsave.jl:189
 [2] load at /Users/rpz/.julia/packages/FileIO/wN5rD/src/loadsave.jl:184 [inlined]
 [3] #load#14 at /Users/rpz/.julia/packages/FileIO/wN5rD/src/loadsave.jl:133 [inlined]
 [4] load(::String) at /Users/rpz/.julia/packages/FileIO/wN5rD/src/loadsave.jl:133
 [5] top-level scope at REPL[7]:1

Method error when using keywords

I have a file that doesn't have a header, but can't seem to sort out how to load it:

julia> df = load("teststream.tsv", '\t', header=false) |> DataFrame
ERROR: MethodError: no method matching _csvread_internal(::String, ::Char; filename="teststream.tsv", header=false)
Closest candidates are:
  _csvread_internal(::AbstractString, ::Any; spacedelim, quotechar, escapechar, pooledstrings, noresize, rowno, prevheaders, skiplines_begin, samecols, header_exists, nastrings, colnames, colspool, nrows, prev_parsers, colparsers, filename, type_detect_rows) at /n/home09/kbonham/.julia/v0.6/TextParse/src/csv.jl:163 got unsupported keyword argument "header"
  _csvread_internal(::AbstractString) at /n/home09/kbonham/.julia/v0.6/TextParse/src/csv.jl:163 got unsupported keyword arguments "filename", "header"
Stacktrace:
 [1] (::TextParse.#kw##_csvread_internal)(::Array{Any,1}, ::TextParse.#_csvread_internal, ::String, ::Char) at ./<missing>:0
 [2] (::TextParse.##31#33{Array{Any,1},String,Char})(::IOStream) at /n/home09/kbonham/.julia/v0.6/TextParse/src/csv.jl:97
 [3] open(::TextParse.##31#33{Array{Any,1},String,Char}, ::String, ::String) at ./iostream.jl:152
 [4] #_csvread_f#29(::Array{Any,1}, ::Function, ::String, ::Char) at /n/home09/kbonham/.julia/v0.6/TextParse/src/csv.jl:95
 [5] (::TextParse.#kw##_csvread_f)(::Array{Any,1}, ::TextParse.#_csvread_f, ::String, ::Char) at ./<missing>:0
 [6] #csvread#25(::Array{Any,1}, ::Function, ::String, ::Char) at /n/home09/kbonham/.julia/v0.6/TextParse/src/csv.jl:69
 [7] (::TextParse.#kw##csvread)(::Array{Any,1}, ::TextParse.#csvread, ::String, ::Char) at ./<missing>:0
 [8] getiterator(::CSVFiles.CSVFile) at /n/home09/kbonham/.julia/v0.6/CSVFiles/src/CSVFiles.jl:49
 [9] _DataFrame(::CSVFiles.CSVFile) at /n/home09/kbonham/.julia/v0.6/IterableTables/src/integrations/dataframes-missing.jl:100
 [10] DataFrames.DataFrame(::CSVFiles.CSVFile) at /n/home09/kbonham/.julia/v0.6/IterableTables/src/integrations/dataframes-missing.jl:129
 [11] |>(::CSVFiles.CSVFile, ::Type{T} where T) at ./operators.jl:862

I've also tried:

df = load("teststream.tsv", delim='\t', header=false) |> DataFrame
df = load("teststream.tsv", delim='\t', header_exists=false) |> DataFrame
df = load("teststream.tsv", header=false) |> DataFrame

and using a stream instead of a file:

df = load(Stream(format"TSV", fl), header=false) |> DataFrame
# etc

I can load the file with CSV.read("teststream.tsv", delim='\t', header=false) so I don't think it's an issue with the file itself. First 5 lines of the file are here.

julia> Pkg.status("FileIO")
 - FileIO                        0.7.0

julia> Pkg.status("CSVFiles")
 - CSVFiles                      0.5.0

julia> VERSION
v"0.6.2"

But I just checked out master for both packages and have the same problem...

Typo in README.md

The examples in the "Using the pipe syntax" section have a typo in the import statements.

`using CSVFiles, DataFrame` should be `using CSVFiles, DataFrames`.

I'm pretty sure most people would work it out, but I'm pointing it out to save future readers from hunting for the missing s.

Error when loading a remote URL CSV file

First, I would like to thank you for the amazing ecosystem of interoperable IO packages that you have created.

When loading a CSV file from the following European Central Bank URL

http://sdw.ecb.europa.eu/quickviewexport.do?trans=N&start=&end=&SERIES_KEY=165.YC.B.U2.EUR.4F.G_N_A.SV_C_YM.BETA0&type=csv

using load("http://sdw.ecb.europa.eu/......"), or load("http://sdw.ecb.europa.eu/......", header_exists=false, skiplines_begin=5), one gets the following error:

ERROR: LoadError: 
SystemError: opening file http://sdw.ecb.europa.eu/quickviewexport.do?trans=N&start=&end=&SERIES_KEY=165.YC.B.U2.EUR.4F.G_N_A.SV_C_YM.BETA0&type=csv
:Invalid argument

The URL is valid, and Requests.get fetches it without error (although that is a much less attractive choice), so I was expecting that load could read this CSV file. Thank you very much for your time.

Reading error with Date types and missing values

I'm having trouble using this package to read what I think is an MWE CSV file with a date-formatted field that includes missing values.

I'm issuing this command:

wi = DataFrame(load("temp.csv", escapechar = '"'))

and I'm getting this error:

ERROR: MethodError: Cannot `convert` an object of type DataValues.DataValue{Date} to an object of type Date
This may have arisen from a call to the constructor Date(...),
since type constructors fall back to convert methods.
Stacktrace:
 [1] convert(::Type{DataValues.DataValue{Date}}, ::DataValues.DataValue{DataValues.DataValue{Date}}) at /Users/tcovert/.julia/v0.6/DataValues/src/scalar/core.jl:26
 [2] macro expansion at /Users/tcovert/.julia/v0.6/NamedTuples/src/NamedTuples.jl:143 [inlined]
 [3] NamedTuples._NT_APINo_FileNo_CurrentOperator_CurrentWellName_LeaseName_LeaseNumber_OriginalOperator_OriginalWellName_SpudDate_TD_CountyName_Township_Range_Section_QQ_Footages_FieldName{Float64,Int64,String,String,String,String,String,String,DataValues.DataValue{Date},Int64,String,String,String,Int64,String,String,String}(::Float64, ::Int64, ::String, ::String, ::String, ::String, ::String, ::String, ::DataValues.DataValue{DataValues.DataValue{Date}}, ::Int64, ::String, ::String, ::String, ::Int64, ::String, ::String, ::String) at /Users/tcovert/.julia/v0.6/NamedTuples/src/NamedTuples.jl:149
 [4] macro expansion at /Users/tcovert/.julia/v0.6/TableTraitsUtils/src/TableTraitsUtils.jl:63 [inlined]
 [5] next(::TableTraitsUtils.TableIterator{NamedTuples._NT_APINo_FileNo_CurrentOperator_CurrentWellName_LeaseName_LeaseNumber_OriginalOperator_OriginalWellName_SpudDate_TD_CountyName_Township_Range_Section_QQ_Footages_FieldName{Float64,Int64,String,String,String,String,String,String,DataValues.DataValue{Date},Int64,String,String,String,Int64,String,String,String},Tuple{Array{Float64,1},Array{Int64,1},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},DataValues.DataValueArray{Date,1},Array{Int64,1},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},Array{Int64,1},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}}}}, ::Int64) at /Users/tcovert/.julia/v0.6/TableTraitsUtils/src/TableTraitsUtils.jl:51
 [6] macro expansion at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes-dataarray.jl:91 [inlined]
 [7] _filldf(::Tuple{Array{Float64,1},Array{Int64,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},Array{String,1},DataArrays.DataArray{Date,1},Array{Int64,1},Array{String,1},Array{String,1},Array{String,1},Array{Int64,1},Array{String,1},Array{String,1},Array{String,1}}, ::TableTraitsUtils.TableIterator{NamedTuples._NT_APINo_FileNo_CurrentOperator_CurrentWellName_LeaseName_LeaseNumber_OriginalOperator_OriginalWellName_SpudDate_TD_CountyName_Township_Range_Section_QQ_Footages_FieldName{Float64,Int64,String,String,String,String,String,String,DataValues.DataValue{Date},Int64,String,String,String,Int64,String,String,String},Tuple{Array{Float64,1},Array{Int64,1},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},DataValues.DataValueArray{Date,1},Array{Int64,1},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},Array{Int64,1},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}},PooledArrays.PooledArray{String,UInt8,1,Array{UInt8,1}}}}) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes-dataarray.jl:79
 [8] _DataFrame(::CSVFiles.CSVFile) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes-dataarray.jl:119
 [9] DataFrames.DataFrame(::CSVFiles.CSVFile) at /Users/tcovert/.julia/v0.6/IterableTables/src/integrations/dataframes-dataarray.jl:127

I don't think this is a DataFrames problem since I get roughly the same error using IndexedTables.

The MWE input data is here

I think I may have previously found a similar error in Query.jl (see here: queryverse/Query.jl#134)

julia version bug in 1.5?

In

julia> versioninfo()
Julia Version 1.5.0
Commit 96786e22cc (2020-08-01 23:44 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)
Environment:
  JULIA_NUM_THREADS = 8

and

(@v1.5) pkg> st CSVFiles
Status `~/.julia/environments/v1.5/Project.toml`
  [5d742f6a] CSVFiles v1.0.0

I get

julia> using CSVFiles
[ Info: Precompiling CSVFiles [5d742f6a-9f54-50ce-8119-2520741973ca]
ERROR: LoadError: LoadError: LoadError: Cannot read stream serialized with a newer version of Julia.
Got data version 11 > current version 10
Stacktrace:
 [1] error(::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [2] readheader(::Serialization.Serializer{IOStream}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Serialization/src/Serialization.jl:715
 [3] handle_deserialize(::Serialization.Serializer{IOStream}, ::Int32) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Serialization/src/Serialization.jl:878
 [4] deserialize(::Serialization.Serializer{IOStream}) at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Serialization/src/Serialization.jl:773
 [5] deserialize at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.5/Serialization/src/Serialization.jl:760 [inlined]
 [6] open(::typeof(Serialization.deserialize), ::String, ::Vararg{String,N} where N; kwargs::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}) at ./io.jl:325
 [7] open at ./io.jl:323 [inlined]
 [8] (::TimeZones.var"#3#4"{String})() at /Users/abradley/.julia/packages/TimeZones/cAGJs/src/types/timezone.jl:50
 [9] get!(::TimeZones.var"#3#4"{String}, ::Dict{String,Tuple{Dates.TimeZone,TimeZones.Class}}, ::String) at ./dict.jl:450
 [10] Dates.TimeZone(::String, ::TimeZones.Class) at /Users/abradley/.julia/packages/TimeZones/cAGJs/src/types/timezone.jl:46 (repeats 2 times)
 [11] @tz_str(::LineNumberNode, ::Module, ::Any) at /Users/abradley/.julia/packages/TimeZones/cAGJs/src/types/timezone.jl:86
 [12] include(::Function, ::Module, ::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [13] include at ./Base.jl:368 [inlined]
 [14] include(::String) at /Users/abradley/.julia/packages/Intervals/iTDy5/src/Intervals.jl:1
 [15] top-level scope at /Users/abradley/.julia/packages/Intervals/iTDy5/src/Intervals.jl:28
 [16] include(::Function, ::Module, ::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:? (repeats 2 times)
 [17] top-level scope at none:2
 [18] eval at ./boot.jl:331 [inlined]
 [19] eval(::Expr) at ./client.jl:467
 [20] top-level scope at ./none:3
in expression starting at /Users/abradley/.julia/packages/Intervals/iTDy5/src/interval.jl:153
in expression starting at /Users/abradley/.julia/packages/Intervals/iTDy5/src/interval.jl:152
in expression starting at /Users/abradley/.julia/packages/Intervals/iTDy5/src/Intervals.jl:28
ERROR: LoadError: Failed to precompile Intervals [d8418881-c3e1-53bb-8760-2df7ec849ed5] to /Users/abradley/.julia/compiled/v1.5/Intervals/ihXRn_X4bCo.ji.
Stacktrace:
 [1] error(::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [2] compilecache(::Base.PkgId, ::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [3] _require(::Base.PkgId) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [4] require(::Base.PkgId) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:? (repeats 2 times)
 [5] include(::Function, ::Module, ::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:? (repeats 2 times)
 [6] top-level scope at none:2
 [7] eval at ./boot.jl:331 [inlined]
 [8] eval(::Expr) at ./client.jl:467
 [9] top-level scope at ./none:3
in expression starting at /Users/abradley/.julia/packages/Polynomials/ZmARV/src/Polynomials.jl:4
ERROR: LoadError: Failed to precompile Polynomials [f27b6e38-b328-58d1-80ce-0feddd5e7a45] to /Users/abradley/.julia/compiled/v1.5/Polynomials/OaK78_X4bCo.ji.
Stacktrace:
 [1] error(::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [2] compilecache(::Base.PkgId, ::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [3] _require(::Base.PkgId) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [4] require(::Base.PkgId) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:? (repeats 2 times)
 [5] include(::Function, ::Module, ::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:? (repeats 2 times)
 [6] top-level scope at none:2
 [7] eval at ./boot.jl:331 [inlined]
 [8] eval(::Expr) at ./client.jl:467
 [9] top-level scope at ./none:3
in expression starting at /Users/abradley/.julia/packages/DoubleFloats/s9LZK/src/DoubleFloats.jl:44
ERROR: LoadError: Failed to precompile DoubleFloats [497a8b3b-efae-58df-a0af-a86822472b78] to /Users/abradley/.julia/compiled/v1.5/DoubleFloats/KzTCm_X4bCo.ji.
Stacktrace:
 [1] error(::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [2] compilecache(::Base.PkgId, ::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [3] _require(::Base.PkgId) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [4] require(::Base.PkgId) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:? (repeats 2 times)
 [5] include(::Function, ::Module, ::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:? (repeats 2 times)
 [6] top-level scope at none:2
 [7] eval at ./boot.jl:331 [inlined]
 [8] eval(::Expr) at ./client.jl:467
 [9] top-level scope at ./none:3
in expression starting at /Users/abradley/.julia/packages/TextParse/EETm2/src/TextParse.jl:3
ERROR: LoadError: Failed to precompile TextParse [e0df1984-e451-5cb5-8b61-797a481e67e3] to /Users/abradley/.julia/compiled/v1.5/TextParse/Ry2K3_X4bCo.ji.
Stacktrace:
 [1] error(::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [2] compilecache(::Base.PkgId, ::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [3] _require(::Base.PkgId) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [4] require(::Base.PkgId) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:? (repeats 2 times)
 [5] include(::Function, ::Module, ::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:? (repeats 2 times)
 [6] top-level scope at none:2
 [7] eval at ./boot.jl:331 [inlined]
 [8] eval(::Expr) at ./client.jl:467
 [9] top-level scope at ./none:3
in expression starting at /Users/abradley/.julia/packages/CSVFiles/C68zw/src/CSVFiles.jl:3
ERROR: Failed to precompile CSVFiles [5d742f6a-9f54-50ce-8119-2520741973ca] to /Users/abradley/.julia/compiled/v1.5/CSVFiles/kq3Uy_X4bCo.ji.
Stacktrace:
 [1] error(::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [2] compilecache(::Base.PkgId, ::String) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [3] _require(::Base.PkgId) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:?
 [4] require(::Base.PkgId) at /Applications/Julia-1.5.app/Contents/Resources/julia/lib/julia/sys.dylib:? (repeats 2 times

Loading a CSV file results in MethodError: no method matching pointer

I encountered the following error when I tried to read a valid (I think) CSV file.

julia> using CSVFiles, DataFrames

julia> load("data/parking-citations.csv")
Error showing value of type CSVFiles.CSVFile:
ERROR: MethodError: no method matching pointer(::SubString{TextParse.VectorBackedUTF8String}, ::Int64)
Closest candidates are:
  pointer(::String, ::Integer) at strings/string.jl:82
  pointer(::SubString{String}, ::Integer) at strings/substring.jl:105
  pointer(::TextParse.VectorBackedUTF8String, ::Integer) at /Users/kenta/.julia/packages/TextParse/IAMBB/src/VectorBackedStrings.jl:16
  ...
Stacktrace:
 [1] _substring at /Users/kenta/.julia/packages/TextParse/IAMBB/src/field.jl:397 [inlined]
 [2] tryparsenext(::TextParse.StringToken{String}, ::SubString{TextParse.VectorBackedUTF8String}, ::Int64, ::Int64, ::TextParse.LocalOpts{UInt8,UInt8,UInt8}) at /Users/kenta/.julia/packages/TextParse/IAMBB/src/field.jl:368
 [3] macro expansion at /Users/kenta/.julia/packages/TextParse/IAMBB/src/util.jl:27 [inlined]
 [4] tryparsenext(::TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}, ::SubString{TextParse.VectorBackedUTF8String}, ::Int64, ::Int64, ::TextParse.LocalOpts{UInt8,UInt8,UInt8}) at /Users/kenta/.julia/packages/TextParse/IAMBB/src/field.jl:493
 [5] macro expansion at /Users/kenta/.julia/packages/TextParse/IAMBB/src/util.jl:27 [inlined]
 [6] tryparsenext(::TextParse.Field{String,TextParse.Quoted{String,TextParse.StringToken{String},UInt8,UInt8}}, ::SubString{TextParse.VectorBackedUTF8String}, ::Int64, ::Int64, ::TextParse.LocalOpts{UInt8,UInt8,UInt8}) at /Users/kenta/.julia/packages/TextParse/IAMBB/src/field.jl:682
 [7] macro expansion at /Users/kenta/.julia/packages/TextParse/IAMBB/src/util.jl:27 [inlined]
 [8] quotedsplit(::SubString{TextParse.VectorBackedUTF8String}, ::TextParse.LocalOpts{UInt8,UInt8,UInt8}, ::Bool, ::Int64, ::Int64) at /Users/kenta/.julia/packages/TextParse/IAMBB/src/csv.jl:671
 [9] quotedsplit(::SubString{TextParse.VectorBackedUTF8String}, ::TextParse.LocalOpts{UInt8,UInt8,UInt8}, ::Bool) at /Users/kenta/.julia/packages/TextParse/IAMBB/src/csv.jl:662
 [10] #_csvread_internal#26(::Bool, ::Char, ::Char, ::Nothing, ::Type, ::Type, ::Bool, ::Int64, ::Nothing, ::Nothing, ::Int64, ::Nothing, ::Bool, ::Array{String,1}, ::Array{String,1}, ::OrderedCollections.OrderedDict{Union{Int64, String},AbstractArray{T,1} where T}, ::Int64, ::Nothing, ::Array{Any,1}, ::String, ::Int64, ::typeof(TextParse._csvread_internal), ::TextParse.VectorBackedUTF8String, ::Char) at /Users/kenta/.julia/packages/TextParse/IAMBB/src/csv.jl:367
 [11] (::getfield(TextParse, Symbol("#kw##_csvread_internal")))(::NamedTuple{(:filename, :stringarraytype),Tuple{String,UnionAll}}, ::typeof(TextParse._csvread_internal), ::TextParse.VectorBackedUTF8String, ::Char) at ./none:0
 [12] (::getfield(TextParse, Symbol("##22#24")){Base.Iterators.Pairs{Symbol,UnionAll,Tuple{Symbol},NamedTuple{(:stringarraytype,),Tuple{UnionAll}}},String,Char})(::IOStream) at /Users/kenta/.julia/packages/TextParse/IAMBB/src/csv.jl:110
 [13] #open#310(::Base.Iterators.Pairs{Union{},Union{},Tuple{},NamedTuple{(),Tuple{}}}, ::Function, ::getfield(TextParse, Symbol("##22#24")){Base.Iterators.Pairs{Symbol,UnionAll,Tuple{Symbol},NamedTuple{(:stringarraytype,),Tuple{UnionAll}}},String,Char}, ::String, ::Vararg{String,N} where N) at ./iostream.jl:369
 [14] open at ./iostream.jl:367 [inlined]
 [15] #_csvread_f#20 at /Users/kenta/.julia/packages/TextParse/IAMBB/src/csv.jl:107 [inlined]
 [16] #_csvread_f at ./none:0 [inlined]
 [17] #csvread#16(::Base.Iterators.Pairs{Symbol,UnionAll,Tuple{Symbol},NamedTuple{(:stringarraytype,),Tuple{UnionAll}}}, ::Function, ::String, ::Char) at /Users/kenta/.julia/packages/TextParse/IAMBB/src/csv.jl:78
 [18] (::getfield(TextParse, Symbol("#kw##csvread")))(::NamedTuple{(:stringarraytype,),Tuple{UnionAll}}, ::typeof(TextParse.csvread), ::String, ::Char) at ./none:0
 [19] _loaddata(::CSVFiles.CSVFile) at /Users/kenta/.julia/packages/CSVFiles/KysmQ/src/CSVFiles.jl:83
 [20] getiterator(::CSVFiles.CSVFile) at /Users/kenta/.julia/packages/CSVFiles/KysmQ/src/CSVFiles.jl:88
 [21] show(::IOContext{REPL.Terminals.TTYTerminal}, ::CSVFiles.CSVFile) at /Users/kenta/.julia/packages/CSVFiles/KysmQ/src/CSVFiles.jl:22
 [22] show(::IOContext{REPL.Terminals.TTYTerminal}, ::MIME{Symbol("text/plain")}, ::CSVFiles.CSVFile) at ./sysimg.jl:194
 [23] display(::REPL.REPLDisplay, ::MIME{Symbol("text/plain")}, ::Any) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:131
 [24] display(::REPL.REPLDisplay, ::Any) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:135
 [25] display(::Any) at ./multimedia.jl:287
 [26] #invokelatest#1 at ./essentials.jl:742 [inlined]
 [27] invokelatest at ./essentials.jl:741 [inlined]
 [28] print_response(::IO, ::Any, ::Any, ::Bool, ::Bool, ::Any) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:155
 [29] print_response(::REPL.AbstractREPL, ::Any, ::Any, ::Bool, ::Bool) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:140
 [30] (::getfield(REPL, Symbol("#do_respond#38")){Bool,getfield(REPL, Symbol("##48#57")){REPL.LineEditREPL,REPL.REPLHistoryProvider},REPL.LineEditREPL,REPL.LineEdit.Prompt})(::Any, ::Any, ::Any) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:714
 [31] #invokelatest#1 at ./essentials.jl:742 [inlined]
 [32] invokelatest at ./essentials.jl:741 [inlined]
 [33] run_interface(::REPL.Terminals.TextTerminal, ::REPL.LineEdit.ModalInterface, ::REPL.LineEdit.MIState) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/LineEdit.jl:2273
 [34] run_frontend(::REPL.LineEditREPL, ::REPL.REPLBackendRef) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:1035
 [35] run_repl(::REPL.AbstractREPL, ::Any) at /Users/osx/buildbot/slave/package_osx64/build/usr/share/julia/stdlib/v1.1/REPL/src/REPL.jl:192
 [36] (::getfield(Base, Symbol("##734#736")){Bool,Bool,Bool,Bool})(::Module) at ./client.jl:362
 [37] #invokelatest#1 at ./essentials.jl:742 [inlined]
 [38] invokelatest at ./essentials.jl:741 [inlined]
 [39] run_main_repl(::Bool, ::Bool, ::Bool, ::Bool, ::Bool) at ./client.jl:346
 [40] exec_options(::Base.JLOptions) at ./client.jl:284
 [41] _start() at ./client.jl:436

You can download the data from https://www.kaggle.com/cityofLA/los-angeles-parking-citations. The file size is roughly 1.2 GB (uncompressed).

I'm using CSVFiles v0.14.0 and TextParse v0.8.0.

Add support for DAT files

It would be convenient to have support for DAT files with the .dat extension and space or tab separators.
The CSVFiles package can already read space-separated columns (spacedelim=true), but it is awkward to have to give such files the .csv extension.

Support compressed files

Here is one approach; it would be nice to support this in some easier way. It's not super clear how that would interact with the FileIO story, though...
