Comments (11)
Yeah, in any case an Arrow schema is just a logical construction; it most likely can't be mapped to Julia types one to one, and shouldn't be (other than the primitive bit types).
So we need some kind of Julia-side logical schema in the end. (For vectors it's nicely covered by AbstractVector, because Julia's type hierarchy is kind of designed for this sort of technical use.)
from arrow-julia.
A column that's a vector of vectors (of various lengths).
from arrow-julia.
Hm ok. I think we should be trying to convert at least before validating, I agree it is very strict otherwise. Legolas has a mechanism for related things, I think maybe we need to add some definitions like this:
Legolas.accepted_field_type(::Legolas.SchemaVersion, ::Type{Vector{String}}) = AbstractVector{<:AbstractString}
Legolas.accepted_field_type(::Legolas.SchemaVersion, ::Type{Vector{T}}) where T = AbstractVector{<:T}
Legolas.accepted_field_type(::Legolas.SchemaVersion, ::Type{Vector}) = AbstractVector
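The subtype relationships these definitions rely on can be checked with Base alone. This is only a sketch: the view below stands in for whatever element type a list column returns, and it does not call Legolas or Arrow.

```julia
# Stand-in for a list-column element: a view into a larger buffer (assumption).
v = view([1, 2, 3, 4], 1:2)
s = view(["a", "b"], 1:1)

# A view is not a Vector, so a field declared as Vector{Int} rejects it...
is_vector = typeof(v) <: Vector{Int}

# ...but it does satisfy the looser abstract types used in the definitions above.
is_abstract_vector = typeof(v) <: AbstractVector{<:Int}
is_string_vector = typeof(s) <: AbstractVector{<:AbstractString}

# Converting before validating also yields a Vector, at the cost of a copy.
converted = convert(Vector{Int}, v)
```

Here is_vector is false while the two abstract checks are true, which is why widening the accepted type avoids the need for a copy.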
from arrow-julia.
Maybe we should yank v2.6 from General? Since it wasn't meant to be a breaking release. cc @quinnj
from arrow-julia.
What's this actually testing? The <: Vector seems way too strong.
from arrow-julia.
I think it calls convert first, but then, yeah, it expects a Vector{Int} for the element (not for the whole column, just for the individual element). That is what we got on Arrow 2.5:
julia> Arrow.Table(read(path))
Arrow.Table with 2 rows, 2 columns, and schema:
:x Vector{Int64} (alias for Array{Int64, 1})
:y String
with metadata given by a Base.ImmutableDict{String, String} with 1 entry:
"legolas_schema_qualified" => "test.parent@1"
julia> Arrow.Table(read(path)).x
2-element Arrow.List{Vector{Int64}, Int32, Arrow.Primitive{Int64, Vector{Int64}}}:
[1, 2]
[3, 4]
julia> Arrow.Table(read(path)).x |> typeof
Arrow.List{Vector{Int64}, Int32, Arrow.Primitive{Int64, Vector{Int64}}}
julia> Arrow.Table(read(path)).x[1] |> typeof
Vector{Int64} (alias for Array{Int64, 1})
edit: it seems to not call convert first in my testing 🤔
from arrow-julia.
Yeah, well, that's too strong. When you have a jagged column, it's best not to return a Vector for each element, because that copies.
Arrow stores a jagged column as one data column and one offset column, like VectorOfVectors from ArraysOfArrays.jl. So really the promise is that each element will be <: AbstractVector.
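A minimal sketch of that layout, using Base only. The names flat, offsets, and element are hypothetical and this is not Arrow.jl's actual implementation; it only illustrates how a flat buffer plus offsets can serve each element as a view instead of a copy.

```julia
# Logical column: [[1, 2], [3, 4, 5], []]
flat = [1, 2, 3, 4, 5]           # all element values, concatenated
offsets = Int32[0, 2, 5, 5]      # n + 1 zero-based offsets into `flat`

# Element i spans flat[offsets[i] + 1 : offsets[i + 1]].
# `view` returns a SubArray that shares memory with `flat` (no copy).
element(flat, offsets, i) = view(flat, (offsets[i] + 1):offsets[i + 1])

x = element(flat, offsets, 2)
shares_buffer = parent(x) === flat   # the view points at the original buffer
```

Under this layout an element is a SubArray rather than a Vector, so only the AbstractVector promise holds unless each element is copied.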
from arrow-julia.
What is a jagged column?
from arrow-julia.
FYI, we tweaked the definition here to accommodate the chance that list-type columns will return SubArray now.
from arrow-julia.
Thanks for the clarification, all. I opened beacon-biosignals/Legolas.jl#89 with the fix proposed by @ericphanson, since it (now!) seems clearer that the failing Legolas tests are due to Legolas being too(?) strict to begin with, rather than any change introduced by Arrow 2.6.
from arrow-julia.
Ok let's close this one then, thanks for the help everyone
from arrow-julia.