Comments (8)
Ok, that seems like a pretty dated version; I think for now, we'll constrain support to the Arrow 1.0 spec version (as currently noted in the README).
from arrow-julia.
Is there a way you can share the arrow stream as a file somehow? Or allow access to the url so I can reproduce? I have a big refactoring that is fixing a bunch of issues w/ the current code.
from arrow-julia.
This reproduces it for me:
julia> import PyCall
julia> import Arrow
julia> pa = PyCall.pyimport("pyarrow")
julia> schema = pa.schema([("a", pa.int64())])
PyObject a: int64
julia> t = pa.table(Dict("a" => [1,2,3]), schema=schema)
PyObject pyarrow.Table
a: int64
julia> b = IOBuffer()
IOBuffer(data=UInt8[...], readable=true, writable=true, seekable=true, append=false, size=0, maxsize=Inf, ptr=1, mark=-1)
julia> pa.RecordBatchStreamWriter(b, t.schema).write(t)
julia> Arrow.Table(b)
ERROR: type Nothing has no field custom_metadata
Stacktrace:
[1] getproperty(::Nothing, ::Symbol) at ./Base.jl:33
[2] Arrow.Table(::Array{UInt8,1}, ::Int64, ::Nothing; debug::Bool, convert::Bool) at ~/.julia/packages/Arrow/uVYhe/src/table.jl:148
[3] #Table#35 at ~/.julia/packages/Arrow/uVYhe/src/table.jl:41 [inlined]
[4] Table at ~/.julia/packages/Arrow/uVYhe/src/table.jl:41 [inlined] (repeats 2 times)
[5] top-level scope at REPL[10]:1
[6] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1088
The pyarrow version used to write is 0.16.0, although I don't think that's important since the first case I hit was produced with Java.
from arrow-julia.
Well, in this case, you're not actually giving Arrow.Table anything to read; note that when you write to b = IOBuffer(), it's positioned at the end of what you just wrote, so Arrow.Table then tries to find an arrow ipc message starting at the IOBuffer's current position. To make this work, you just need to do seekstart(b) before calling Arrow.Table(b).
Now, we should certainly have a better error message here to help the user realize that "nothing"/"empty input" was passed, instead of this obscure no field custom_metadata error.
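The same pitfall can be shown outside Julia. Here is a minimal Python sketch that uses io.BytesIO as a stand-in for IOBuffer. The payload is a placeholder, not real Arrow data, and the variable names are made up for illustration:

```python
import io

# Placeholder payload; a real case would hold Arrow IPC stream bytes.
payload = b"arrow-stream-bytes"

buf = io.BytesIO()
buf.write(payload)          # the position now sits at the end of the data

at_end = buf.read()         # reading from the current position finds nothing

buf.seek(0)                 # rewind, the analogue of seekstart(b) in Julia
from_start = buf.read()     # now the full payload is returned

print(len(at_end), len(from_start))
```

A reader that receives the buffer without the rewind sees empty input, which is the situation behind the custom_metadata error above.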
from arrow-julia.
Ah, that was silly!
My original case is still failing, though I can't share the raw data. It seems that Arrow.jl can't read the original message, but it can read it after a pyarrow RecordBatchStreamReader + RecordBatchStreamWriter round-trip. The original message is 5204 bytes, but the roundtrip data is only 4976 bytes.
The starts of the two messages are also different.
https://arrow.apache.org/docs/format/Columnar.html#encapsulated-message-format
This might be the wrong protocol, but it looks like the message should start with 0xFFFFFFFF? The original message starts with 0x84020000, and the roundtripped message starts with 0xFFFFFFFF. I wonder how / why pyarrow is accepting the original? The original data has no occurrences of 0xFFFFFFFF anywhere in it. Is it because of "This component was introduced in version 0.15.0 in part to address the 8-byte alignment requirement of Flatbuffers"? Should Arrow.jl be tolerant of the old format too?
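To make the difference between the two framings concrete, here is a rough Python sketch of how a reader could tolerate both. It assumes the layout from the linked format page: a 0xFFFFFFFF continuation marker followed by a little-endian int32 metadata length, versus the pre-0.15 framing where the first four bytes are the length itself. The function name read_message_prefix is hypothetical and is not part of Arrow.jl or pyarrow:

```python
import struct

CONTINUATION = 0xFFFFFFFF  # marker introduced with the 0.15.0 framing


def read_message_prefix(buf: bytes, offset: int = 0):
    """Return (is_legacy, metadata_length, metadata_offset).

    Hypothetical helper: inspects the first 4 bytes at `offset` and
    decides which framing the encapsulated message uses.
    """
    if len(buf) - offset < 4:
        raise ValueError("not enough bytes for a message prefix")
    (first,) = struct.unpack_from("<I", buf, offset)
    if first == CONTINUATION:
        # New framing: marker, then a little-endian int32 metadata length.
        if len(buf) - offset < 8:
            raise ValueError("truncated after continuation marker")
        (length,) = struct.unpack_from("<i", buf, offset + 4)
        return False, length, offset + 8
    # Legacy framing: the first 4 bytes already are the metadata length.
    (length,) = struct.unpack_from("<i", buf, offset)
    return True, length, offset + 4


# The byte prefixes described above, with no real metadata attached.
legacy = bytes.fromhex("84020000")
modern = bytes.fromhex("ffffffff") + struct.pack("<i", 644)
print(read_message_prefix(legacy))
print(read_message_prefix(modern))
```

Under these assumptions, a message starting with 84 02 00 00 would be read as a legacy message with 644 bytes of metadata rather than rejected for lacking the marker.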
from arrow-julia.
julia> Arrow.Table(r.body; debug=true)
didn't find continuation byte to keep parsing messages: 644
ERROR: type Nothing has no field custom_metadata
Stacktrace:
[1] getproperty(::Nothing, ::Symbol) at ./Base.jl:33
[2] Arrow.Table(::Array{UInt8,1}, ::Int64, ::Nothing; debug::Bool, convert::Bool) at ~/.julia/packages/Arrow/uVYhe/src/table.jl:148
[3] top-level scope at REPL[93]:1
[4] include_string(::Function, ::Module, ::String, ::String) at ./loading.jl:1088
I just discovered the debug=true flag. It does seem to be related to the continuation bytes: 644 written as a little-endian UInt32 is the byte sequence 84 02 00 00, i.e. the 0x84020000 the message starts with.
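The link between 644 and the leading bytes is a byte-order question, and a short Python check shows it (the variable names are illustrative only):

```python
import struct

# 644 encoded as a little-endian unsigned 32-bit integer.
prefix = struct.pack("<I", 644)
prefix_hex = prefix.hex()

# Reading the same four bytes back as little-endian recovers 644.
decoded = int.from_bytes(bytes.fromhex("84020000"), "little")

print(prefix_hex, decoded)
```

So the value in the debug output is the first four bytes of the message read as a length, not as a continuation marker.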
from arrow-julia.
Can you report what the version is of the message you're reading? It looks like if you read in the message, you can access the version like msg.metadata_version (link to docs: https://arrow.apache.org/docs/python/generated/pyarrow.ipc.Message.html#pyarrow.ipc.Message.metadata_version). If it's indeed just an older version, I can look into what the difference was with that previous version and we can probably handle it here.
from arrow-julia.
The message seems to be version 0.12, which makes sense given that the change I cited above happened on 0.15.
from arrow-julia.