
Unexpected allocations (arrow-julia, open issue)

JoaoAparicio commented on June 21, 2024
Unexpected allocations

from arrow-julia.

Comments (2)

JoaoAparicio commented on June 21, 2024

One difference I've noticed between the Vector{Int64} and Vector{IntWrapper} cases shows up on entering this function:

arrow-julia/src/utils.jl

Lines 34 to 57 in 3712291

function writearray(io::IO, ::Type{T}, col) where {T}
    if col isa Vector{T}
        n = Base.write(io, col)
    elseif isbitstype(T) && (
        col isa Vector{Union{T,Missing}} || col isa SentinelVector{T,T,Missing,Vector{T}}
    )
        # need to write the non-selector bytes of isbits Union Arrays
        n = Base.unsafe_write(io, pointer(col), sizeof(T) * length(col))
    elseif col isa ChainedVector
        n = 0
        for A in col.arrays
            n += writearray(io, T, A)
        end
    else
        n = 0
        data = Vector{UInt8}(undef, sizeof(col))
        buf = IOBuffer(data; write=true)
        for x in col
            n += Base.write(buf, coalesce(x, ArrowTypes.default(T)))
        end
        n = Base.write(io, take!(buf))
    end
    return n
end

In the first case, col is a Vector{Int64} and matches the first branch of the if; in the second, col is an ArrowTypes.ToArrow{Int64,Vector{IntWrapper}} and matches the last branch. That branch allocates because
data = Vector{UInt8}(undef, sizeof(col))
cannot know the size to allocate at compile time.

However, I believe something has already gone wrong by this stage. By inserting prints I can check sizeof(col): in the first case it is the size of the whole vector, but in the second case it is just 8, which I guess is the number of bytes for a single Int64.
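The sizeof observation can be reproduced without Arrow. Below is a minimal sketch, assuming a hypothetical Wrapper type that stands in for ArrowTypes.ToArrow (it is not the real type, and the real layout may differ). For a Vector, sizeof reports the bytes of the element data; for a struct that wraps a vector, it reports the size of the struct's own fields. If the wrapper holds a single reference to the underlying vector, the 8 may be the size of that reference on a 64-bit machine rather than the size of one Int64.

```julia
# Hypothetical stand-in for a lazy wrapper such as ArrowTypes.ToArrow.
# This is NOT the real Arrow type, only an illustration of the layout.
struct Wrapper{T,A} <: AbstractVector{T}
    data::A
end
Wrapper{T}(data::A) where {T,A} = Wrapper{T,A}(data)
Base.size(w::Wrapper) = size(w.data)
Base.getindex(w::Wrapper{T}, i::Int) where {T} = convert(T, w.data[i])

plain = collect(Int64, 1:1000)
wrapped = Wrapper{Int64}(plain)

plain_bytes = sizeof(plain)        # bytes of the element data
wrapped_bytes = sizeof(wrapped)    # size of the struct itself (one reference field)
needed_bytes = sizeof(Int64) * length(wrapped)  # bytes the elements occupy once written
```

Under these assumptions, a buffer sized with sizeof(col) would be far too small for the wrapped case, and an IOBuffer built over it would have to grow as elements are written, whereas sizeof(T) * length(col) gives the intended size.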


baumgold commented on June 21, 2024

It looks like @quinnj added the logic to write to a temporary vector prior to writing the vector to the IO. I don't understand why this is required. @quinnj - do you remember?

https://github.com/apache/arrow-julia/pull/57/files#diff-47c27891e951c8cd946b850dc2df31082624afdf57446c21cb6992f5f4b74aa2R47-R52
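One way to explore whether the temporary buffer matters is to compare the two write strategies side by side. The sketch below uses two hypothetical helpers, write_buffered and write_direct, neither of which is part of Arrow.jl. It does not explain why the original change added the buffer; it only checks that both paths emit the same bytes and gives a way to measure their allocations.

```julia
# Hypothetical helpers for illustration; neither is part of Arrow.jl.

# Mirrors the fallback branch: stage every element in a temporary buffer,
# then write the staged bytes to `io` in one call.
function write_buffered(io::IO, col)
    buf = IOBuffer()
    for x in col
        Base.write(buf, x)
    end
    return Base.write(io, take!(buf))
end

# Alternative: write each element straight to `io`, with no staging buffer.
function write_direct(io::IO, col)
    n = 0
    for x in col
        n += Base.write(io, x)
    end
    return n
end

# Any iterable of isbits values that is not a plain Vector.
col = (Int64(i) for i in 1:1000)

a = IOBuffer()
na = write_buffered(a, col)
b = IOBuffer()
nb = write_direct(b, col)

bytes_a = take!(a)
bytes_b = take!(b)
same_output = (na == nb) && (bytes_a == bytes_b)

# The calls above also serve as warm-up, so these measure steady-state allocations.
alloc_buffered = @allocated write_buffered(IOBuffer(), col)
alloc_direct = @allocated write_direct(IOBuffer(), col)
```

The measured numbers depend on the Julia version and on the kind of IO being written to, so they are best read as a relative comparison on one machine.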

