
Comments (8)

deanm0000 commented on July 1, 2024

As an aside, there's https://github.com/kylebarron/parquet-wasm. I don't know if figuring out the interoperability is worth it.

Back to your point, another idea would be to make the compression dependencies à la carte instead of pulling in all of them.

akhilles commented on July 1, 2024

it seems another idea would be to make the compression dependencies a la carte instead of all of them.

With #16731, this is possible with zstd:

polars-io = { version = "0.40", default-features = false, features = ["polars-parquet", "zstd"] }

For the other compression dependencies, it's still possible, but you have to add polars-parquet as a direct dependency:

polars-io = { version = "0.40", default-features = false, features = ["polars-parquet"] }
polars-parquet = { version = "0.40", default-features = false, features = ["lz4"] }

Adding passthrough feature flags for lz4, brotli, etc. would make this a bit easier, but I'm not sure it's worth it.
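To make the à la carte idea concrete, here is a minimal sketch of gating each codec behind its own Cargo feature, so a wasm build only compiles the codecs it asks for. The `Codec` enum and `decompress` function are hypothetical and not part of the Polars API; the only real library call is `zstd::decode_all`, and it is compiled only when a `zstd` feature (assumed to enable an optional `zstd` dependency) is turned on.

```rust
/// Hypothetical codec selector; not a Polars type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Codec {
    Uncompressed,
    Zstd,
    Lz4,
}

/// Decompress a buffer, supporting only the codecs whose Cargo feature is
/// enabled in this build. Disabled codecs return an error at runtime instead
/// of pulling their (possibly C-based) dependency into the binary.
pub fn decompress(codec: Codec, input: &[u8]) -> Result<Vec<u8>, String> {
    match codec {
        Codec::Uncompressed => Ok(input.to_vec()),
        // Compiled only with `--features zstd`; assumes `zstd` is declared
        // as an optional dependency in Cargo.toml.
        #[cfg(feature = "zstd")]
        Codec::Zstd => zstd::decode_all(input).map_err(|e| e.to_string()),
        // Any codec whose feature is off (here: always Lz4, and Zstd unless
        // the `zstd` feature is enabled) falls through to this arm.
        other => Err(format!("codec {other:?} is not enabled in this build")),
    }
}
```

With no features enabled, `decompress(Codec::Uncompressed, b"abc")` returns the bytes unchanged and `decompress(Codec::Lz4, b"abc")` returns an `Err`. That is the trade-off of this design: a file written with a disabled codec fails at read time rather than at build time.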

jccampagne commented on July 1, 2024

I am trying to compile Polars for the wasm32 target (to be used within a Leptos app on the client side), and I found this issue.
After disabling some features related to compression and crossterm to make some progress compiling, I am getting this compilation error:

 (git)-[main]-polars % cargo build --target wasm32-unknown-unknown 
...
   Compiling polars-arrow v0.40.0 (/Users/jc/src/polars/crates/polars-arrow)
error: this arithmetic operation will overflow
  --> crates/polars-arrow/src/compute/cast/utf8_to.rs:87:45
   |
87 |         .sliced(0, std::cmp::min(buf.len(), OffsetType::MAX as usize * 2))
   |                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ attempt to compute `usize::MAX * 2_usize`, which would overflow
   |
   = note: `#[deny(arithmetic_overflow)]` on by default

error: could not compile `polars-arrow` (lib) due to 1 previous error

This line was introduced in PR 15408 to fix an overflow problem.

I think it's due to this type definition:

#[cfg(not(test))]
type OffsetType = u32;

(see https://github.com/pola-rs/polars/pull/15408/files#diff-7b0d4a626698c6858fc7274640e5c4706b7c336ef259cf861e1a34e453fff70eR74-R75 )

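On a 32-bit system (like the wasm32 target), usize is 32 bits wide, so `OffsetType::MAX as usize` is already `usize::MAX`, and `usize::MAX * 2_usize` overflows. Because both operands are constants, rustc detects the overflow at compile time, and the deny-by-default `arithmetic_overflow` lint turns it into an error.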
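One way to keep the clamp without tripping the lint is to compute the limit with `saturating_mul`, which caps at `usize::MAX` on 32-bit targets instead of overflowing. This is a minimal standalone sketch, not the actual `utf8_to.rs` code or whatever fix was merged upstream; `max_slice_len` is a hypothetical helper name.

```rust
// Mirrors the alias from the PR: OffsetType is u32 outside of tests.
type OffsetType = u32;

/// Upper bound on the slice length used in the cast, computed so that it
/// cannot overflow when usize is 32 bits wide (e.g. wasm32-unknown-unknown).
fn max_slice_len(buf_len: usize) -> usize {
    // On 64-bit targets this is 2 * (2^32 - 1). On 32-bit targets
    // `OffsetType::MAX as usize` is already usize::MAX, so the product
    // saturates at usize::MAX instead of overflowing.
    let limit = (OffsetType::MAX as usize).saturating_mul(2);
    std::cmp::min(buf_len, limit)
}
```

`checked_mul` would work as well if you would rather handle the 32-bit case explicitly (for example with `unwrap_or(usize::MAX)`).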

ritchie46 commented on July 1, 2024

Why would you need parquet on wasm if I may ask? I have never seen uncompressed parquet files and supporting this adds a lot of complexity and breaking assumptions that I don't think are worth it.

deanm0000 commented on July 1, 2024

I use duckdb-wasm to read parquet files, and it's great. It means I can have a dashboard without an API layer, and I can write most of my logic in SQL instead of JavaScript. That said, the duckdb parquet extension supports compression. I don't know how much it would save on bundle size and load time if it only supported uncompressed parquet files. I also don't know what the benefit of uncompressed parquet files would be over uncompressed IPC files.

jccampagne commented on July 1, 2024

Why would you need parquet on wasm if I may ask?

@ritchie46 sure: I am reading and analyzing parquet files from within the browser (client side only; no server side).
I was able to read parquet files with the parquet crate and use datafusion for the query engine.

I have never seen uncompressed parquet files

The aim was not to disable all compression. I was debugging the build by disabling one feature at a time, and that is how I found this overflow issue on wasm32. I am actually using compression.

and supporting this adds a lot of complexity and breaking assumptions that I don't think are worth it.

I understand. No worries.

akhilles commented on July 1, 2024

Why would you need parquet on wasm if I may ask? I have never seen uncompressed parquet files and supporting this adds a lot of complexity and breaking assumptions that I don't think are worth it.

Similar to @jccampagne, I don't necessarily need uncompressed parquet files. I just want to use a single compression scheme (zstd), but currently all of the compression dependencies are built. This is probably fine for most targets, but not really suitable for wasm, for two reasons:

  • Compiling C/C++ dependencies to wasm32-unknown-unknown is a pain
  • Keeping binary size smaller is more important

I updated the issue to reflect this more accurately.

Really, I just need some way to load a DataFrame into wasm from S3. I'm not even attached to parquet, but IPC has the same issue: enabling the ipc feature forces compilation of both zstd and lz4-rs.

akhilles commented on July 1, 2024

I can make a similar change to polars_io::ipc instead if that's more welcome. Compression isn't critical for Arrow IPC files, and so being able to build polars for wasm + IPC without compression dependencies seems reasonable.
