Comments (8)
As an aside, there's https://github.com/kylebarron/parquet-wasm. I don't know if figuring out the interoperability is worth it.
Back to your point, it seems another idea would be to make the compression dependencies a la carte instead of all of them.
from polars.
> it seems another idea would be to make the compression dependencies a la carte instead of all of them.
With #16731, this is possible with zstd:
polars-io = { version = "0.40", default-features = false, features = ["polars-parquet", "zstd"] }
For the other compression dependencies, it's still possible, but you have to add polars-parquet as a direct dependency:
polars-io = { version = "0.40", default-features = false, features = ["polars-parquet"] }
polars-parquet = { version = "0.40", default-features = false, features = ["lz4"] }
Adding passthrough feature flags for lz4, brotli, etc. would make this a bit easier, but I'm not sure if it's worth it.
from polars.
I am trying to compile Polars for the wasm32 target (to be used within a Leptos app on the client side) and I found this issue.
After disabling some features related to compression and crossterm to get the build further along, I am getting this compilation error:
(git)-[main]-polars % cargo build --target wasm32-unknown-unknown
...
   Compiling polars-arrow v0.40.0 (/Users/jc/src/polars/crates/polars-arrow)
error: this arithmetic operation will overflow
  --> crates/polars-arrow/src/compute/cast/utf8_to.rs:87:45
   |
87 |         .sliced(0, std::cmp::min(buf.len(), OffsetType::MAX as usize * 2))
   |                                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^ attempt to compute `usize::MAX * 2_usize`, which would overflow
   |
   = note: `#[deny(arithmetic_overflow)]` on by default
error: could not compile `polars-arrow` (lib) due to 1 previous error
This line was introduced in PR 15408 to fix an overflow problem.
I think it's due to this type definition:
#[cfg(not(test))]
type OffsetType = u32;
On a 32-bit target such as wasm32, usize is 32 bits wide, so OffsetType::MAX as usize equals usize::MAX, and the multiplication becomes usize::MAX * 2_usize, which overflows.
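One way to avoid the overflow is to saturate the multiplication instead of letting it wrap or fail. The following is only a minimal sketch of that idea, not the fix adopted in Polars: `max_slice_len` is a hypothetical helper name, and `OffsetType` is redeclared here so the snippet stands alone.

```rust
// Mirrors the non-test type alias quoted above.
type OffsetType = u32;

/// Hypothetical helper: caps a buffer length at twice `OffsetType::MAX`.
/// `saturating_mul` clamps to `usize::MAX` on 32-bit targets, where
/// `OffsetType::MAX as usize * 2` would otherwise overflow.
fn max_slice_len(buf_len: usize) -> usize {
    let cap = (OffsetType::MAX as usize).saturating_mul(2);
    std::cmp::min(buf_len, cap)
}
```

On a 64-bit target the cap is simply twice `u32::MAX`; on wasm32 it clamps to `usize::MAX`, so the `min` always returns the buffer length.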
from polars.
Why would you need parquet on wasm if I may ask? I have never seen uncompressed parquet files and supporting this adds a lot of complexity and breaking assumptions that I don't think are worth it.
from polars.
I use duckdb-wasm to read parquet files and it's great. It means I can have a dashboard without an API layer and I can write most of my logic in SQL instead of JavaScript. That said, the duckdb parquet extension supports compression. I don't know how much it would save on bundle size and load time if it only supported uncompressed parquet files. I also don't know what the benefit of uncompressed parquet files is compared with uncompressed IPC files.
from polars.
> Why would you need parquet on wasm if I may ask?
@ritchie46 sure: I am reading and analyzing parquet files from within the browser (client side only; no server side).
I was able to read parquet files with the parquet crate and use datafusion for the query engine.
> I have never seen uncompressed parquet files
The aim was not to disable all compression. I was just trying to debug the issues and disabled one feature at a time and found this overflow issue in wasm32. I am actually using compression.
> and supporting this adds a lot of complexity and breaking assumptions that I don't think are worth it.
I understand. No worries.
from polars.
> Why would you need parquet on wasm if I may ask? I have never seen uncompressed parquet files and supporting this adds a lot of complexity and breaking assumptions that I don't think are worth it.
Similar to @jccampagne, I don't necessarily need uncompressed parquet files. I just want to use a single compression scheme (zstd), but currently all of the compression dependencies are built. This is probably fine for most targets, but not really suitable for wasm for 2 reasons:
- Compiling C/C++ dependencies to wasm32-unknown-unknown is a pain
- Keeping binary size smaller is more important
I updated the issue to reflect this more accurately.
Really, I just need some way to load a DataFrame into wasm from S3. I'm not even attached to parquet, but IPC has the same issue: enabling the ipc feature forces compilation of both zstd and lz4-rs.
from polars.
I can make a similar change to polars_io::ipc instead if that's more welcome. Compression isn't critical for Arrow IPC files, so being able to build polars for wasm + IPC without compression dependencies seems reasonable.
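To illustrate what "a la carte" compression could look like on the Rust side, here is a rough sketch that gates each codec behind a Cargo feature. The `Codec` enum and the `is_supported` and `decode` functions are hypothetical names for illustration and are not part of the Polars API; the feature names `zstd` and `lz4` are assumed to match the ones discussed above.

```rust
/// Hypothetical codec tag; not a Polars type.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Codec {
    Uncompressed,
    Zstd,
    Lz4,
}

/// Reports whether support for a codec was compiled in.
/// `cfg!` evaluates to a constant `true` or `false` at compile time,
/// depending on which Cargo features are enabled.
fn is_supported(codec: Codec) -> bool {
    match codec {
        Codec::Uncompressed => true,
        Codec::Zstd => cfg!(feature = "zstd"),
        Codec::Lz4 => cfg!(feature = "lz4"),
    }
}

/// Returns the raw bytes for uncompressed input and a descriptive error
/// for codecs that were left out of the build.
fn decode(codec: Codec, bytes: &[u8]) -> Result<Vec<u8>, String> {
    if !is_supported(codec) {
        return Err(format!("{:?} support was not compiled in", codec));
    }
    match codec {
        Codec::Uncompressed => Ok(bytes.to_vec()),
        // A real build would call into the codec crate here.
        _ => Err(format!("{:?} decoding is left out of this sketch", codec)),
    }
}
```

With no features enabled, only the uncompressed path succeeds, and no C/C++ codec needs to be compiled for wasm32-unknown-unknown.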
from polars.