Comments (2)
@stinodego the CSV is a red herring. This is a stack overflow in the streaming engine. The Parquet sink shares the append union, which is currently implemented recursively per pipeline: 17K pipelines in this case.
Even if this is not possible, I expect a clear error, not a segfault. (I thought it was theoretically impossible to get a segfault with Rust.)
No, it is not. A stack overflow also isn't handled gracefully, which is why you don't get a nice error.
Ah ok, that makes sense.
My workaround for now is to process the files in batches: 17k CSV files are converted into 17 Parquet files in batches of 1,000, then a second step combines the 17 Parquet files into one. For other datasets, though, the limit is lower than 17k.
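The batching workaround described above could look roughly like the sketch below. It is not code from the original report: the helper names `chunked` and `csvs_to_parquet`, the `tmp_dir` layout, and the default `batch_size` are all made up for illustration, and it assumes the per-batch query is a plain `pl.concat` of `pl.scan_csv` frames written with `sink_parquet`.

```python
from pathlib import Path
from typing import Iterator, Sequence


def chunked(items: Sequence[Path], size: int) -> Iterator[Sequence[Path]]:
    """Yield consecutive slices of `items`, each with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def csvs_to_parquet(csv_paths, out_path, tmp_dir, batch_size=1000):
    """Hypothetical helper: convert many CSVs to one Parquet file in two steps."""
    import polars as pl  # imported here so `chunked` stays dependency-free

    tmp_dir = Path(tmp_dir)
    tmp_dir.mkdir(parents=True, exist_ok=True)

    # Step 1: one intermediate Parquet file per batch, so each streaming
    # query only unions `batch_size` inputs instead of all of them.
    parts = []
    for i, batch in enumerate(chunked(csv_paths, batch_size)):
        part = tmp_dir / f"part_{i:05d}.parquet"
        pl.concat([pl.scan_csv(p) for p in batch]).sink_parquet(part)
        parts.append(part)

    # Step 2: combine the (few) intermediate files into the final output.
    pl.concat([pl.scan_parquet(p) for p in parts]).sink_parquet(out_path)
```

Because the safe number of inputs differs between datasets, `batch_size` is left as a parameter to tune rather than a fixed constant.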