Comments (4)
pl_flavor doesn't refer to the difference between a large_string and a string. It refers to the difference between a large_string and a utf8_view, which doesn't seem to be implemented in pyarrow yet.
It seems @ritchie46 intended to close this as not planned so I'll do that now. Sorry if I'm mistaken on that point.
from polars.
> I'm still new to Polars. What are some use cases of LargeString?
Our in-memory engine favors large chunks (often single chunked dataframes). It is pretty easy to reach the 2GB string limit on user data that way.
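For a rough sense of scale: Arrow's default string type stores signed 32-bit offsets into one contiguous data buffer, so a single array (chunk) can address at most 2^31 - 1 bytes of string data. With single-chunked dataframes, that per-chunk limit is effectively a per-column limit. The check below is only a back-of-the-envelope sketch; the helper name and the row counts are made up for illustration.

```python
# Arrow's default string type uses signed 32-bit offsets into its data buffer,
# so one array (chunk) can address at most 2**31 - 1 bytes of string data.
INT32_MAX = 2**31 - 1


def fits_in_string_array(num_rows: int, avg_bytes_per_value: float) -> bool:
    """Estimate whether a single-chunk column stays under the 32-bit offset limit."""
    return num_rows * avg_bytes_per_value <= INT32_MAX


# Hypothetical workloads (numbers invented for illustration).
print(fits_in_string_array(10_000_000, 40))   # about 400 MB of string data
print(fits_in_string_array(100_000_000, 40))  # about 4 GB of string data
```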
> Is it feasible to expose this boolean flag in py-polars as well?
This is to convert to string_view and is only temporary until arrow consumers implement binview.
from polars.
We will not do that. Arrow default string can only hold 2GB of data per column, leading to all kinds of slicing requirements. We deem the default string utterly unusable for our use cases. You can always cast from LargeString to String and implement your own slicing if required.
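One way the "implement your own slicing" step might look is sketched below. It is not code from Polars or pyarrow: `plan_slices` is a hypothetical helper that takes the byte length of each string value and returns row ranges whose string data each stay under a byte budget, so that every range could be cast to the 32-bit-offset string type on its own.

```python
from typing import Iterable, List, Tuple

INT32_MAX = 2**31 - 1  # largest offset a default Arrow string array can store


def plan_slices(
    byte_lengths: Iterable[int], max_bytes: int = INT32_MAX
) -> List[Tuple[int, int]]:
    """Return (start_row, row_count) slices whose string data each fit in max_bytes.

    Hypothetical helper for illustration; not part of any library.
    """
    slices: List[Tuple[int, int]] = []
    start = 0   # first row of the slice being built
    count = 0   # rows in the slice being built
    used = 0    # bytes of string data in the slice being built
    for n in byte_lengths:
        if n > max_bytes:
            raise ValueError("a single value exceeds the per-slice byte limit")
        if used + n > max_bytes:
            # Close the current slice and start a new one at this row.
            slices.append((start, count))
            start += count
            count = 0
            used = 0
        count += 1
        used += n
    if count:
        slices.append((start, count))
    return slices


# Small demo with a tiny byte budget so the splitting is visible.
print(plan_slices([3, 3, 3, 3], max_bytes=6))
```

Each `(start_row, row_count)` pair could then be handed to something like a pyarrow array's `slice(start, count)` followed by a cast to `pa.string()`. That wiring is an assumption about how a consumer might use the plan, not something described in this thread.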
from polars.
Thanks for the quick reply!
> We deem the default string utterly unusable for our use cases
I'm still new to Polars. What are some use cases of LargeString?
> You can always cast from LargeString to String and implement your own slicing if required.
We will probably do this for pyiceberg. apache/iceberg-python#520
It looks like in Rust, there's a pl_flavor boolean flag that can be set to use the regular Arrow string instead (1, 2) but this is not available in Python.
Is it feasible to expose this boolean flag in py-polars as well?
from polars.