
Comments (4)

deanm0000 commented on June 11, 2024

pl_flavor doesn't refer to the difference between a large_string and a string. It refers to the difference between a large_string and a utf8_view, which doesn't seem to be implemented in pyarrow yet.

It seems @ritchie46 intended to close this as not planned so I'll do that now. Sorry if I'm mistaken on that point.

from polars.

ritchie46 commented on June 11, 2024

> I'm still new to Polars. What are some use cases of LargeString?

Our in-memory engine favors large chunks (often single chunked dataframes). It is pretty easy to reach the 2GB string limit on user data that way.

> Is it feasible to expose this boolean flag in py-polars as well?

This is to convert to string_view and is only temporary until arrow consumers implement binview.


ritchie46 commented on June 11, 2024

We will not do that. Arrow default string can only hold 2GB of data per column, leading to all kinds of slicing requirements. We deem the default string utterly unusable for our use cases. You can always cast from LargeString to String and implement your own slicing if required.
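As a rough illustration of the "implement your own slicing" suggestion, here is a minimal sketch in plain Python. The helper name `chunk_by_byte_limit` and the `limit` parameter are hypothetical, not part of Polars or pyarrow. It assumes the constraint is the one described above: the default Arrow String type uses 32-bit offsets, so a single chunk can hold at most about 2GB of string data.

```python
from typing import Iterable, Iterator, List

# Largest byte offset representable by 32-bit offsets (the ~2GB limit discussed above).
INT32_MAX = 2**31 - 1


def chunk_by_byte_limit(
    values: Iterable[str], limit: int = INT32_MAX
) -> Iterator[List[str]]:
    """Hypothetical helper: yield consecutive chunks whose total UTF-8
    size stays within `limit` bytes."""
    chunk: List[str] = []
    used = 0
    for value in values:
        size = len(value.encode("utf-8"))
        if size > limit:
            raise ValueError("a single value exceeds the byte limit")
        # Start a new chunk when this value would overflow the current one.
        if chunk and used + size > limit:
            yield chunk
            chunk, used = [], 0
        chunk.append(value)
        used += size
    if chunk:
        yield chunk


# Tiny limit for demonstration only; real data would use the default.
for part in chunk_by_byte_limit(["aa", "bb", "c"], limit=4):
    print(part)
```

Each resulting chunk fits within 32-bit offsets, so it could then be built as (or cast to) a regular String array on the consumer side, for example with `pa.array(chunk, type=pa.string())` in pyarrow.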


kevinjqliu commented on June 11, 2024

Thanks for the quick reply!

> We deem the default string utterly unusable for our use cases

I'm still new to Polars. What are some use cases of LargeString?

> You can always cast from LargeString to String and implement your own slicing if required.

We will probably do this for pyiceberg. apache/iceberg-python#520

It looks like in Rust there's a pl_flavor boolean flag that can be set to use the regular Arrow string instead (1, 2), but this is not available in Python.

Is it feasible to expose this boolean flag in py-polars as well?

