
Comments (5)

s-banach commented on July 22, 2024

It's optimized for the use-case where you call df = pl.concat(...) just once up top,
and then you do all the rest of your processing on df.
The idea is that most operations on df will be faster if df is contiguous in memory, rather than separate chunks.

from polars.

s-banach commented on July 22, 2024

Can you try concat with rechunk=False?


Chuck321123 commented on July 22, 2024

@s-banach It was about 3000x faster. Are there any reasons this isn't set to False by default?


Chuck321123 commented on July 22, 2024

@s-banach I see. Unfortunately, there were no performance gains when I used rechunk=False with how="align".


ritchie46 commented on July 22, 2024

You are comparing apples with peaches.

Every time you extend df1, you mutate the original memory. So the first extend is much cheaper than the later ones (which trigger a realloc), because df1 grows with every iteration.

for i in range(10):
    df1.extend(df2)
    print(df1.height)
1000000
1500000
2000000
2500000
3000000
3500000
4000000
4500000
5000000
5500000

So your extend "benchmark" has a different input on every iteration, whilst your concat "benchmark" has to concat the 5_500_000 case every time. Please take the time to validate your benchmarks.

As for the difference: they will perform differently. That's fine, they do different things.

Don't use extend, as it mutates the underlying memory, which leads to these kinds of bugs, as the warning in the extend docstring states.

