Description: So maybe I'm not comparing apples to apples, but I have


.extend significantly outperforms concat for the same operation about polars HOT 5 CLOSED

Chuck321123 commented on July 22, 2024


Comments (5)

s-banach commented on July 22, 2024

It's optimized for the use-case where you call df = pl.concat(...) just once up top,
and then you do all the rest of your processing on df.
The idea is that most operations on df will be faster if df is contiguous in memory, rather than separate chunks.


s-banach commented on July 22, 2024

Can you try concat with rechunk=False?


Chuck321123 commented on July 22, 2024

@s-banach It was about 3000x faster. Is there a reason this isn't set to False by default?


Chuck321123 commented on July 22, 2024

@s-banach I see. Unfortunately, there were no performance gains when I used rechunk=False with how="align".


ritchie46 commented on July 22, 2024

You are comparing apples with peaches.

Every time you extend df1, you mutate the original memory. So the first extend is much cheaper than the later ones (which trigger a realloc), because the size of df1 increases every iteration.

for i in range(10):
    df1.extend(df2)
    print(df1.height)

So your extend "benchmark" has a different input every iteration, whilst your concat "benchmark" has to concat the 5_500_000 case every time. Please take the time to validate your benchmarks.

As for the difference: they will perform differently. That's fine, they do different things.

Don't use extend: it mutates the underlying memory, which leads to this kind of bug, as the warning in the extend docstring states.

