Second attempt, after merging #57.
Cluster: https://cloud.coiled.io/clusters/245853?account=dask-engineering
dataset | computation | timing (read_parquet) | timing (read_deltalake) |
---|---|---|---|
ds20f_100M | ddf["int1"].sum().compute() | CPU times: user 41.2 ms, sys: 10.6 ms, total: 51.8 ms, Wall time: 9.48 s | CPU times: user 181 ms, sys: 42.3 ms, total: 224 ms, Wall time: 59.8 s |
ds20f_100M | ddf.describe().compute() | CPU times: user 243 ms, sys: 27.9 ms, total: 271 ms, Wall time: 23 s | CPU times: user 308 ms, sys: 51.7 ms, total: 360 ms, Wall time: 1min 1s |
ds25f_250M | ddf["int1"].sum().compute() | CPU times: user 63.8 ms, sys: 15.9 ms, total: 79.7 ms, Wall time: 16.6 s | CPU times: user 716 ms, sys: 182 ms, total: 897 ms, Wall time: 3min 51s |
ds25f_250M | ddf.describe().compute() | CPU times: user 623 ms, sys: 71 ms, total: 694 ms, Wall time: 1min 9s | CPU times: user 986 ms, sys: 189 ms, total: 1.17 s, Wall time: 3min 52s |
ds50f_500M | ddf["int1"].sum().compute() | CPU times: user 199 ms, sys: 47.5 ms, total: 246 ms, Wall time: 1min | CPU times: user 2.89 s, sys: 799 ms, total: 3.69 s, Wall time: 16min 7s |
ds50f_500M | ddf.describe().compute() | CPU times: user 3.45 s, sys: 383 ms, total: 3.83 s, Wall time: 5min 37s | CPU times: user 5.36 s, sys: 832 ms, total: 6.19 s, Wall time: 16min 24s |
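Looks like dask-deltalake is doing something very inefficient: with read_deltalake, the single-column sum takes almost as long as the full describe(), and both are several times slower than with read_parquet.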
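To put a number on the gap, the wall times from the table can be converted to seconds and compared directly. The snippet below is only a sketch of that arithmetic: the wall-time strings are copied from the table above, while the names `wall_seconds`, `wall_times`, and `slowdown` are made up for illustration and are not part of dask or dask-deltatable.

```python
import re

# Seconds per unit, as the units appear in IPython's %time output.
_UNITS = {"h": 3600.0, "min": 60.0, "s": 1.0, "ms": 1e-3}


def wall_seconds(text: str) -> float:
    """Convert a wall-time string such as '3min 51s' or '9.48 s' to seconds."""
    parts = re.findall(r"([\d.]+)\s*(h|min|ms|s)\b", text)
    return sum(float(value) * _UNITS[unit] for value, unit in parts)


# (dataset, computation) -> (read_parquet wall time, read_deltalake wall time),
# copied by hand from the table above.
wall_times = {
    ("ds20f_100M", "sum"): ("9.48 s", "59.8 s"),
    ("ds20f_100M", "describe"): ("23 s", "1min 1s"),
    ("ds25f_250M", "sum"): ("16.6 s", "3min 51s"),
    ("ds25f_250M", "describe"): ("1min 9s", "3min 52s"),
    ("ds50f_500M", "sum"): ("1min", "16min 7s"),
    ("ds50f_500M", "describe"): ("5min 37s", "16min 24s"),
}

# Ratio of read_deltalake wall time to read_parquet wall time.
slowdown = {
    key: wall_seconds(delta) / wall_seconds(parquet)
    for key, (parquet, delta) in wall_times.items()
}

for (dataset, computation), ratio in slowdown.items():
    print(f"{dataset:12s} {computation:9s} {ratio:5.1f}x slower with read_deltalake")
```

On these numbers the single-column sum comes out roughly 6x to 16x slower with read_deltalake, and describe() roughly 3x slower.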