Comments (6)
Marking this as a bug because Alejandro says he has seen performance issues with auto viz.
from sparkmagic.
IMO 2500 is a good baseline, but it seems to me that's not many rows at all. I don't think a graphing mechanism that can only deal with a few hundred rows is very useful. I'll investigate this week.
Two possibilities:
- We're doing something wrong with datasets of this size. That's possible, and we can fix it.
- Plotly isn't good at datasets of that size. This is the tougher situation but if we can identify the cause, we can make a contribution.
from sparkmagic.
In the brief testing I've done today, autoviz isn't suitable even for datasets of 100 rows. I will have a look at the code.
from sparkmagic.
I did some testing on basic plotly graphs in the browser locally (without the magics or anything), and all of the graphs were reasonably performant even with thousands of rows, except the pie graph, which gets super slow with more than a couple hundred rows. This is not really a big deal, since you shouldn't have a pie graph with that many rows anyway (there are only 360 degrees in a circle). On our end, I think it makes the most sense to detect when a pie graph would have too many rows and throw an error when that happens ("Your result set has more than 200 rows; please reduce the size of your dataset for better visualizations"). When I push the change to fix the way labels and datasets are being computed for pie graphs, this will be partially alleviated.
What this means on our end is that if it takes a very long time to render line graphs, bar graphs, etc. with more than a few hundred rows, it's our fault, because plotly can handle it.
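A rough sketch of the row-count check proposed above, in plain Python. The names `TooManyRowsError` and `check_pie_row_count` are hypothetical and not part of sparkmagic or autovizwidget; the 200-row threshold and the message wording are taken from the comment.

```python
class TooManyRowsError(ValueError):
    """Hypothetical error for result sets too large to draw as a pie graph."""


def check_pie_row_count(row_count, max_rows=200):
    """Raise TooManyRowsError when row_count exceeds max_rows; otherwise do nothing."""
    if row_count > max_rows:
        raise TooManyRowsError(
            "Your result set has more than {} rows; please reduce the size "
            "of your dataset for better visualizations".format(max_rows)
        )


# Example: a made-up 250-row result set trips the check before any rendering.
rows = [("label_%d" % i, i) for i in range(250)]
try:
    check_pie_row_count(len(rows))
    message = None
except TooManyRowsError as err:
    message = str(err)
```

Running the check before handing data to the plotting library means the slow render never starts; the user sees the message instead.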
from sparkmagic.
So after some investigation, it seems like the Table representation and Pie graphs have perf issues.
Plotly Table takes ~42 seconds to render 60 rows or so. I believe the bottleneck is on the JS side of it. Opened plotly/plotly.py#383 in Plotly.
Also, qgrid is not compatible with Plotly (see plotly/plotly.js#90), so the only alternative with good perf is the default pandas DataFrame HTML representation.
from sparkmagic.
There are performance issues with a large number of slices for pie graphs.
* 1500 rows crash the browser.
* 500 rows take ~15 s.
* 100 rows render almost instantly.
I'll go with a (configurable) limit of 100 rows.
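One way the "configurable" part could look, as a minimal sketch: the limit defaults to 100 and can be overridden from a JSON config string. The key name `max_pie_slices` and the function `resolve_max_pie_slices` are assumptions made for illustration, not the project's actual configuration API.

```python
import json

# Default taken from the measurements above: 100 rows render almost instantly.
DEFAULT_MAX_PIE_SLICES = 100


def resolve_max_pie_slices(config_text=None):
    """Return the pie-slice limit, honoring an optional JSON override.

    `config_text` is a JSON object as a string; the key "max_pie_slices"
    is a hypothetical name used only for this sketch.
    """
    if not config_text:
        return DEFAULT_MAX_PIE_SLICES
    config = json.loads(config_text)
    value = config.get("max_pie_slices", DEFAULT_MAX_PIE_SLICES)
    # Reject booleans, non-integers, and non-positive values.
    if isinstance(value, bool) or not isinstance(value, int) or value < 1:
        raise ValueError("max_pie_slices must be a positive integer")
    return value


default_limit = resolve_max_pie_slices()
custom_limit = resolve_max_pie_slices('{"max_pie_slices": 250}')
```

Keeping the default in one constant makes it easy to revisit if pie rendering gets faster upstream.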
from sparkmagic.