Comments (2)
What I had in mind are a few methods in the Dataset class, such as:
- as_dataframe (for pandas)
- as_astropy_table
- as_numpy (a dict with numpy arrays)
It makes sense for them to take a 'selection' argument (see, for instance, Dataset.scatter) and the arguments column_names and virtual, like export_hdf5, and possibly a strings argument like Dataset.get_column_names(..). It may be convenient to give the as_numpy method an extra with_units=False argument to attach the units (see http://docs.astropy.org/en/stable/units/ ). Astropy tables can also take units; for pandas dataframes I am not sure.
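To make the proposed signature concrete, here is a minimal sketch of what an as_numpy method with column_names, selection and virtual arguments could look like. ToyDataset is a hypothetical stand-in and not vaex's Dataset class; the selection is assumed to be a plain boolean mask, and virtual columns are assumed to be functions of the real columns. The with_units argument is left out.

```python
import numpy as np


class ToyDataset:
    """Hypothetical stand-in for vaex's Dataset; not the real class."""

    def __init__(self, columns, virtual_columns=None):
        # columns: name -> 1-D array-like (all of equal length)
        self.columns = {name: np.asarray(values) for name, values in columns.items()}
        # virtual_columns: name -> function computing an array from the real columns
        self.virtual_columns = dict(virtual_columns or {})

    def get_column_names(self, virtual=False):
        names = list(self.columns)
        if virtual:
            names += list(self.virtual_columns)
        return names

    def as_numpy(self, column_names=None, selection=None, virtual=False):
        """Return a dict mapping column name -> numpy array."""
        names = column_names or self.get_column_names(virtual=virtual)
        result = {}
        for name in names:
            if name in self.columns:
                values = self.columns[name]
            else:
                # Evaluate the virtual column on demand.
                values = self.virtual_columns[name](self.columns)
            if selection is not None:
                # Assumption: selection is a boolean mask over the rows.
                values = values[np.asarray(selection, dtype=bool)]
            result[name] = values
        return result


ds = ToyDataset(
    {"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]},
    virtual_columns={"r": lambda c: np.sqrt(c["x"] ** 2 + c["y"] ** 2)},
)
arrays = ds.as_numpy(virtual=True, selection=ds.columns["x"] > 1.5)
```

An as_dataframe or as_astropy_table method could presumably be built on top of such a dict, since both pandas and astropy accept a mapping of column names to arrays.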
You might want to check how I protect against converting too much data in Dataset.scatter, or see https://github.com/maartenbreddels/vaex/blob/2754156da08fd4fad2555fdf0d85373ebae10a35/vaex/export.py#L291 for how I protect against using too much memory, and similarly in the GUI: https://github.com/maartenbreddels/vaex/blob/master/vaex/ui/main.py#L1070 (I just noticed that the GUI code should be refactored to use vaex.utils.)
Please also include unit tests. Looking forward to seeing your PR!
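As a rough illustration of that kind of guard (this is not the code at the links above, which is not reproduced here), a conversion method could estimate the size of the result before allocating anything. The function name, the 8-byte item size and the 1 GiB default limit are all assumptions made for this sketch.

```python
def check_memory_budget(n_rows, n_columns, itemsize=8, max_bytes=2**30):
    """Raise MemoryError if the converted data would exceed max_bytes.

    Hypothetical helper: assumes every column holds n_rows items of
    itemsize bytes each (8 bytes matches float64).
    """
    needed = n_rows * n_columns * itemsize
    if needed > max_bytes:
        raise MemoryError(
            "conversion needs about %d bytes, limit is %d bytes" % (needed, max_bytes)
        )
    return needed


# Three float64 columns of 1000 rows fit easily within the default limit.
small = check_memory_budget(1000, 3)
```

Calling such a check at the top of each as_* method would fail early instead of part-way through the conversion.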
from vaex.
Implemented in 7ca315b, with a short description here:
http://vaex.astro.rug.nl/latest/getting_data_in_vaex.html#getting-your-data-out
It should be available in the next release.