Comments (3)
Some introductory notes can be found at this post on Speeding Up Your Code
from python-aos-lesson.
One option might be to have people log in to http://pangeo.pydata.org and then run one of the examples from https://github.com/pangeo-data/pangeo-example-notebooks by cloning that repo in the Jupyter terminal.
(To get a notebook rather than a JupyterLab environment, replace lab with tree in the URL, e.g. http://pangeo.pydata.org/user/damienirving/tree)
Resources:
This NCI notebook from Kate Snow introduces chunking.
This tutorial from Scott Wales (see recording) introduces more advanced dask usage.
Possible outline:
0. Simple things you can do
Lazy loading, subsetting, intermediate files, looping over depth slices (for instance).
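As a rough sketch of the "looping over depth slices" idea: only one depth level is held in memory at a time, and just the reduced result is kept. The load_depth_slice function and the array sizes here are hypothetical stand-ins for reading one level from a real file.

```python
import numpy as np

# Hypothetical dimensions for a small (time, depth, lat, lon) dataset
N_TIME, N_DEPTH, N_LAT, N_LON = 12, 5, 18, 36


def load_depth_slice(level):
    """Hypothetical stand-in for reading one depth level from disk.

    Returns synthetic data with shape (time, lat, lon).
    """
    rng = np.random.default_rng(seed=level)
    return rng.normal(loc=10.0 - level, scale=1.0, size=(N_TIME, N_LAT, N_LON))


# Only the time-mean of each level is kept, so peak memory use is one
# depth slice plus the (much smaller) result array.
time_mean = np.empty((N_DEPTH, N_LAT, N_LON))
for level in range(N_DEPTH):
    data = load_depth_slice(level)
    time_mean[level] = data.mean(axis=0)
```

With real data the same loop would typically select one level from a lazily opened dataset rather than generate random numbers.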
1. Introduction to chunking
Dask chunking
The metadata of an xarray DataArray loaded with open_mfdataset includes the dask chunk size.
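A minimal sketch of inspecting that chunk information, assuming a synthetic dask array in place of data opened from files (the shape and chunk sizes are made up for illustration). An xarray DataArray backed by dask reports the same tuple through its own .chunks attribute.

```python
import math

import dask.array as da

# Synthetic (time, lat, lon) array; nothing is computed or allocated yet
temperature = da.ones((365, 180, 360), chunks=(100, 180, 360), dtype="float32")

chunks = temperature.chunks        # block lengths along each dimension
chunksize = temperature.chunksize  # shape of the largest chunk
n_blocks = temperature.numblocks   # number of chunks along each dimension

# Approximate size of the largest chunk in megabytes (10**6 bytes)
chunk_megabytes = math.prod(chunksize) * temperature.dtype.itemsize / 1e6

print(chunks)
print(chunksize, n_blocks, chunk_megabytes)
```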
File chunking
The file itself may also be chunked. File-level chunking is available in netCDF-4 and HDF5 datasets. CMIP6 data should all be netCDF-4 and include some form of chunking in the file.
You can look at the .encoding attribute of an xarray variable to see information about the file storage.
2. Chunking best practices
Accessing data across chunks is slower than along chunks.
Optimal chunk sizes:
- http://xarray.pydata.org/en/stable/dask.html#chunking-and-performance
- You can change the dask chunk size (.chunk() on an xarray object, .rechunk() on a dask array).
- Poor choices can make things very slow
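To illustrate why access across chunks is slower, the sketch below counts how many chunks a selection touches before and after rechunking. The array is synthetic and the chunk shapes are assumptions chosen to make the contrast obvious, not recommended values.

```python
import dask.array as da

# Synthetic (time, lat, lon) array chunked one time step per chunk,
# which suits maps at a single time but not time series at a point.
by_time = da.zeros((365, 180, 360), chunks=(1, 180, 360))

# Number of chunks each selection has to read
map_blocks_by_time = by_time[0].numblocks               # one map
series_blocks_by_time = by_time[:, 90, 180].numblocks   # one time series

# Rechunk so that the whole time axis sits in each chunk
by_space = by_time.rechunk((365, 45, 90))

map_blocks_by_space = by_space[0].numblocks
series_blocks_by_space = by_space[:, 90, 180].numblocks

print(map_blocks_by_time, series_blocks_by_time)
print(map_blocks_by_space, series_blocks_by_space)
```

With the first layout a single-point time series touches 365 chunks; after rechunking it touches one, at the cost of a map now spanning 16 chunks. Rechunking is itself a full pass over the data, so it pays off mainly when the new layout is reused.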
3. Parallelising your code
In the notebook:
from dask.distributed import Client
c = Client()
c
From within a script:
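import tempfile
import dask.distributed

if __name__ == '__main__':
    # Start a local cluster: 8 single-threaded workers, 4 GB of memory each.
    # local_directory is the current keyword; older releases called it local_dir.
    client = dask.distributed.Client(
        n_workers=8, threads_per_worker=1,
        memory_limit='4gb', local_directory=tempfile.mkdtemp())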
4. Rolling your own dask aware functions
Check if a function is dask aware by watching the progress bar:
import dask.diagnostics
dask.diagnostics.ProgressBar().register()
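A complementary check, sketched here under the assumption of a small synthetic array, is to look at what the function returns: a dask-aware function hands back a lazy dask array and the progress bar only appears at compute time, whereas a function that converts to NumPy does all the work immediately. Both function names are hypothetical.

```python
import numpy as np
import dask.array as da
from dask.diagnostics import ProgressBar

x = da.random.random((400, 400), chunks=(100, 100))


def lazy_anomaly(arr):
    """Dask-aware: uses only operations that dask arrays support lazily."""
    return arr - arr.mean(axis=0)


def eager_anomaly(arr):
    """Not dask-aware: np.asarray pulls the whole array into memory."""
    arr = np.asarray(arr)
    return arr - arr.mean(axis=0)


lazy_result = lazy_anomaly(x)    # returns immediately, nothing computed
eager_result = eager_anomaly(x)  # computes everything during the call

is_lazy = isinstance(lazy_result, da.Array)
is_eager = isinstance(eager_result, np.ndarray)

# The progress bar shows up here, when the lazy result is finally computed
with ProgressBar():
    values = lazy_result.compute()
```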
Use the dask map_overlap and map_blocks functions to make your functions dask aware.
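A minimal sketch of both, using small synthetic dask arrays. The functions kelvin_to_celsius and three_point_mean are hypothetical examples: the first works on each chunk independently, the second needs one neighbouring value from the adjacent chunks.

```python
import numpy as np
import dask.array as da


def kelvin_to_celsius(block):
    """Operates on one NumPy chunk at a time; no neighbours needed."""
    return block - 273.15


def three_point_mean(block):
    """Running mean over 3 points along a 1-D chunk.

    The chunk arrives padded by map_overlap; the padded end points are
    returned unchanged because dask trims them off afterwards.
    """
    out = block.copy()
    out[1:-1] = (block[:-2] + block[1:-1] + block[2:]) / 3.0
    return out


# map_blocks: apply the function to every chunk independently
temps = da.full((4, 6), 300.0, chunks=(2, 3))
celsius = temps.map_blocks(kelvin_to_celsius, dtype=temps.dtype)
celsius_values = celsius.compute()

# map_overlap: share one element with each neighbouring chunk (depth=1)
# so the running mean is correct across chunk boundaries
series = da.arange(12, chunks=4).astype(float)
smoothed = series.map_overlap(
    three_point_mean, depth=1, boundary="reflect", dtype=series.dtype
)
smoothed_values = smoothed.compute()
```

For a linear ramp like this the interior of the running mean equals the input, which makes it easy to confirm that values at chunk boundaries were handled correctly; the two end points depend on the chosen boundary option.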