
Comments (3)

DamienIrving commented on May 16, 2024

Some introductory notes can be found in this post on "Speeding Up Your Code".

from python-aos-lesson.

DamienIrving commented on May 16, 2024

One option might be to have people log in to http://pangeo.pydata.org and then work through one of the examples from https://github.com/pangeo-data/pangeo-example-notebooks by cloning that repo in the Jupyter terminal.

(To get a notebook rather than a JupyterLab environment, replace lab with tree in the URL, e.g. http://pangeo.pydata.org/user/damienirving/tree)


DamienIrving commented on May 16, 2024

Resources:
This NCI notebook from Kate Snow introduces chunking.
This tutorial from Scott Wales (see recording) introduces more advanced dask usage.

Possible outline:

0. Simple things you can do

Lazy loading, subsetting, intermediate files, looping over depth slices (for instance).
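
A minimal sketch of two of these ideas (subsetting first, then looping over depth slices), assuming a small synthetic array in place of a real file. The variable name thetao and the dimension names are made up for illustration:

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for an ocean variable (hypothetical name and dimensions)
data = xr.DataArray(
    np.arange(3 * 4 * 5, dtype=float).reshape(3, 4, 5),
    dims=("lev", "lat", "lon"),
    name="thetao",
)

# Subset first so that later steps only touch the region of interest
subset = data.isel(lat=slice(0, 2))

# Process one depth slice at a time rather than the whole array at once
level_means = []
for k in range(subset.sizes["lev"]):
    level_means.append(float(subset.isel(lev=k).mean()))

print(level_means)
```

With data opened lazily from disk, the same loop would only pull one depth level into memory per iteration.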

1. Introduction to chunking

Dask chunking

The metadata of an xarray DataArray loaded with open_mfdataset includes the dask chunk size.
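
As a rough illustration, the sketch below builds a synthetic DataArray and chunks it with .chunk() instead of opening real files with open_mfdataset (an assumption made so the example is self-contained; the name tas and the sizes are made up). The chunk layout is then available from the .chunks attribute:

```python
import numpy as np
import xarray as xr

# Synthetic array; .chunk() turns it into a dask-backed DataArray,
# similar to what open_mfdataset returns for data read from files
tas = xr.DataArray(
    np.zeros((12, 4, 6)),
    dims=("time", "lat", "lon"),
    name="tas",
).chunk({"time": 3})

# One tuple of chunk lengths per dimension
print(tas.chunks)
```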

File chunking

The file itself may also be chunked. Chunked storage is available in the netCDF-4 and HDF5 file formats. CMIP6 data should all be netCDF-4 and include some form of chunking in the file.

You can look at the .encoding attribute of an xarray variable to see information about the file storage.

2. Chunking best practices

Accessing data across chunks is slower than along chunks.
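
One way to see this is to count how many chunks a selection touches. The sketch below uses dask.array directly with a made-up layout (365 daily fields of 100 x 100 points, one time step per chunk); the shape and chunking are assumptions chosen for illustration:

```python
import dask.array as da

# Hypothetical layout: 365 daily fields, one time step per chunk
x = da.zeros((365, 100, 100), chunks=(1, 100, 100))

single_map = x[0]         # one time step: lies within a single chunk
time_series = x[:, 0, 0]  # one grid point through time: crosses every chunk

chunks_for_map = single_map.npartitions
chunks_for_series = time_series.npartitions
print(chunks_for_map, chunks_for_series)
```

With this chunking a single map needs one chunk, while a time series at one grid point needs all 365, so the better layout depends on which kind of access dominates.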

Optimal chunk sizes:

3. Parallelising your code

In the notebook:

from dask.distributed import Client
c = Client()
c

From within a script:

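import tempfile

import dask.distributed

if __name__ == '__main__':
    # Start a local cluster; the guard keeps worker processes from re-running this block
    client = dask.distributed.Client(
        n_workers=8, threads_per_worker=1,
        memory_limit='4gb', local_directory=tempfile.mkdtemp())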

4. Rolling your own dask aware functions

Check if a function is dask aware by watching the progress bar:

import dask.diagnostics
dask.diagnostics.ProgressBar().register()

Use dask's map_blocks and map_overlap to make your own functions dask aware.
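
A minimal sketch of both, using dask.array on small synthetic data. The two helper functions (kelvin_to_celsius and three_point_mean) are hypothetical examples, not part of dask: the first works on each chunk independently, so map_blocks is enough; the second needs neighbouring values, so map_overlap is used to share one cell with each adjacent chunk.

```python
import numpy as np
import dask.array as da

def kelvin_to_celsius(block):
    # Works on each chunk independently: no neighbouring values needed
    return block - 273.15

def three_point_mean(block):
    # Needs one neighbour on each side, which map_overlap supplies as a halo.
    # np.roll wraps around at the block edges, but those edge cells are the
    # halo and are trimmed off after the function returns.
    return (np.roll(block, 1) + block + np.roll(block, -1)) / 3

temps = da.from_array(np.full((4, 6), 300.0), chunks=(2, 3))
celsius = temps.map_blocks(kelvin_to_celsius, dtype=temps.dtype).compute()

series = da.arange(8, chunks=4, dtype=float)
smoothed = series.map_overlap(
    three_point_mean, depth=1, boundary="nearest", dtype=float
).compute()
```

The depth argument sets how many cells are shared between neighbouring chunks, and boundary controls how the outer edges of the whole array are padded.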

