Giter Site home page Giter Site logo

stackstac's Introduction

StackSTAC

Documentation Status

Turn a list of STAC items into a 4D xarray DataArray (dims: time, band, y, x), including reprojection to a common grid. The array is a lazy Dask array, so loading and processing the data in parallel—locally or on a cluster—is just a compute() call away.

For more information and examples, please see the documentation.

import stackstac
import satsearch

stac_items = satsearch.Search(
    url="https://earth-search.aws.element84.com/v0",
    intersects=dict(type="Point", coordinates=[-105.78, 35.79]),
    collections=["sentinel-s2-l2a-cogs"],
    datetime="2020-04-01/2020-05-01"
).items()

stack = stackstac.stack(stac_items)
print(stack)
<xarray.DataArray 'stackstac-f350f6bfc3213d7eee2e6cb159246d88' (time: 13, band: 17, y: 10980, x: 10980)>
dask.array<fetch_raster_window, shape=(13, 17, 10980, 10980), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/23)
  * time                        (time) datetime64[ns] 2020-04-01T18:04:04 ......
    id                          (time) <U24 'S2B_13SDV_20200401_0_L2A' ... 'S...
  * band                        (band) <U8 'overview' 'visual' ... 'WVP' 'SCL'
  * x                           (x) float64 4e+05 4e+05 ... 5.097e+05 5.098e+05
  * y                           (y) float64 4e+06 4e+06 ... 3.89e+06 3.89e+06
    eo:cloud_cover              (time) float64 29.24 1.16 27.26 ... 87.33 5.41
    ...                          ...
    data_coverage               (time) object 33.85 100 33.9 ... 32.84 100 34.29
    platform                    (time) <U11 'sentinel-2b' ... 'sentinel-2b'
    sentinel:sequence           <U1 '0'
    proj:epsg                   int64 32613
    sentinel:data_coverage      (time) float64 33.85 100.0 33.9 ... 100.0 34.29
    title                       (band) object None ... 'Scene Classification ...
Attributes:
    spec:        RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0...
    crs:         epsg:32613
    transform:   | 10.00, 0.00, 399960.00|\n| 0.00,-10.00, 4000020.00|\n| 0.0...
    resolution:  10.0

Once in xarray form, many operations become easy. For example, we can compute a low-cloud weekly mean-NDVI timeseries:

lowcloud = stack[stack["eo:cloud_cover"] < 40]
nir, red = lowcloud.sel(band="B08"), lowcloud.sel(band="B04")
ndvi = (nir - red) / (nir + red)
weekly_ndvi = ndvi.resample(time="1w").mean(dim=("time", "x", "y")).rename("NDVI")
# Call `weekly_ndvi.compute()` to process ~25GiB of raster data in parallel. Might want a dask cluster for that!

Installation

pip install stackstac

Windows notes:

It's a good idea to use conda to handle installing rasterio on Windows. There's considerably more pain involved with GDAL-type installations using pip. Then pip install stackstac.

Things stackstac does for you:

  • Figure out the geospatial parameters from the STAC metadata (if possible): a coordinate reference system, resolution, and bounding box.
  • Transfer the STAC metadata into xarray coordinates for easy indexing, filtering, and provenance of metadata.
  • Efficiently generate a Dask graph for loading the data in parallel.
  • Mediate between Dask's parallelism and GDAL's aversion to it, allowing for fast, multi-threaded reads when possible, and at least preventing segfaults when not.
  • Mask nodata and rescale by dataset-level scales/offsets.
  • Display data in interactive maps in a notebook, computed on the fly by Dask.

Limitations:

  • Raster data only! We are currently ignoring other types of assets you might find in a STAC (XML/JSON metadata, point clouds, video, etc.).
  • Single-band raster data only! Each band has to be a separate STAC asset—a separate red, green, and blue asset on each Item is great, but a single RGB asset containing a 3-band GeoTIFF is not supported yet.
  • COGs work best. "Normal" GeoTIFFs that aren't internally tiled, or don't have overviews, will see much worse performance. Sidecar files (like .msk files) are ignored for performance. JPEG2000 will probably work, but probably be slow unless you buy kakadu. Formats make a big difference.
  • BYOBlocksize. STAC doesn't offer any metadata about the internal tiling scheme of the data. Knowing it can make IO more efficient, but actually reading the data to figure it out is slow. So it's on you to set this parameter. (But if you don't, things should be fine for any reasonable COG.)
  • Doesn't make geospatial data any easier to work with in xarray. Common operations (picking bands, clipping to bounds, etc.) are tedious to type out. Real geospatial operations (shapestats on a GeoDataFrame, reprojection, etc.) aren't supported at all. rioxarray might help with some of these, but it has limited support for Dask, so be careful you don't kick off a huge computation accidentally.
  • I haven't even written tests yet! Don't use this in production. Or do, I guess. Up to you.

Roadmap:

Short-term:

  • Write tests and add CI (including typechecking)
  • Support multi-band assets
  • Easier access to s3://-style URIs (right now, you'll need to pass in gdal_env=stackstac.DEFAULT_GDAL_ENV.updated(always=dict(session=rio.session.AWSSession(...))))
  • Utility to guess blocksize (open a few assets)
  • Support item assets to provide more useful metadata with collections that use it (like S2 on AWS)
  • Rewrite dask graph generation once the Blockwise IO API settles

Long term (if anyone uses this thing):

  • Support other readers (aiocogeo?) that may perform better than GDAL for specific formats
  • Interactive mapping with xarray_leaflet, made performant with some Dask graph-rewriting tricks to do the initial IO at coarser resolution for lower zoom levels (otherwize zooming out could process terabytes of data)
  • Improve ergonomics of xarray for raster data (in collaboration with rioxarray)
  • Implement core geospatial routines (warp, vectorize, vector stats, GeoPandas/spatialpandas interop) in Dask

stackstac's People

Contributors

gjoseph92 avatar richardscottoz avatar kylebarron avatar scottyhq avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.