nsidc-data-tutorials's Introduction

NSIDC-Data-Tutorials

Binder

Test Notebooks

Summary

This combined repository includes tutorials and code resources provided by the NASA National Snow and Ice Data Center Distributed Active Archive Center (NSIDC DAAC). The tutorials are Python-based Jupyter notebooks that offer guidance on working with various data products, including how to access, subset, transform, and visualize data. Each tutorial can be accessed by navigating to the /notebooks folder of this repository. Please see the README file in each individual tutorial folder for more information on that tutorial and its learning objectives. Please note that all branches other than Main should be considered in development and are not supported.

Tutorials

Snow Depth and Snow Cover Data Exploration

Originally demonstrated through the NASA Earthdata Webinar "Let It Snow! Accessing and Analyzing Snow Data at the NSIDC DAAC" on May 6, 2020, this tutorial provides guidance on how to discover, access, and couple snow data across varying geospatial scales from NASA's SnowEx, Airborne Snow Observatory, and Moderate Resolution Imaging Spectroradiometer (MODIS) missions. The tutorial highlights the ability to search and access data by a defined region, and combine and compare snow data across different data formats and scales using a Python-based Jupyter Notebook.

Getting the most out of NSIDC DAAC data: Discovering, Accessing, and Harmonizing Arctic Remote Sensing Data

Originally presented during the 2019 AGU Fall Meeting, this tutorial demonstrates the NSIDC DAAC's data discovery, access, and subsetting services, along with basic open source resources used to harmonize and analyze data across multiple products. The tutorial is provided as a series of Python-based Jupyter Notebooks, focusing on sea ice height and ice surface temperature data from NASA’s ICESat-2 and MODIS missions, respectively, to characterize Arctic sea ice.

Harmonized data for pre-IceBridge, ICESat and IceBridge data sets. These Jupyter notebooks are interactive documents that teach students and researchers interested in cryospheric sciences how to access and work with airborne altimetry and related data sets from NASA's IceBridge mission, and satellite altimetry data from the ICESat and ICESat-2 missions, using the NSIDC IceFlow API.

Global land ice velocities. The Inter-mission Time Series of Land Ice Velocity and Elevation (ITS_LIVE) project facilitates ice sheet, ice shelf and glacier research by providing a globally comprehensive and temporally dense multi-sensor record of land ice velocity and elevation with low latency. Scene-pair velocities were generated from satellite optical and radar imagery.

The notebooks in this project demonstrate how to search for and access ITS_LIVE velocity pairs and provide a simple example of how to build a data cube.
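As a loose illustration of the data-cube idea, such a cube can be opened lazily with xarray; this is a minimal sketch, the store path below is a placeholder rather than a verified ITS_LIVE cube location, and it assumes s3fs is installed:

import xarray as xr

# Placeholder path standing in for a real ITS_LIVE Zarr datacube on S3.
cube_url = "s3://its-live-data/datacubes/example_cube.zarr"

# Open the cube lazily; because it is a Zarr store, no full download is needed.
dc = xr.open_zarr(cube_url, storage_options={"anon": True})
print(dc.data_vars)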

Accessing and working with ICESat-2 Data in the Cloud

Originally presented to the UWG (User Working Group) in May 2022, this tutorial demonstrates how to search for ICESat-2 data hosted in the Earthdata Cloud and how to directly access it from an Amazon Web Services (AWS) Elastic Compute Cloud (EC2) instance using the earthaccess package.
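As a rough sketch of that workflow (the short name, bounding box, and dates below are illustrative and not the tutorial's exact values):

import earthaccess

# Authenticate with Earthdata Login (interactively or via a .netrc file).
auth = earthaccess.login()

# Illustrative query: ICESat-2 ATL06 granules over a small box and time range.
results = earthaccess.search_data(
    short_name="ATL06",
    bounding_box=(-50.0, 68.0, -48.0, 70.0),
    temporal=("2021-01-01", "2021-01-31"),
)

# Inside the AWS us-west-2 region this returns file-like objects opened directly
# from the Earthdata Cloud; outside the region, earthaccess.download() can be used instead.
files = earthaccess.open(results)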

Download, crop, resample, and plot multiple GeoTIFFs

This tutorial guides you through programmatically accessing and downloading GeoTIFF files from the NSIDC DAAC to your local computer. We then crop and resample one GeoTIFF based on the extent and pixel size of another GeoTIFF, and plot one on top of the other.

We will use two data sets from the NASA MEaSUREs (Making Earth System Data Records for Use in Research Environments) program as examples.
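The crop-and-resample step can be sketched with rioxarray as follows; the file names are placeholders, and the tutorial's actual data sets and plotting code may differ:

import rioxarray
import matplotlib.pyplot as plt

# Placeholder file names standing in for the two downloaded GeoTIFFs.
fine = rioxarray.open_rasterio("grid_a.tif", masked=True).squeeze()
coarse = rioxarray.open_rasterio("grid_b.tif", masked=True).squeeze()

# Reproject, crop, and resample the first grid onto the extent and pixel size of the second.
fine_matched = fine.rio.reproject_match(coarse)

# Plot one on top of the other.
fig, ax = plt.subplots()
coarse.plot(ax=ax, cmap="Greys_r")
fine_matched.plot(ax=ax, alpha=0.6)
plt.show()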

Usage with Binder

The Binder button above allows you to explore and run the notebook in a shared cloud computing environment without the need to install dependencies on your local machine. Note that this option will not directly download data to your computer; instead the data will be downloaded to the cloud environment.

Usage with Docker

On macOS or Linux

  1. Install Docker. Use the left-hand navigation to select the appropriate installation for your operating system.

  2. Download the NSIDC-Data-Tutorials repository from Github.

  3. Unzip the file, and open a terminal window in the NSIDC-Data-Tutorials folder's location.

  4. From the terminal window, launch the docker container using the following command, replacing [path/notebook_folder] with your path and notebook folder name:

docker run --name tutorials -p 8888:8888 -v [path/notebook_folder]:/home/jovyan/work nsidc/tutorials

Example:

docker run --name tutorials -p 8888:8888 -v /Users/name/Desktop/NSIDC-Data-Tutorials:/home/jovyan/work nsidc/tutorials

Or, with docker-compose:

docker-compose up

If you want to mount a directory with write permissions, you need to grant the container the same permissions as those on the directory to be mounted and tell it that it has "root" access (within the container). This is important if you want to persist your work or download data to a local directory and not just the Docker container. Run the example command below for this option:

docker run --name tutorials -e NB_UID=$(id -u) --user root -p 8888:8888 -v  /Users/name/Desktop/NSIDC-Data-Tutorials:/home/jovyan/work nsidc/tutorials

The initialization will take some time and will require 2.6 GB of space. Once the startup is complete you will see a line of output similar to this:

To access the notebook, open this file in a browser:
        file:///home/jovyan/.local/share/jupyter/runtime/nbserver-6-open.html
    Or copy and paste one of these URLs:
        http://4dc97ddd7a0d:8888/?token=f002a50e25b6f623aa775312737ba8a23ffccfd4458faa6f
     or http://127.0.0.1:8888/?token=f002a50e25b6f623aa775312737ba8a23ffccfd4458faa6f

If you started your container with the -d/--detach option, check docker logs tutorials for this output.

  5. Open a web browser and paste in one of the URLs from the terminal output.

  6. You will be brought to a Jupyter Notebook interface running through the Docker container. The left side of the interface displays your local directory structure. Navigate to the work folder, which contains the mounted NSIDC-Data-Tutorials repository. You can now interact with the notebooks to explore and access data.

On Windows

  1. Install Docker.

  2. Download the NSIDC-Data-Tutorials repository from Github.

  3. Unzip the file, and open a terminal window (use Command Prompt or PowerShell, not PowerShell ISE) in the NSIDC-Data-Tutorials folder's location.

  4. From the terminal window, launch the docker container using the following command, replacing [path\notebook_folder] with your path and notebook folder name:

docker run --name tutorials -p 8888:8888 -v [path\notebook_folder]:/home/jovyan/work nsidc/tutorials

Example:

docker run --name tutorials -p 8888:8888 -v C:\notebook_folder:/home/jovyan/work nsidc/tutorials

Or, with docker-compose:

docker-compose up

If you want to mount a directory with write permissions, you need to grant the container the same permissions as those on the directory to be mounted and tell it that it has "root" access (within the container):

docker run --name tutorials --user root -p 8888:8888 -v C:\notebook_folder:/home/jovyan/work nsidc/tutorials

The initialization will take some time and will require 2.6 GB of space. Once the startup is complete you will see a line of output similar to this:

To access the notebook, open this file in a browser:
        file:///home/jovyan/.local/share/jupyter/runtime/nbserver-6-open.html
    Or copy and paste one of these URLs:
        http://(6a8bfa6a8518 or 127.0.0.1):8888/?token=2d72e03269b59636d9e31937fcb324f5bdfd0c645a6eba3f

If you started your container with the -d/--detach option, check docker logs tutorials for this output.

  5. Follow the instructions and copy one of the URLs into a web browser, then hit return. The address should look something like this:

http://127.0.0.1:8888/?token=2d72e03269b59636d9e31937fcb324f5bdfd0c645a6eba3f

  6. You will now see the NSIDC-Data-Tutorials repository within the Jupyter Notebook interface. Navigate to /work to open the notebooks.

  7. You can now interact with the notebooks to explore and access data.

Usage with Mamba/Conda

Note: If you already have conda or mamba installed, you can skip the first step.

  1. Install mambaforge (Python 3.9+) for your platform from the mamba documentation.

  2. Download the NSIDC-Data-Tutorials repository from Github by clicking the green 'Code' button located at the top right of the repository page and clicking 'Download Zip'. Unzip the file, and open a command line or terminal window in the NSIDC-Data-Tutorials folder's location.

  3. From a command line or terminal window, install the required environment with the following commands:

Linux

mamba create -n nsidc-tutorials --file binder/conda-linux-64.lock

OSX

mamba create -n nsidc-tutorials --file binder/conda-osx-64.lock

Windows

mamba create -n nsidc-tutorials --file binder/conda-win-64.lock

You should now see that the dependencies were installed and the environment is ready to be used.

Activate the environment with

conda activate nsidc-tutorials

Launch the notebook locally with the following command:

jupyter lab

This should open a browser window with the JupyterLab IDE, showing your current working directory in the left-hand navigation. Navigate to the tutorial folder of your choice and click on the associated *.ipynb files to get started.

Tutorial Environments

Although the nsidc-tutorials environment should run all the notebooks in this repository, we also include tutorial-specific environments that contain only the dependencies for each tutorial. If you don't want to "pollute" your conda environments and are only going to work with one of the tutorials, we recommend using these instead of the nsidc-tutorials environment. The steps to install them are exactly the same, but the environment files are inside the environment folder of each tutorial, e.g. for ITS_LIVE:

cd notebooks/itslive 
mamba create -n nsidc-itslive --file environment/conda-linux-64.lock
conda activate nsidc-itslive
jupyter lab

This creates a pinned environment that should be fully reproducible across platforms.

NOTE: Sometimes conda environments change (break) even with pinned-down dependencies. If you run into a dependency issue with the tutorials, please open an issue and we'll try to fix it as soon as possible.

Credit

This software is developed by the National Snow and Ice Data Center with funding from multiple sources.

License

This repository is licensed under the MIT license.

nsidc-data-tutorials's People

Contributors

andypbarrett, asteiker, betolink, jessicas11, jroebuck932, lisakaser, mfisher87, mikala-nsidc, nicholas-kotlinski


nsidc-data-tutorials's Issues

AttributeError from import of iceflow ui: NASAGIBS.BlueMarble no longer listed in xyzservices

What Happened

The 0_introduction.ipynb notebook raises an AttributeError in the first cell when trying to import iceflow.ui. This appears to result from a call to ipyleaflet.basemaps.NASAGIBS.BlueMarble.

Investigation

Looking at https://xyzservices.readthedocs.io/en/stable/introduction.html under NASAGIBS services, it appears that there is no longer an entry for BlueMarble. BlueMarble is also not shown in https://xyzservices.readthedocs.io/en/stable/gallery.html.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File [~/mambaforge/envs/nsidc-iceflow/lib/python3.9/site-packages/xyzservices/lib.py:44](http://localhost:8889/home/apbarret/mambaforge/envs/nsidc-iceflow/lib/python3.9/site-packages/xyzservices/lib.py#line=43), in Bunch.__getattr__(self, key)
     43 try:
---> 44     return self.__getitem__(key)
     45 except KeyError as err:

KeyError: 'BlueMarble'

The above exception was the direct cause of the following exception:

AttributeError                            Traceback (most recent call last)
Cell In[1], line 2
      1 # Importing IceFlow client library
----> 2 from iceflow.ui import IceFlowUI
      3 from iceflow.client import IceflowClient
      5 import earthaccess

File [~/src/NSIDC-Data-Tutorials/notebooks/iceflow/iceflow/ui.py:9](http://localhost:8889/lab/tree/iceflow/ui.py#line=8)
      6 from IPython.display import display, HTML
      7 from ipyleaflet import (Map, SearchControl, AwesomeIcon, GeoJSON,
      8                         Marker, DrawControl, LayersControl)
----> 9 from .layers import custom_layers, flight_layers, widget_projections
     10 from .client import IceflowClient
     13 class IceFlowUI:

File [~/src/NSIDC-Data-Tutorials/notebooks/iceflow/iceflow/layers.py:106](http://localhost:8889/lab/tree/iceflow/layers.py#line=105)
     64 north_3413 = {
     65     'name': 'EPSG:3413',
     66     'custom': True,
   (...)
     81     ]
     82 }
     84 south_3031 = {
     85     'name': 'EPSG:3031',
     86     'custom': True,
   (...)
    101     ]
    102 }
    104 widget_projections = {
    105     'global': {
--> 106         'base_map': basemaps.NASAGIBS.BlueMarble,
    107         'projection': projections.EPSG3857,
    108         'center': (30, -30),
    109         'zoom': 2,
    110         'max_zoom': 8
    111     },
    112     'north': {
    113         'base_map': basemaps.NASAGIBS.BlueMarble3413,
    114         'projection': north_3413,
    115         'center': (80, -50),
    116         'zoom': 1,
    117         'max_zoom': 4
    118     },
    119     'south': {
    120         'base_map': basemaps.NASAGIBS.BlueMarble3031,
    121         'projection': south_3031,
    122         'center': (-90, 0),
    123         'zoom': 1,
    124         'max_zoom': 4
    125     }
    126 }

File [~/mambaforge/envs/nsidc-iceflow/lib/python3.9/site-packages/xyzservices/lib.py:46](http://localhost:8889/home/apbarret/mambaforge/envs/nsidc-iceflow/lib/python3.9/site-packages/xyzservices/lib.py#line=45), in Bunch.__getattr__(self, key)
     44     return self.__getitem__(key)
     45 except KeyError as err:
---> 46     raise AttributeError(key) from err

AttributeError: BlueMarble
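A quick way to confirm which NASAGIBS layers the installed xyzservices actually exposes (a diagnostic sketch, not part of the notebook):

from xyzservices import providers

# If 'BlueMarble' is missing from this list, the import of iceflow.ui will fail as shown above.
print(sorted(providers.NASAGIBS.keys()))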

Define a common `read_h5` function for _h5py + pandas_ and _dask array_

import h5py
import numpy as np

def read_h5(fname, vnames=[]):
    """Read a list of vars [v1, v2, ..] -> 2D array."""
    with h5py.File(fname, 'r') as f:
        return np.column_stack([f[v][()] for v in vnames])

could be used for the pandas and dask array cells. Maybe this could be added to icepyx or offered as part of a separate tool set.
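For example, a minimal sketch of feeding both cells from the same helper (the file name and variable paths are illustrative only):

import pandas as pd
import dask.array as da

# Hypothetical ATL06-style variable paths; adjust to the actual product layout.
vnames = ['gt1l/land_ice_segments/latitude',
          'gt1l/land_ice_segments/longitude',
          'gt1l/land_ice_segments/h_li']

data = read_h5('ATL06_example.h5', vnames)                  # 2D array, one column per variable
df = pd.DataFrame(data, columns=['lat', 'lon', 'h_li'])     # pandas cell
darr = da.from_array(data, chunks=(100_000, len(vnames)))   # dask array cell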

make tools easily installable as modules

I'm trying to use IceFlow in another workflow, but it's non-trivial to install in its current configuration. You can't install the module using a full path because of the dashes in the repo name (NSIDC-Data-Tutorials), and if you add the repo to your path and try to import, there are all sorts of relative path/module errors. @betolink are there any plans to set this up? Currently I'm modifying iceflow/__init__.py to import each module, and then debugging the relative path calls for each module.

Break down environments and put them in separate git branches.

Right now we have a single environment for all the tutorials, and it works, but it's not best practice. We should create a branch for each tutorial; for example, for SnowEx there would be a git branch called binder-snowex containing only the binder directory and the dependencies needed for that specific tutorial, and likewise for the others. Another advantage is that commits to main that don't touch the environment won't trigger a new Binder build.

error running Customize and Access Data.ipynb

in the notebooks/ICESat-2_MODIS_Arctic_Sea_Ice/ folder

Running in a Binder instance (from the GitHub link on the repo). I logged in with my Earthdata account (via code in the notebook), so not being logged in is not the problem.

The last code cell is:

fn.request_data(param_dict,session)
fn.clean_folder()

This error is returned. Note that I tried this yesterday too and got the same error, so it doesn't seem to be a temporary 404 problem.

Request HTTP response:  201

Order request URL:  https://n5eil02u.ecs.nsidc.org/egi/request?short_name=MOD29&version=6&bounding_box=140%2C72%2C153%2C80&temporal=2019-03-23T00%3A00%3A00Z%2C2019-03-23T23%3A59%3A59Z&page_size=2000&email=eli.holmes%40noaa.gov&bbox=140%2C72%2C153%2C80&time=2019-03-23T00%3A00%3A00%2C2019-03-23T23%3A59%3A59&coverage=%2Fgt1l%2Fsea_ice_segments%2Fdelta_time%2C%2Fgt1l%2Fsea_ice_segments%2Flatitude%2C%2Fgt1l%2Fsea_ice_segments%2Flongitude%2C%2Fgt1l%2Fsea_ice_segments%2Fheights%2Fheight_segment_confidence%2C%2Fgt1l%2Fsea_ice_segments%2Fheights%2Fheight_segment_height%2C%2Fgt1l%2Fsea_ice_segments%2Fheights%2Fheight_segment_quality%2C%2Fgt1l%2Fsea_ice_segments%2Fheights%2Fheight_segment_surface_error_est%2C%2Fgt1l%2Fsea_ice_segments%2Fheights%2Fheight_segment_length_seg%2C%2Fgt2l%2Fsea_ice_segments%2Fdelta_time%2C%2Fgt2l%2Fsea_ice_segments%2Flatitude%2C%2Fgt2l%2Fsea_ice_segments%2Flongitude%2C%2Fgt2l%2Fsea_ice_segments%2Fheights%2Fheight_segment_confidence%2C%2Fgt2l%2Fsea_ice_segments%2Fheights%2Fheight_segment_height%2C%2Fgt2l%2Fsea_ice_segments%2Fheights%2Fheight_segment_quality%2C%2Fgt2l%2Fsea_ice_segments%2Fheights%2Fheight_segment_surface_error_est%2C%2Fgt2l%2Fsea_ice_segments%2Fheights%2Fheight_segment_length_seg%2C%2Fgt3l%2Fsea_ice_segments%2Fdelta_time%2C%2Fgt3l%2Fsea_ice_segments%2Flatitude%2C%2Fgt3l%2Fsea_ice_segments%2Flongitude%2C%2Fgt3l%2Fsea_ice_segments%2Fheights%2Fheight_segment_confidence%2C%2Fgt3l%2Fsea_ice_segments%2Fheights%2Fheight_segment_height%2C%2Fgt3l%2Fsea_ice_segments%2Fheights%2Fheight_segment_quality%2C%2Fgt3l%2Fsea_ice_segments%2Fheights%2Fheight_segment_surface_error_est%2C%2Fgt3l%2Fsea_ice_segments%2Fheights%2Fheight_segment_length_seg&request_mode=async

order ID:  5000002704796
status URL:  https://n5eil02u.ecs.nsidc.org/egi/request/5000002704796
HTTP response from order response URL:  201

Initial request status is  pending

Status is not complete. Trying again.
Retry request status is:  pending
Status is not complete. Trying again.
Retry request status is:  pending
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  complete
Zip download URL:  https://n5eil02u.ecs.nsidc.org/esir/5000002704796.zip
Beginning download of zipped output...
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/tmp/ipykernel_150/2671428711.py in <module>
----> 1 fn.request_data(param_dict,session)
      2 fn.clean_folder()

~/notebooks/ICESat-2_MODIS_Arctic_Sea_Ice/tutorial_helper_functions.py in request_data(param_dict, session)
    235             zip_response = session.get(downloadURL)
    236             # Raise bad request: Loop will stop for bad response code.
--> 237             zip_response.raise_for_status()
    238             with zipfile.ZipFile(io.BytesIO(zip_response.content)) as z:
    239                 z.extractall(path)

/srv/conda/envs/notebook/lib/python3.9/site-packages/requests/models.py in raise_for_status(self)
    951 
    952         if http_error_msg:
--> 953             raise HTTPError(http_error_msg, response=self)
    954 
    955     def close(self):

HTTPError: 404 Client Error: Not Found for url: https://n5eil02u.ecs.nsidc.org/esir/5000002704796.zip

Matplotlib warning get_cmap warning in notebooks/SMAP/03_smap_quality_flags.ipynb

I received a matplotlib warning for each image in the notebooks/SMAP/03_smap_quality_flags.ipynb notebook:

/tmp/ipykernel_478/3335087365.py:3: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed two minor releases later. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap(obj)`` instead.
  cax = ax.imshow((surf_flag_L3_P>>i)&1, cmap=plt.cm.get_cmap('bone', 2))
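A minimal sketch of the likely fix, assuming the notebook only needs a two-level 'bone' colormap (matplotlib.colormaps[...] with resampled() is the non-deprecated path in Matplotlib 3.7+):

import matplotlib
import matplotlib.pyplot as plt
import numpy as np

flags = np.random.randint(0, 2, size=(406, 964))  # stand-in for (surf_flag_L3_P >> i) & 1

fig, ax = plt.subplots()
# matplotlib.colormaps['bone'].resampled(2) replaces the deprecated plt.cm.get_cmap('bone', 2)
cax = ax.imshow(flags, cmap=matplotlib.colormaps['bone'].resampled(2))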

New issue: CRYO-206

Related to CRYO-187

Suggest using xarray and cartopy for SMAP tutorial 2

The SMAP tutorial 2.0 read_and_plot_smap_data uses h5py and numpy. The whole notebook could be simplified and streamlined by using xarray.

If we stick with h5py, a lot of the existing code could also be streamlined and made more transparent.

For example, code cell 3 involves a lot of code to get a list of groups and dataset paths, which can be simplified to the following.

with h5py.File(smap_files[0], 'r') as root:
    list_of_names = []
    root.visit(list_of_names.append)
list_of_names
['Metadata',
 'Metadata/AcquisitionInformation',
 'Metadata/AcquisitionInformation/platform',
 'Metadata/AcquisitionInformation/platformDocument',
 'Metadata/AcquisitionInformation/radar',
 'Metadata/AcquisitionInformation/radarDocument',
 'Metadata/AcquisitionInformation/radiometer',
 'Metadata/AcquisitionInformation/radiometerDocument',
 'Metadata/DataQuality',
 'Metadata/DataQuality/CompletenessOmission',
 'Metadata/DataQuality/DomainConsistency',
 'Metadata/DatasetIdentification',
 'Metadata/Extent',
 'Metadata/GridSpatialRepresentation',
 'Metadata/GridSpatialRepresentation/Column',
 'Metadata/GridSpatialRepresentation/GridDefinition',
 'Metadata/GridSpatialRepresentation/GridDefinitionDocument',

Code cell 5, which gets soil_moisture for the AM pass, could be rewritten to use the path to the dataset:

with h5py.File(smap_files[0], 'r') as root:
    soil_moisture = root['Soil_Moisture_Retrieval_Data_AM/soil_moisture'][:]
soil_moisture
array([[-9999., -9999., -9999., ..., -9999., -9999., -9999.],
       [-9999., -9999., -9999., ..., -9999., -9999., -9999.],
       [-9999., -9999., -9999., ..., -9999., -9999., -9999.],
       ...,
       [-9999., -9999., -9999., ..., -9999., -9999., -9999.],
       [-9999., -9999., -9999., ..., -9999., -9999., -9999.],
       [-9999., -9999., -9999., ..., -9999., -9999., -9999.]],
      dtype=float32)

But as I note, this is much, much simpler with xarray.
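For instance, a hedged sketch of the xarray approach, assuming the h5netcdf backend is installed and the AM group name matches the file's layout:

import xarray as xr

# Open just the AM retrieval group; phony_dims is needed because the group's
# datasets do not carry named dimensions.
ds = xr.open_dataset(
    smap_files[0],
    group='Soil_Moisture_Retrieval_Data_AM',
    engine='h5netcdf',
    phony_dims='sort',
)
soil_moisture = ds['soil_moisture']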

Introduce Valkyrie before missions

I think it is better to introduce Valkyrie and the problems it solves before giving an overview of the missions. For example...

Why Valkyrie

In 2003, NASA launched the Ice, Cloud and Land Elevation Satellite (ICESat) mission. Over the following six years, ICESat collected valuable data about ice thickness in the polar regions. Unfortunately, the ICESat mission ended before a follow-on mission could be launched. To fill the gap, an airborne campaign called Operation IceBridge was started. Between 2009 and 2019, Operation IceBridge flew numerous campaigns over the Greenland and Antarctic ice sheets, as well as over sea ice in the Arctic and Southern Oceans. The last campaign was [fill in date here]. In September 2018, ICESat-2 was launched to continue NASA's collection of ice, cloud and land elevation data.

The wealth of data from these three missions, as well as from earlier missions, presents an opportunity to measure the evolution of ice thickness over several decades. However, combining data from these missions is a challenge. Data from the Airborne Topographic Mapper (ATM) flown during IceBridge campaigns is stored in N different formats. ICESat and ICESat-2 data are also in different file formats. Data needs to be harmonized (put into similar formats) before comparisons can be made. A further complication is that the coordinate reference systems used to locate measurements have changed. The Earth's surface is not static and changes shape. To account for these changes, the terrestrial reference frames that relate latitude and longitude to points on the Earth are updated on a regular basis. Since the launch of ICESat, the International Terrestrial Reference Frame has been updated three times. The geolocation accuracy of the instruments means that a point measured at the beginning of the record is not the same point as that measured at the end of the record, even though the latitude and longitude are the same. These changes in geolocation need to be reconciled if meaningful comparisons of measurements are to be made.

Valkyrie solves this problem...

This needs some work

Brief overview of ICESat

Brief Overview of Operation IceBridge

Brief Overview of ICESat-2

Integrate data-access-notebook

In order to streamline maintenance/sustainment of all of our NSIDC DAAC notebooks related to data access and customization in a single repo, we ought to migrate https://github.com/nsidc/NSIDC-Data-Access-Notebook to this repo.

As part of this integration, we should also update to using earthaccess where possible, and consider breaking out separate notebooks: one covering generic search capabilities, followed by a separate notebook focused on the on-prem subsetter API.

New access patterns for NOAA@NSIDC data

I've been exploring requests and BeautifulSoup to get a list of files over HTTPS. I have code to recursively list files in a directory. I'm in two minds about whether this should be a tutorial or a how-to. The code "walks" the server directory tree and returns a generator containing the URLs for each file. Recursion and generators are hard for many to get their heads around (they are for me, at least), but it fills a need.

Ideally, we would have a STAC catalog for these datasets so that we do not need to have these kinds of access patterns. This might be for my next playtime.

import time
from http import HTTPStatus

import requests
from requests.exceptions import HTTPError

from bs4 import BeautifulSoup


retry_codes = [
    HTTPStatus.TOO_MANY_REQUESTS,
    HTTPStatus.INTERNAL_SERVER_ERROR,
    HTTPStatus.BAD_GATEWAY,
    HTTPStatus.SERVICE_UNAVAILABLE,
    HTTPStatus.GATEWAY_TIMEOUT,
]


def get_page(url: str, 
             retries: int = 3) -> requests.Response:
    """Gets resonse from requests

    Parameters
    ----------
    url : url to resource
    retries : number of retries before failing

    Returns
    -------
    requests.Response object
    """
    for n in range(retries):
        try:
            response = requests.get(url)
            response.raise_for_status()

            return response

        except HTTPError as exc:
            code = exc.response.status_code
        
            if code in retry_codes:
                # retry after n seconds
                time.sleep(n)
                continue

            raise    


def get_filelist(url: str, 
                 ext: str = ".nc"):
    """Returns a generator containing files in directory tree
    below url.

    Parameters
    ----------
    url : url to resource
    ext : file extension of files to search for

    Returns
    -------
    Generator containing list files
    """
    
    def is_subdirectory(href):
        return (href.endswith("/") and 
                href not in url and
                not href.startswith("."))

    def is_file(href, ext):
        return href.endswith(ext)
        
    response = get_page(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    for a in soup.find_all('a', href=True):
        if is_subdirectory(a["href"]):
            yield from get_filelist(url+a["href"])
        if is_file(a["href"], ext):
            yield url + a["href"]
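A short usage sketch (the directory URL below is a placeholder, not a verified NOAA@NSIDC endpoint):

# Placeholder directory URL; substitute a real NOAA@NSIDC HTTPS data directory.
base_url = "https://example.nsidc.org/NOAA/some-dataset/"

for file_url in get_filelist(base_url, ext=".nc"):
    print(file_url)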

Use pathlib.Path throughout SMAP tutorials

SMAP tutorials use os for path and file operations.

See notebooks/SMAP/01_download_smap_data_rendered.ipynb

pathlib.Path is a better option. This is a small change but provides a more pythonic approach.
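For example, a minimal sketch of the kind of change intended (paths and file patterns are illustrative):

from pathlib import Path

data_dir = Path.home() / 'Downloads' / 'SMAP'       # instead of os.path.join(...)
data_dir.mkdir(parents=True, exist_ok=True)         # instead of os.makedirs(...)
smap_files = sorted(data_dir.glob('SMAP_L3_*.h5'))  # instead of os.listdir() plus filtering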

Error running `from iceflow.ui import IceFlowUI`

Error seems to be with the loading of the BlueMarble base layer.
This code generates the error:

from iceflow.ui import IceFlowUI

run from the Binder instance of the notebooks.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.9/site-packages/xyzservices/lib.py in __getattr__(self, key)
     41         try:
---> 42             return self.__getitem__(key)
     43         except KeyError:

KeyError: 'BlueMarble'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_96/3719722181.py in <module>
----> 1 from iceflow.ui import IceFlowUI

~/notebooks/iceflow/iceflow/ui.py in <module>
      7 from ipyleaflet import (Map, SearchControl, AwesomeIcon, GeoJSON,
      8                         Marker, DrawControl, LayersControl)
----> 9 from .layers import custom_layers, flight_layers, widget_projections
     10 from .client import IceflowClient
     11 

~/notebooks/iceflow/iceflow/layers.py in <module>
    101 widget_projections = {
    102     'global': {
--> 103         'base_map': basemaps.NASAGIBS.BlueMarble,
    104         'projection': projections.EPSG3857,
    105         'center': (30, -30),

/srv/conda/envs/notebook/lib/python3.9/site-packages/xyzservices/lib.py in __getattr__(self, key)
     42             return self.__getitem__(key)
     43         except KeyError:
---> 44             raise AttributeError(key)
     45 
     46     def __dir__(self):

AttributeError: BlueMarble

h5coro error reading ATL06 file

When running the https://github.com/nsidc/NSIDC-Data-Tutorials/blob/cryo-184/notebooks/ICESat-2_Cloud_Access/ATL10-h5coro.ipynb notebook for the whole "Antarctic" region, h5coro gives the following error.

H5Coro encountered error reading gt1r/freeboard_segment/latitude: invalid heap signature: 0x0
H5Coro encountered error reading gt1r/freeboard_segment/longitude: invalid heap signature: 0x0
H5Coro encountered error reading gt1r/freeboard_segment/delta_time: invalid heap signature: 0x0
H5Coro encountered error reading gt1r/freeboard_segment/seg_dist_x: invalid heap signature: 0x0
H5Coro encountered error reading gt1r/freeboard_segment/heights/height_segment_length_seg: invalid heap signature: 0x0
H5Coro encountered error reading gt1r/freeboard_segment/beam_fb_height: invalid heap signature: 0x0
H5Coro encountered error reading gt1r/freeboard_segment/heights/height_segment_type: invalid heap signature: 0x0

This causes a TypeError to be returned instead of a geopandas GeoDataFrame. The concatenation step in read_atl10 then fails.

File ~/NSIDC-Data-Tutorials/notebooks/ICESat-2_Cloud_Access/h5cloud/read_atl10.py:132, in read_atl10(files, bounding_box, executors, environment, credentials)
    129     return df
    131 dfs = pqdm(files, read_h5coro, n_jobs=executors)
--> 132 combined = pd.concat(dfs)
    134 return combined

I think we need a try, except block so that a None or some other value is returned.

We also need to then filter out the Nones so that pd.concat works.
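A rough sketch of the suggested guard, assuming read_h5coro is the per-file reader that pqdm calls inside read_atl10 (the names are taken from the traceback above; the exact signature is an assumption):

from pqdm.threads import pqdm
import pandas as pd

def safe_read_h5coro(file):
    """Return the per-granule GeoDataFrame, or None if h5coro fails on this granule."""
    try:
        return read_h5coro(file)   # the existing per-file reader
    except Exception as err:       # assumes the h5coro heap errors surface as exceptions here
        print(f"Skipping {file}: {err}")
        return None

dfs = pqdm(files, safe_read_h5coro, n_jobs=executors)
dfs = [df for df in dfs if df is not None]   # drop failed granules so pd.concat works
combined = pd.concat(dfs)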

Include a `valkyrie.download()` type cell

We should include a description of how to download the data. I don't think h5py can read a remote file.

Would we have to use requests?

Another thought: could we leverage icepyx for this?
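If requests turns out to be the way to go, a minimal streaming-download sketch might look like this (the URL and filename are placeholders, and Earthdata authentication is left out):

import requests

# Placeholder URL; a real request would need an authenticated Earthdata session.
url = "https://example.nsidc.org/path/to/granule.h5"

with requests.get(url, stream=True, timeout=60) as r:
    r.raise_for_status()
    with open("granule.h5", "wb") as f:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)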
