nsidc / nsidc-data-tutorials

Jupyter notebook-based tutorials to learn how to access and work with select NSIDC DAAC data.

License: MIT License

Languages: Jupyter Notebook 99.13%, Python 0.87%, Dockerfile 0.01%

nsidc-data-tutorials's Issues

Use pathlib.Path throughout SMAP tutorials

SMAP tutorials use os for path and file operations.

See notebooks/SMAP/01_download_smap_data_rendered.ipynb

pathlib.Path is a better option. This is a small change, but it provides a more Pythonic approach.
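
A minimal sketch of the kind of change intended, assuming a typical notebook pattern (the directory and extension here are hypothetical):

import os
from pathlib import Path

# current os-based pattern in the SMAP notebooks
data_dir = os.path.join("data", "SMAP")
files = [os.path.join(data_dir, f)
         for f in os.listdir(data_dir) if f.endswith(".h5")]

# pathlib equivalent: one object handles joining, listing, and filtering
data_dir = Path("data") / "SMAP"
files = sorted(data_dir.glob("*.h5"))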

New access patterns for NOAA@NSIDC data

I've been exploring requests and BeautifulSoup to get a list of files served over HTTPS. I have code that recursively lists the files in a directory. I'm in two minds about whether this should be a tutorial or a how-to. The code "walks" the server's directory tree and returns a generator that yields the URL of each file. Recursion and generators are hard for many people to get their heads around (they are for me, at least), but the code fills a need.

Ideally, we would have a STAC catalog for these datasets so that we would not need these kinds of access patterns. That might be my next playtime project.

import time
from http import HTTPStatus

import requests
from requests.exceptions import HTTPError

from bs4 import BeautifulSoup


retry_codes = [
    HTTPStatus.TOO_MANY_REQUESTS,
    HTTPStatus.INTERNAL_SERVER_ERROR,
    HTTPStatus.BAD_GATEWAY,
    HTTPStatus.SERVICE_UNAVAILABLE,
    HTTPStatus.GATEWAY_TIMEOUT,
]


def get_page(url: str,
             retries: int = 3) -> requests.Response:
    """Gets a response with requests, retrying on transient errors.

    Parameters
    ----------
    url : url to resource
    retries : number of retries before failing

    Returns
    -------
    requests.Response object
    """
    for n in range(retries):
        try:
            response = requests.get(url)
            response.raise_for_status()

            return response

        except HTTPError as exc:
            code = exc.response.status_code

            if code in retry_codes and n < retries - 1:
                # back off for n + 1 seconds before retrying;
                # re-raise on the final attempt
                time.sleep(n + 1)
                continue

            raise


def get_filelist(url: str,
                 ext: str = ".nc"):
    """Yields the url of each file in the directory tree below url.

    Parameters
    ----------
    url : url to resource
    ext : file extension of files to search for

    Yields
    ------
    url of each file ending in ext
    """

    def is_subdirectory(href):
        # a relative subdirectory link, excluding parent links
        # and hidden directories
        return (href.endswith("/") and
                href not in url and
                not href.startswith("."))

    def is_file(href, ext):
        return href.endswith(ext)

    response = get_page(url)
    soup = BeautifulSoup(response.text, 'html.parser')
    for a in soup.find_all('a', href=True):
        if is_subdirectory(a["href"]):
            # pass ext through so recursive calls filter consistently
            yield from get_filelist(url + a["href"], ext)
        elif is_file(a["href"], ext):
            yield url + a["href"]

Break down environments and put them in separate git branches.

Right now we have a single environment for all the tutorials. It works, but it's not best practice. We should create a branch for each tutorial; for SNOWEX, say, there would be a git branch called binder-snowex containing only the binder directory and the dependencies needed for that specific tutorial, and the same for the others. Another advantage is that commits to main that don't touch the environment won't trigger a new Binder build.

Introduce Valkyrie before missions

I think it is better to introduce Valkyrie and the problems it solves before giving an overview of the missions. For example...

Why Valkyrie

In 2003, NASA launched the Ice, Cloud and land Elevation Satellite (ICESat) mission. Over the following six years, ICESat collected valuable data about ice thickness in the polar regions. Unfortunately, the ICESat mission ended before a follow-on mission could be launched. To fill the gap, an airborne campaign called Operation IceBridge was started. Between 2009 and 2019, Operation IceBridge flew numerous campaigns over the Greenland and Antarctic ice sheets, as well as over sea ice in the Arctic and Southern Oceans. The last campaign was [fill in date here]. In September 2018, ICESat-2 was launched to continue NASA's collection of ice, cloud and land elevation data.

The wealth of data from these three missions, as well as from earlier missions, presents an opportunity to measure the evolution of ice thickness over several decades. However, combining data from these missions is a challenge. Data from the Airborne Topographic Mapper (ATM) flown during IceBridge campaigns is stored in N different formats. ICESat and ICESat-2 data are also in different file formats. The data need to be harmonized (put into similar formats) before comparisons can be made. A further complication is that the coordinate reference systems used to locate measurements have changed. The Earth's surface is not static; it changes shape. To account for these changes, the terrestrial reference frames that relate latitude and longitude to points on the Earth are updated on a regular basis. Since the launch of ICESat, the International Terrestrial Reference Frame has been updated three times. Given the geolocation accuracy of the instruments, a point measured at the beginning of the record is not the same point as one measured at the end of the record, even though the latitude and longitude are the same. These changes in geolocation need to be reconciled if meaningful comparisons of measurements are to be made.

Valkyrie solves this problem...

This needs some work

Brief overview of ICESat

Brief Overview of Operation IceBridge

Brief Overview of ICESat-2

Define a common `read_h5` function for _h5py + pandas_ and _dask array_

import h5py
import numpy as np


def read_h5(fname, vnames=()):
    """Read a list of vars [v1, v2, ..] -> 2D."""
    # context manager closes the file; default of () avoids a mutable default arg
    with h5py.File(fname, 'r') as f:
        return np.column_stack([f[v][()] for v in vnames])

could be used for the pandas and dask array cells. Maybe this could be added to icepyx or offered as part of a separate tool set.
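
A sketch of how the same helper might serve both cells; the variable paths and file names below are hypothetical, and the dask side simply defers the per-file reads:

import dask
import numpy as np
import pandas as pd

vnames = ['lat', 'lon', 'h_li']         # hypothetical variable paths
files = ['atl06_01.h5', 'atl06_02.h5']  # hypothetical file list

# pandas cell: read one file eagerly into a DataFrame
df = pd.DataFrame(read_h5(files[0], vnames), columns=vnames)

# dask cell: defer the per-file reads, then gather them in parallel
delayed_reads = [dask.delayed(read_h5)(f, vnames) for f in files]
combined = np.vstack(dask.compute(*delayed_reads))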

The parameter of earthaccess.login()

Hey, when I run Jupyter I enter my username and password interactively to log in (earthaccess.login(strategy='interactive', persist=True)). Now I want to run it in a local script. How can I set a non-interactive login method in earthaccess.login()?
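
Not an answer from the thread, but earthaccess does ship non-interactive login strategies; a minimal sketch:

import earthaccess

# Option 1: read credentials from ~/.netrc, i.e. a line like
# machine urs.earthdata.nasa.gov login <username> password <password>
auth = earthaccess.login(strategy="netrc")

# Option 2: read credentials from the environment variables
# EARTHDATA_USERNAME and EARTHDATA_PASSWORD
auth = earthaccess.login(strategy="environment")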

Matplotlib get_cmap deprecation warning in notebooks/SMAP/03_smap_quality_flags.ipynb

I received a matplotlib warning for each image in the notebooks/SMAP/03_smap_quality_flags.ipynb notebook:

/tmp/ipykernel_478/3335087365.py:3: MatplotlibDeprecationWarning: The get_cmap function was deprecated in Matplotlib 3.7 and will be removed two minor releases later. Use ``matplotlib.colormaps[name]`` or ``matplotlib.colormaps.get_cmap(obj)`` instead.
  cax = ax.imshow((surf_flag_L3_P>>i)&1, cmap=plt.cm.get_cmap('bone', 2))
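
A possible fix, assuming Matplotlib >= 3.6: the registry lookup plus resampled(2) reproduces the two-level discrete colormap that get_cmap('bone', 2) returned (ax, surf_flag_L3_P, and i come from the notebook cell quoted above).

import matplotlib

# two-level discrete 'bone' colormap without the deprecated get_cmap
cmap = matplotlib.colormaps['bone'].resampled(2)
cax = ax.imshow((surf_flag_L3_P >> i) & 1, cmap=cmap)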

New issue: CRYO-206

Related to CRYO-187

Suggest using xarray and cartopy for SMAP tutorial 2

The SMAP tutorial 2.0 read_and_plot_smap_data uses h5py and numpy. The whole notebook could be simplified and streamlined by using xarray.

If we stick with h5py, a lot of the existing code could still be streamlined and made more transparent.

For example, code cell 3 uses a lot of code to get a list of groups and dataset paths, which can be simplified to the following:

with h5py.File(smap_files[0], 'r') as root:
    list_of_names = []
    root.visit(list_of_names.append)
list_of_names
['Metadata',
 'Metadata/AcquisitionInformation',
 'Metadata/AcquisitionInformation/platform',
 'Metadata/AcquisitionInformation/platformDocument',
 'Metadata/AcquisitionInformation/radar',
 'Metadata/AcquisitionInformation/radarDocument',
 'Metadata/AcquisitionInformation/radiometer',
 'Metadata/AcquisitionInformation/radiometerDocument',
 'Metadata/DataQuality',
 'Metadata/DataQuality/CompletenessOmission',
 'Metadata/DataQuality/DomainConsistency',
 'Metadata/DatasetIdentification',
 'Metadata/Extent',
 'Metadata/GridSpatialRepresentation',
 'Metadata/GridSpatialRepresentation/Column',
 'Metadata/GridSpatialRepresentation/GridDefinition',
 'Metadata/GridSpatialRepresentation/GridDefinitionDocument',
 ...]

Code cell 5, which gets soil_moisture for the AM pass, could be rewritten to use the full path to the dataset:

with h5py.File(smap_files[0], 'r') as root:
    soil_moisture = root['Soil_Moisture_Retrieval_Data_AM/soil_moisture'][:]
soil_moisture
array([[-9999., -9999., -9999., ..., -9999., -9999., -9999.],
       [-9999., -9999., -9999., ..., -9999., -9999., -9999.],
       [-9999., -9999., -9999., ..., -9999., -9999., -9999.],
       ...,
       [-9999., -9999., -9999., ..., -9999., -9999., -9999.],
       [-9999., -9999., -9999., ..., -9999., -9999., -9999.],
       [-9999., -9999., -9999., ..., -9999., -9999., -9999.]],
      dtype=float32)

But as I note, this is much, much simpler with xarray.
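
For comparison, a sketch of the xarray version, assuming the installed netCDF backend can open these SMAP HDF5 granules via the group keyword:

import xarray as xr

ds = xr.open_dataset(smap_files[0], group='Soil_Moisture_Retrieval_Data_AM')
# _FillValue attributes (e.g. -9999.) should be decoded to NaN automatically
soil_moisture = ds['soil_moisture']
soil_moisture.plot()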

Integrate data-access-notebook

In order to streamline maintenance/sustainment of all of our NSIDC DAAC notebooks related to data access and customization in a single repo, we ought to migrate https://github.com/nsidc/NSIDC-Data-Access-Notebook to this repo.

As part of this integration, we should also update to using earthaccess where possible, and consider splitting the notebooks: one covering generic search capabilities, followed by a separate notebook focused on the on-prem subsetter API.

make tools easily installable as modules

I'm trying to use IceFlow in another workflow, but it's non-trivial to install in its current configuration. You can't import the module using its full path because of the dashes in the repo name (NSIDC-Data-Tutorials), and if you add the repo to your path and try to import it, you get all sorts of relative path/module errors. @betolink, are there any plans to set this up? Currently I'm modifying iceflow/__init__.py to import each module and then debugging the relative path calls for each module.
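
One way to address this (my sketch, not an existing plan) would be a minimal setup.py next to the iceflow package so it can be pip-installed; the dependency list is an assumption:

# hypothetical notebooks/iceflow/setup.py
from setuptools import setup, find_packages

setup(
    name="iceflow",
    version="0.1.0",
    packages=find_packages(),
    install_requires=["requests", "pandas", "ipyleaflet"],  # assumed deps
)

after which pip install -e notebooks/iceflow would make import iceflow work from any directory.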

error running Customize and Access Data.ipynb

in the notebooks/ICESat-2_MODIS_Arctic_Sea_Ice/ folder

Running in a Binder instance (from the GH link on the repo). I logged in with my Earthdata account (via code in the notebook), so not being logged in is not the problem.

The last code cell is

fn.request_data(param_dict,session)
fn.clean_folder()

This error is returned. Note I tried this yesterday too and got the same error, so it doesn't seem to be a temporary 404 problem.

Request HTTP response:  201

Order request URL:  https://n5eil02u.ecs.nsidc.org/egi/request?short_name=MOD29&version=6&bounding_box=140%2C72%2C153%2C80&temporal=2019-03-23T00%3A00%3A00Z%2C2019-03-23T23%3A59%3A59Z&page_size=2000&email=eli.holmes%40noaa.gov&bbox=140%2C72%2C153%2C80&time=2019-03-23T00%3A00%3A00%2C2019-03-23T23%3A59%3A59&coverage=%2Fgt1l%2Fsea_ice_segments%2Fdelta_time%2C%2Fgt1l%2Fsea_ice_segments%2Flatitude%2C%2Fgt1l%2Fsea_ice_segments%2Flongitude%2C%2Fgt1l%2Fsea_ice_segments%2Fheights%2Fheight_segment_confidence%2C%2Fgt1l%2Fsea_ice_segments%2Fheights%2Fheight_segment_height%2C%2Fgt1l%2Fsea_ice_segments%2Fheights%2Fheight_segment_quality%2C%2Fgt1l%2Fsea_ice_segments%2Fheights%2Fheight_segment_surface_error_est%2C%2Fgt1l%2Fsea_ice_segments%2Fheights%2Fheight_segment_length_seg%2C%2Fgt2l%2Fsea_ice_segments%2Fdelta_time%2C%2Fgt2l%2Fsea_ice_segments%2Flatitude%2C%2Fgt2l%2Fsea_ice_segments%2Flongitude%2C%2Fgt2l%2Fsea_ice_segments%2Fheights%2Fheight_segment_confidence%2C%2Fgt2l%2Fsea_ice_segments%2Fheights%2Fheight_segment_height%2C%2Fgt2l%2Fsea_ice_segments%2Fheights%2Fheight_segment_quality%2C%2Fgt2l%2Fsea_ice_segments%2Fheights%2Fheight_segment_surface_error_est%2C%2Fgt2l%2Fsea_ice_segments%2Fheights%2Fheight_segment_length_seg%2C%2Fgt3l%2Fsea_ice_segments%2Fdelta_time%2C%2Fgt3l%2Fsea_ice_segments%2Flatitude%2C%2Fgt3l%2Fsea_ice_segments%2Flongitude%2C%2Fgt3l%2Fsea_ice_segments%2Fheights%2Fheight_segment_confidence%2C%2Fgt3l%2Fsea_ice_segments%2Fheights%2Fheight_segment_height%2C%2Fgt3l%2Fsea_ice_segments%2Fheights%2Fheight_segment_quality%2C%2Fgt3l%2Fsea_ice_segments%2Fheights%2Fheight_segment_surface_error_est%2C%2Fgt3l%2Fsea_ice_segments%2Fheights%2Fheight_segment_length_seg&request_mode=async

order ID:  5000002704796
status URL:  https://n5eil02u.ecs.nsidc.org/egi/request/5000002704796
HTTP response from order response URL:  201

Initial request status is  pending

Status is not complete. Trying again.
Retry request status is:  pending
Status is not complete. Trying again.
Retry request status is:  pending
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  processing
Status is not complete. Trying again.
Retry request status is:  complete
Zip download URL:  https://n5eil02u.ecs.nsidc.org/esir/5000002704796.zip
Beginning download of zipped output...
---------------------------------------------------------------------------
HTTPError                                 Traceback (most recent call last)
/tmp/ipykernel_150/2671428711.py in <module>
----> 1 fn.request_data(param_dict,session)
      2 fn.clean_folder()

~/notebooks/ICESat-2_MODIS_Arctic_Sea_Ice/tutorial_helper_functions.py in request_data(param_dict, session)
    235             zip_response = session.get(downloadURL)
    236             # Raise bad request: Loop will stop for bad response code.
--> 237             zip_response.raise_for_status()
    238             with zipfile.ZipFile(io.BytesIO(zip_response.content)) as z:
    239                 z.extractall(path)

/srv/conda/envs/notebook/lib/python3.9/site-packages/requests/models.py in raise_for_status(self)
    951 
    952         if http_error_msg:
--> 953             raise HTTPError(http_error_msg, response=self)
    954 
    955     def close(self):

HTTPError: 404 Client Error: Not Found for url: https://n5eil02u.ecs.nsidc.org/esir/5000002704796.zip

h5coro error reading ATL06 file

When running the https://github.com/nsidc/NSIDC-Data-Tutorials/blob/cryo-184/notebooks/ICESat-2_Cloud_Access/ATL10-h5coro.ipynb notebook for the whole "Antarctic" region, h5coro gives the following error.

H5Coro encountered error reading gt1r/freeboard_segment/latitude: invalid heap signature: 0x0
H5Coro encountered error reading gt1r/freeboard_segment/longitude: invalid heap signature: 0x0
H5Coro encountered error reading gt1r/freeboard_segment/delta_time: invalid heap signature: 0x0
H5Coro encountered error reading gt1r/freeboard_segment/seg_dist_x: invalid heap signature: 0x0
H5Coro encountered error reading gt1r/freeboard_segment/heights/height_segment_length_seg: invalid heap signature: 0x0
H5Coro encountered error reading gt1r/freeboard_segment/beam_fb_height: invalid heap signature: 0x0
H5Coro encountered error reading gt1r/freeboard_segment/heights/height_segment_type: invalid heap signature: 0x0

This causes a TypeError to be returned instead of a geopandas GeoDataFrame. The concatenation step in read_atl10 then fails.

File ~/NSIDC-Data-Tutorials/notebooks/ICESat-2_Cloud_Access/h5cloud/read_atl10.py:132, in read_atl10(files, bounding_box, executors, environment, credentials)
    129     return df
    131 dfs = pqdm(files, read_h5coro, n_jobs=executors)
--> 132 combined = pd.concat(dfs)
    134 return combined

I think we need a try/except block so that a None or some other sentinel value is returned.

We also need to filter out the Nones so that pd.concat works.
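
A sketch of the proposed guard, assuming read_h5coro is the per-file worker that pqdm maps over:

def safe_read_h5coro(file):
    """Return None instead of raising when a granule fails to read."""
    try:
        return read_h5coro(file)
    except Exception:
        # e.g. the invalid heap signature errors above
        return None

dfs = pqdm(files, safe_read_h5coro, n_jobs=executors)
# drop the Nones so pd.concat only sees dataframes
combined = pd.concat([df for df in dfs if df is not None])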

AttributeError from import of iceflow ui: NASAGIBS.BlueMarble no longer listed in xyzservices

What Happened

The 0_introduction.ipynb notebook raises an AttributeError in the first cell when trying to import iceflow.ui. This appears to result from a call to ipyleaflet.basemaps.NASAGIBS.BlueMarble.

Investigation

Looking at https://xyzservices.readthedocs.io/en/stable/introduction.html under NASAGIBS services, it appears that there is no longer an entry for BlueMarble. BlueMarble is also not shown in https://xyzservices.readthedocs.io/en/stable/gallery.html.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
File ~/mambaforge/envs/nsidc-iceflow/lib/python3.9/site-packages/xyzservices/lib.py:44, in Bunch.__getattr__(self, key)
     43 try:
---> 44     return self.__getitem__(key)
     45 except KeyError as err:

KeyError: 'BlueMarble'

The above exception was the direct cause of the following exception:

AttributeError                            Traceback (most recent call last)
Cell In[1], line 2
      1 # Importing IceFlow client library
----> 2 from iceflow.ui import IceFlowUI
      3 from iceflow.client import IceflowClient
      5 import earthaccess

File ~/src/NSIDC-Data-Tutorials/notebooks/iceflow/iceflow/ui.py:9
      6 from IPython.display import display, HTML
      7 from ipyleaflet import (Map, SearchControl, AwesomeIcon, GeoJSON,
      8                         Marker, DrawControl, LayersControl)
----> 9 from .layers import custom_layers, flight_layers, widget_projections
     10 from .client import IceflowClient
     13 class IceFlowUI:

File ~/src/NSIDC-Data-Tutorials/notebooks/iceflow/iceflow/layers.py:106
     64 north_3413 = {
     65     'name': 'EPSG:3413',
     66     'custom': True,
   (...)
     81     ]
     82 }
     84 south_3031 = {
     85     'name': 'EPSG:3031',
     86     'custom': True,
   (...)
    101     ]
    102 }
    104 widget_projections = {
    105     'global': {
--> 106         'base_map': basemaps.NASAGIBS.BlueMarble,
    107         'projection': projections.EPSG3857,
    108         'center': (30, -30),
    109         'zoom': 2,
    110         'max_zoom': 8
    111     },
    112     'north': {
    113         'base_map': basemaps.NASAGIBS.BlueMarble3413,
    114         'projection': north_3413,
    115         'center': (80, -50),
    116         'zoom': 1,
    117         'max_zoom': 4
    118     },
    119     'south': {
    120         'base_map': basemaps.NASAGIBS.BlueMarble3031,
    121         'projection': south_3031,
    122         'center': (-90, 0),
    123         'zoom': 1,
    124         'max_zoom': 4
    125     }
    126 }

File ~/mambaforge/envs/nsidc-iceflow/lib/python3.9/site-packages/xyzservices/lib.py:46, in Bunch.__getattr__(self, key)
     44     return self.__getitem__(key)
     45 except KeyError as err:
---> 46     raise AttributeError(key) from err

AttributeError: BlueMarble
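
A possible workaround (mine, not from the issue): point layers.py at a NASAGIBS basemap that xyzservices still ships, or reconstruct BlueMarble as a custom provider. The GIBS URL template below is an assumption and would need verifying against the GIBS documentation:

import xyzservices
from ipyleaflet import basemaps

# Option 1: swap in a NASAGIBS layer still listed in xyzservices
base_map = basemaps.NASAGIBS.ViirsEarthAtNight2012

# Option 2: rebuild BlueMarble explicitly (URL template is an assumption)
blue_marble = xyzservices.TileProvider(
    name="NASAGIBS.BlueMarble",
    url="https://gibs.earthdata.nasa.gov/wmts/epsg3857/best/BlueMarble_NextGeneration/default/GoogleMapsCompatible_Level8/{z}/{y}/{x}.jpeg",
    attribution="NASA Global Imagery Browse Services (GIBS)",
    max_zoom=8,
)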

Include a `valkyrie.download()` type cell

We should include a description of how to download the data. I don't think h5py can read a remote file.

Would we have to use requests?

Another thought is can we leverage icepyx for this?
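
If we do fall back to requests, a minimal streaming-download sketch (the output URL here is hypothetical):

import requests

url = "https://example.nsidc.org/valkyrie/orders/12345/granule.h5"  # hypothetical
with requests.get(url, stream=True) as r:
    r.raise_for_status()
    with open("granule.h5", "wb") as f:
        for chunk in r.iter_content(chunk_size=1024 * 1024):
            f.write(chunk)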

Error running `from iceflow.ui import IceFlowUI`

The error seems to be in the loading of the BlueMarble base layer. This code generates the error:

from iceflow.ui import IceFlowUI

run from the Binder instance of the notebooks.

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
/srv/conda/envs/notebook/lib/python3.9/site-packages/xyzservices/lib.py in __getattr__(self, key)
     41         try:
---> 42             return self.__getitem__(key)
     43         except KeyError:

KeyError: 'BlueMarble'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_96/3719722181.py in <module>
----> 1 from iceflow.ui import IceFlowUI

~/notebooks/iceflow/iceflow/ui.py in <module>
      7 from ipyleaflet import (Map, SearchControl, AwesomeIcon, GeoJSON,
      8                         Marker, DrawControl, LayersControl)
----> 9 from .layers import custom_layers, flight_layers, widget_projections
     10 from .client import IceflowClient
     11 

~/notebooks/iceflow/iceflow/layers.py in <module>
    101 widget_projections = {
    102     'global': {
--> 103         'base_map': basemaps.NASAGIBS.BlueMarble,
    104         'projection': projections.EPSG3857,
    105         'center': (30, -30),

/srv/conda/envs/notebook/lib/python3.9/site-packages/xyzservices/lib.py in __getattr__(self, key)
     42             return self.__getitem__(key)
     43         except KeyError:
---> 44             raise AttributeError(key)
     45 
     46     def __dir__(self):

AttributeError: BlueMarble
