nrel / nsrdb

This repository contains all of the methods for the NSRDB data processing pipeline.

Home Page: https://nrel.github.io/nsrdb/

License: BSD 3-Clause "New" or "Revised" License

Topics: machine-learning, nrel, solar-energy

nsrdb's Introduction

Welcome to the National Solar Radiation Data Base (NSRDB)!

This repository contains all of the methods for the NSRDB data processing pipeline. You can read more about the NSRDB here and here. For details on NSRDB variable units, datatypes, and attributes, see the NSRDB variable metadata.

The PXS All-Sky Irradiance Model

The PXS All-Sky Irradiance Model is the main physics package that calculates surface irradiance variables. The code base and additional documentation can be found here.

The NSRDB Data Model

The NSRDB Data Model is the data aggregation framework that sources, processes, and prepares data for input to All-Sky. The code base and additional documentation can be found here.

Installation

  1. Use conda (Anaconda or Miniconda with Python 3.9) to create an nsrdb environment: conda create --name nsrdb python=3.9

  2. Activate your new conda env: conda activate nsrdb

  3. Follow the steps used in the pytest actions, described here.

    • These actions list the repositories required to run all tests, along with the commands to run from the local clones of those repositories.
    • If you plan to run without MLClouds, the step associated with that repository can be skipped.
  4. Test your installation:

    1. Start ipython and test the following import: from nsrdb.data_model import DataModel (a minimal smoke-test sketch follows this list)
    2. Navigate to the tests/ directory and run the command: pytest
  5. If you are a developer, also run pre-commit install in the directory containing .pre-commit-config.yaml.
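
A minimal smoke-test sketch for step 4, assuming nothing beyond the import path shown above; if the import resolves, the core package and its dependencies are installed:

    # smoke test: verify the nsrdb package imports cleanly
    from nsrdb.data_model import DataModel

    print('nsrdb import OK:', DataModel.__name__)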

NSRDB Versions

NSRDB Version History

Version | Effective Date | Data Years* | Notes

1.0.0 | 2015 | 2005-2012 | Initial release of PSM v1 (no FARMS):

  • Satellite Algorithm for Shortwave Radiation Budget (SASRAB) model
  • MMAC model for clear-sky conditions
  • DNI for cloud scenes computed using the DISC model

2.0.0 | 2016 | 1998-2015 | Initial release of PSM v2 (use of FARMS; downscaling of ancillary data introduced to account for elevation; NSRDB website distribution developed):

  • Clear sky: REST2; cloudy sky: NREL FARMS model and DISC model
  • Climate Forecast System Reanalysis (CFSR) used for ancillary data
  • Monthly 0.5° aerosol optical depth (AOD) for 1998-2014 using satellite and ground-based measurements; monthly results interpolated to daily 4-km AOD data; daily data calibrated using ground measurements to develop an accurate AOD product

3.0.0 | 2018 | 1998-2017 | Initial release of PSM v3:

  • Hourly AOD (1998-2016) from the Modern-Era Retrospective analysis for Research and Applications, Version 2 (MERRA-2)
  • Snow-free surface albedo from MODIS (2001-2015) (MCD43GF CMG gap-filled snow-free products from the University of Massachusetts, Boston)
  • Snow cover from the Interactive Multisensor Snow and Ice Mapping System (IMS) daily snow cover product (National Snow and Ice Data Center)
  • GOES-East time shift applied to cloud properties instead of solar radiation
  • MERRA-2 used for ancillary data (pressure, humidity, wind speed, etc.)

3.0.1 | 2018 | 2017+ | Moved from time shift of radiation to time shift of cloud properties.
3.0.2 | 2/25/2019 | 1998-2017 | Air temperature data recomputed from MERRA-2 with elevation correction.
3.0.3 | 2/25/2019 | 1998-2017 | Wind data recomputed to fix corrupted data in the western extent.
3.0.4 | 3/29/2019 | 1998-2017 | Aerosol optical depth patched with a physical range of 0 to 3.2.
3.0.5 | 4/8/2019 | 1998-2017 | Cloud pressure attributes and scale/offset fixed for 2016 and 2017.
3.0.6 | 4/23/2019 | 1998-2017 | Missing data for all cloud properties gap-filled using a heuristics method.
3.1.0 | 9/23/2019 | 2018+ | Complete refactor of the NSRDB processing code for NSRDB 2018.
3.1.1 | 12/5/2019 | 2018+, TMY/TDY/TGY-2018 | Complete refactor of the TMY processing code.
3.1.2 | 6/8/2020 | 2020 | Added a feature to adjust cloud coordinates based on solar position and shading geometry.
3.2.0 | 3/17/2021 | 2020 | Enabled cloud solar shading coordinate adjustment by default; enabled the MLClouds machine learning gap-fill method for missing cloud properties (cloud fill flag #7).
3.2.1 | 1/12/2021 | 2021 | Implemented an algorithm to re-map the parallax- and shading-corrected cloud coordinates to the nominal GOES coordinate system. This fixes the issue of PC cloud coordinates conflicting with clear-sky coordinates, and also fixes the strange pattern found in the long-term means generated from PC data.
3.2.2 | 2/25/2022 | 1998-2021 | Implemented a model for snowy albedo as a function of MERRA-2 temperature, based on the paper "A comparison of simulated and observed fluctuations in summertime Arctic surface albedo" by Becky Ross and John E. Walsh.
3.2.3 | 4/13/2023 | None | Fixed MERRA interpolation issue #51 and deprecated Python 3.7/3.8. Added changes to accommodate pandas v2.0.0.
4.0.0 | 5/1/2023 | 2022 | Integrated the new FARMS-DNI model.

*Note: The "Data Years" column shows which years of NSRDB data were updated at the time of version release. However, each NSRDB file should be checked for the version attribute, which should be a more accurate record of the actual data version.

Recommended Citation

Update with current version and DOI:

Grant Buster, Brandon Benton, Mike Bannister, Yu Xie, Aron Habte, Galen Maclaurin, Manajit Sengupta. National Solar Radiation Database (NSRDB). https://github.com/NREL/nsrdb (version v4.0.0), 2023. DOI: 10.5281/zenodo.10471523

Acknowledgments

This work (SWR-23-77) was authored by the National Renewable Energy Laboratory, operated by Alliance for Sustainable Energy, LLC, for the U.S. Department of Energy (DOE) under Contract No. DE-AC36-08GO28308. Funding provided by the DOE Grid Deployment Office (GDO), the DOE Advanced Scientific Computing Research (ASCR) program, the DOE Solar Energy Technologies Office (SETO), the DOE Wind Energy Technologies Office (WETO), the United States Agency for International Development (USAID), and the Laboratory Directed Research and Development (LDRD) program at the National Renewable Energy Laboratory. The research was performed using computational resources sponsored by the Department of Energy's Office of Energy Efficiency and Renewable Energy and located at the National Renewable Energy Laboratory. The views expressed in the article do not necessarily represent the views of the DOE or the U.S. Government. The U.S. Government retains and the publisher, by accepting the article for publication, acknowledges that the U.S. Government retains a nonexclusive, paid-up, irrevocable, worldwide license to publish or reproduce the published form of this work, or allow others to do so, for U.S. Government purposes.


nsrdb's People

Contributors

bnb32, grantbuster, mikebannis, mrossol, xieyupku


nsrdb's Issues

Failed IMS data download for Albedo module

Bug Description
Ran the albedo module for 2022. The module failed with an ImsDataNotFound error. However, after I downloaded the data directly through the FTP server, the module proceeded successfully.

[screenshot: ImsDataNotFound error traceback]

Charge code
SETP 10304 71.01.01

Automatic modis download

Why this feature is necessary:
When running the albedo module it would be nice to have MODIS data automatically downloaded when it is available, with verification checks included. It seems that as of 2/7/22, MODIS data is not available later than 2017, but downloading directly from USGS will still return files (just not complete ones).

Implement in the existing _download() function and integrate into the MODIS workflow (a sketch of the verification idea follows).
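
A minimal sketch of the idea, with a size-based verification check; the URL argument, the min_bytes threshold, and the function signature are assumptions for illustration, not the module's actual interface:

    import os
    import urllib.request

    def _download(url, out_fpath, min_bytes=1_000_000):
        """Fetch a MODIS file, rejecting suspiciously small downloads,
        since USGS can return incomplete files for unavailable years."""
        tmp_fpath = out_fpath + '.tmp'
        urllib.request.urlretrieve(url, tmp_fpath)
        if os.path.getsize(tmp_fpath) < min_bytes:
            os.remove(tmp_fpath)
            raise IOError(f'Incomplete MODIS download from {url}')
        os.replace(tmp_fpath, out_fpath)  # move verified file into place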

Charge code
SETP 10304 71.01.01

Change AOD physical limits

Bug Description
AOD goes above the previous limit (1.5?) during fires. We should raise the AOD limit to 10. Make sure we check the bit range (see the sketch below).
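
A minimal sketch of the bit-range check, assuming a scaled-integer storage scheme; the scale factor and dtype below are illustrative, not the actual NSRDB dataset attributes:

    import numpy as np

    PHYS_MIN, PHYS_MAX = 0.0, 10.0  # proposed new AOD physical range
    SCALE = 100                     # assumed scale factor for storage
    DTYPE = np.uint16               # assumed storage dtype

    aod = np.array([0.05, 0.4, 7.8, 12.3])  # toy values; 12.3 is out of range
    aod = np.clip(aod, PHYS_MIN, PHYS_MAX)  # enforce physical limits
    # bit-range check: the scaled maximum must fit the storage dtype
    assert PHYS_MAX * SCALE <= np.iinfo(DTYPE).max, 'AOD limit overflows dtype'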

Add glob utility for data model filepath spec

Why this feature is necessary:
The file spec for cloud data and other data model inputs is pretty custom/hacky right now. It would be nice to be able to do source_pattern instead of source_directory and put in a linux-style filepath with wildcards (see the sketch below).
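
A minimal sketch of what the proposed source_pattern option could expand to internally; the directory layout and filename pattern are made up for illustration:

    from glob import glob

    # linux-style wildcard path instead of a source_directory
    source_pattern = '/data/cloud/2019/*/goes_east_*.nc'
    cloud_files = sorted(glob(source_pattern))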

Charge code
None available right now and NSRDB is overspent.

Urgency / Timeframe
Not urgent but would be nice when we do more dev.

Remove pyhdf dependence

Bug Description
For the Puerto Rico project we need to dockerize NSRDB, and you CANNOT install pyhdf from PyPI (despite the presence of the package on PyPI!).

As a fallback I've moved the pyhdf import into the MODIS class and removed it from requirements.txt. That said, HDF4 is at this point so archaic that we want to remove pyhdf entirely and instead pre-convert the files to HDF5 using a convert_h4(path4, f_h4, path5, f_h5) tool.
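
A minimal sketch of what such a conversion tool could look like, keeping the signature referenced above; the dataset iteration is illustrative, not the repository's actual implementation:

    import os
    from pyhdf.SD import SD, SDC  # the HDF4 dependency being retired
    import h5py

    def convert_h4(path4, f_h4, path5, f_h5):
        """Copy every scientific dataset from an HDF4 file into HDF5."""
        h4 = SD(os.path.join(path4, f_h4), SDC.READ)
        with h5py.File(os.path.join(path5, f_h5), 'w') as h5:
            for name in h4.datasets():
                h5.create_dataset(name, data=h4.select(name)[:])
        h4.end()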

Charge code
Puerto Rico: WDHS 11822 00.01.10

NSRDB Meta Data NaN values and Dtypes

Why this feature is necessary:
NaN values in string columns should be converted to str("None"), Timezone should be an integer, and Elevation should be float32 or int32. These fixes will make export to HSDS easier (see the sketch below).
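
A minimal pandas sketch of the proposed cleanup; the column names and toy values are assumptions for illustration:

    import numpy as np
    import pandas as pd

    meta = pd.DataFrame({'country': ['USA', np.nan],  # toy meta data
                         'timezone': [-5.0, -6.0],
                         'elevation': [120.0, 35.5]})
    str_cols = meta.select_dtypes(include='object').columns
    meta[str_cols] = meta[str_cols].fillna('None')        # NaN -> "None"
    meta['timezone'] = meta['timezone'].astype(np.int32)  # integer timezone
    meta['elevation'] = meta['elevation'].astype(np.float32)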

Data model output files to temp scratch then move

If the data model crashes while writing a dataset output file, the file could look complete but not contain all of the actual data. Better practice is to write the file to temp scratch and then shutil-copy it to the output dir (see the sketch below).
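
A minimal sketch of the write-then-move pattern, assuming a hypothetical write_fn callback that performs the actual dataset write:

    import os
    import shutil

    def safe_write(write_fn, out_fpath, scratch_dir='/scratch/tmp'):
        """Write to temp scratch first so a crash never leaves a
        complete-looking but truncated file in the output dir."""
        os.makedirs(scratch_dir, exist_ok=True)
        tmp_fpath = os.path.join(scratch_dir, os.path.basename(out_fpath))
        write_fn(tmp_fpath)                # full dataset written to scratch
        shutil.move(tmp_fpath, out_fpath)  # then moved into place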

Additional flags for CLI

Might be nice to have flags just for pc, shading, and remap_pc, instead of having to also specify files, when running python -m nsrdb.cli direct data-model?

Would need to add flags to cli.py and pass them to the data-model routine; a sketch follows.
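
A minimal sketch of the proposed flags, assuming the CLI is built on click; the option names and wiring are illustrative:

    import click

    @click.command()
    @click.option('--pc', is_flag=True, help='Run parallax correction.')
    @click.option('--shading', is_flag=True, help='Run solar shading.')
    @click.option('--remap-pc', is_flag=True, help='Re-map PC coordinates.')
    def data_model(pc, shading, remap_pc):
        """Hypothetical data-model command accepting the proposed flags."""
        click.echo(f'pc={pc}, shading={shading}, remap_pc={remap_pc}')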

Not urgent at all though.

Pandas v1 compatibility

Can the NSRDB requirements be relaxed to allow any pandas version? reV requires pandas>1 but NSRDB has previously only wanted pandas<1. Pytests pass on Eagle with pandas v1.2.1. To close this, let's test pandas 1.2.1 using the nsrdb pipeline (next time we run it) and then relax the nsrdb requirements file.

NSRDB pipeline single job debug

Why this feature is necessary:
It would be super nice to be able to submit an NSRDB pipeline job but only actually submit 1 out of the 365 daily jobs to SLURM, so you can debug a single job instead of a full production run. The feature would still have to enter the other jobs into the status file, but only a single job would actually get submitted on the HPC.

The current alternative is probably to submit manually from Python.

Pull 5 year average aerosols for Puerto Rico data

Why this feature is necessary:
Need "climatological" average as a backup for aerosol data.

A possible solution is:
Pull average aerosols from MERRA-2 and downscale to the 5-minute/2-km Puerto Rico grid (see the sketch below).
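
A minimal sketch of the averaging step, assuming xarray over MERRA-2 monthly aerosol files; the file pattern and year range are assumptions (TOTEXTTAU is MERRA-2's total aerosol extinction AOD at 550 nm):

    import xarray as xr

    # illustrative path pattern for MERRA-2 monthly aerosol files
    files = '/data/merra2/MERRA2_*.tavgM_2d_aer_Nx.*.nc4'
    ds = xr.open_mfdataset(files, combine='by_coords')
    # 5-year climatological mean AOD as the backup field
    aod_clim = ds['TOTEXTTAU'].sel(time=slice('2016', '2020')).mean('time')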

Charge code
WDHS 11822 00.01.10

Urgency / Timeframe
ASAP

Add docker file and docker compose for lambda

Why this feature is necessary:
Need a Docker container for AWS Lambda.

The container must be Lambda compliant:
https://docs.aws.amazon.com/lambda/latest/dg/lambda-images.html

The container must have a handler wrapper as the entry point:
https://docs.aws.amazon.com/lambda/latest/dg/python-handler.html

Example handler files:
https://github.com/awsdocs/aws-doc-sdk-examples/blob/master/python/example_code/lambda/boto_client_examples/lambda_handler_scheduled.py
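
A minimal sketch of such a handler wrapper; the event schema is an assumption, and the real entry point would dispatch into the NSRDB processing code:

    def handler(event, context):
        """Hypothetical Lambda entry point for an NSRDB processing request."""
        day = event.get('day')  # assumed event field
        # ... dispatch into the NSRDB processing pipeline here ...
        return {'statusCode': 200, 'body': f'processed day {day}'}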

Charge code
Puerto Rico

Urgency / Timeframe
ASAP

Sanity check nsrdb time series frequency

Using a frequency of "10m" instead of "10min" results in the error ValueError: zero-size array to reduction operation maximum which has no identity, because "10m" is interpreted as a 10-month frequency for interpolation. A possible fix is to verify that "h", "min", or "t" is in the freq string (see the sketch below).
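
A minimal sketch of the suggested check; the function name is illustrative:

    def check_freq(freq):
        """Sanity check: require a sub-daily pandas frequency string."""
        if not any(unit in freq.lower() for unit in ('h', 'min', 't')):
            raise ValueError(f'Expected a sub-daily frequency like "10min", '
                             f'got: {freq!r}')

    check_freq('10min')  # ok
    check_freq('10m')    # raises ValueError (10 months)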

"Shark's teeth" artifacts in composite albedo data

Bug Description
The current system for mapping IMS snow data onto the MODIS snow-free albedo to create a composite albedo has a flaw that sometimes results in the erroneous assignment of "snowy" or "dry" along snow/no-snow boundaries. These artifacts resemble triangles or shark's teeth and are most noticeable above roughly 60°N latitude.

A nearest-neighbor approach is used to find the IMS data point that is closest to a given MODIS raster cell. A consequence of the polar projection is that each raster cell has completely unique lat/lon coordinates, requiring a large KD tree to map the IMS data onto the MODIS data, which is stored in the more common WGS-84 based raster format. Making a KD tree from the full IMS data set is impractical from both a memory and time perspective. Fortunately, for the purposes of making the composite albedo, we don't care which IMS point is nearest, only whether it's snowy or dry. Further, there are typically large snowy and dry areas in the IMS data; we only need to know if a MODIS point is in a snowy or a dry area. The existence of large snowy and dry areas can be used to dramatically reduce the number of IMS points needed to classify any MODIS cell as snowy or dry.

The IMS snow data is saved in a polar projection as shown below from day 190 of 1998. The north pole is in the center of the image, with the Himalayas towards the top in yellow, and the Alaskan and Canadian pacific coast towards the bottom left. Yellow cells are snow, orange is sea ice, and blue is snow and ice free. For the NSRDB, snow and sea ice are considered equivalent.
[image: IMS snow data in polar projection, day 190 of 1998]

Below is the same image zoomed and centered on Greenland.
[image: same view zoomed and centered on Greenland]

The scipy.ndimage.morphology.binary_dilation() function is used to find the boundaries of the snow and dry regions. Example output is shown below, blue indicates the boundary.
[image: snow/dry boundaries from binary_dilation(); blue indicates the boundary]

The edge layer indicates the boundary between snowy and dry, but does not provide enough information to classify an arbitrary location as snowy or dry. To include more information, the edge is buffered using the binary_dilation() function. This expands the edge from one pixel wide to three, as shown below; yellow indicates the boundary pixels (please ignore the change in color scheme). A sketch of this edge-finding and buffering step follows the image.
[image: buffered three-pixel boundary; yellow indicates boundary pixels]
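
A minimal sketch of the edge extraction and buffering described above, on a toy snow mask (the real code operates on the IMS raster):

    import numpy as np
    from scipy.ndimage import binary_dilation

    snow = np.zeros((100, 100), dtype=bool)
    snow[40:, :] = True                   # toy snow field: bottom half snowy
    edge = binary_dilation(snow) ^ snow   # one-pixel boundary on the dry side
    buffered = binary_dilation(edge)      # buffer the edge to ~three pixels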

In theory, the buffered boundary should contain enough information to accurately classify any location as snowy or dry. Viewing the composite data shows a number of "shark's teeth" at higher latitudes as shown in the image below of Greenland. White indicates snow from the IMS data. Black or gray areas are considered dry and show the higher resolution data from MODIS. Examples of shark's teeth are circled.
[image: composite albedo over Greenland with shark's teeth artifacts circled]

This occurs due to the mismatch between the two projections. When "unwrapped", it becomes apparent that the IMS cell density near the north pole is much lower than near the equator. The image below shows the composite albedo overlaid with the IMS points, with shape and color indicating snow, sea ice, or dry. Note the low point density near the north pole.
[image: composite albedo overlaid with IMS points classified as snow, sea ice, or dry]

The image below shows the selected IMS points in the polar projection after the buffer. Yellow are snowy points, maroon are dry points, and blue points are considered unnecessary for building the KD tree. Looking at the selected points in the polar projection, it appears there is enough information to classify an arbitrary point as snowy or dry.
[image: selected IMS points in the polar projection after buffering]

The image below shows the selected points unwrapped to WGS-84 and overlaid on the composite albedo. Note that at the locations of the shark's teeth there appear to be missing yellow IMS snow dots, effectively "exposing" the pink dry dots and creating the shark's teeth. While the selection of points for the KD tree appears adequate in the polar projection, when converted to WGS-84 it is apparent that more points are needed to accurately define the snow/no-snow boundary. This is generally only a problem at higher latitudes; at lower latitudes the density and general orientation of points is more consistent between the two projections.
[image: selected points unwrapped to WGS-84 and overlaid on the composite albedo]

pyhdf

@mikebannis I can't seem to install pyhdf, which will make pip installing NSRDB a pain. Any way we can remove pyhdf and instead access the MODIS data by:

  1. converting from HDF4 to HDF5 using the convert_h4(path4, f_h4, path5, f_h5) tool sketched above, and
  2. accessing the result with h5py, which is already a dependency?

MERRA Data Interpolation Bug

Bug Description
Several ancillary variables are retrieved and interpolated from MERRA on a daily basis. Data at the end of the day is forward-filled, resulting in some timesteps with constant values. Correct behavior would be to either linearly extrapolate or (ideally) retrieve the MERRA timesteps surrounding the current day to interpolate to. The sketch below illustrates the difference.
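
A minimal pandas illustration of the bug, using toy 3-hourly data interpolated to 30-minute resolution:

    import numpy as np
    import pandas as pd

    # nine 3-hourly timesteps: eight within the day plus 00:00 the next day
    merra = pd.Series(np.sin(np.linspace(0, 3, 9)),
                      index=pd.date_range('2020-01-01', periods=9, freq='3h'))
    target = pd.date_range('2020-01-01', '2020-01-01 23:30', freq='30min')

    # current behavior: only in-day timesteps, forward-filled past 21:00,
    # so 21:00-23:30 is constant
    bad = merra[:-1].reindex(target).interpolate('time').ffill()

    # correct behavior: include the surrounding timestep so the tail
    # interpolates through midnight instead of going flat
    good = merra.reindex(target.union(merra.index)).interpolate('time')[target]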

Screenshots
[screenshot: time series showing constant forward-filled values at the end of the day]

Charge code
SETP 10304 71.01.01
