Giter Site home page Giter Site logo

arcticsnow / topopyscale Goto Github PK

View Code? Open in Web Editor NEW
34.0 6.0 6.0 14.55 MB

TopoPyScale: a Python library to perform simplistic climate downscaling at the hillslope scale

Home Page: https://topopyscale.readthedocs.io

License: MIT License

Python 95.30% TeX 4.70%
science climate downscaling geoscience climate-science clustering dem era5 timeseries xarray

topopyscale's Introduction

DOI badge

DOI GitHub license GitHub release (latest by date) Test

Binder Notebooks Examples: Binder

TopoPyScale

Python version of Toposcale packaged as a Pypi library. Toposcale is an original idea of Joel Fiddes to perform topography-based downscaling of climate data to the hillslope scale.

Documentation avalaible: https://topopyscale.readthedocs.io

References:

And the original method it relies on:

  • Fiddes, J. and Gruber, S.: TopoSCALE v.1.0: downscaling gridded climate data in complex terrain, Geosci. Model Dev., 7, 387–405, https://doi.org/10.5194/gmd-7-387-2014, 2014.
  • Fiddes, J. and Gruber, S.: TopoSUB: a tool for efficient large area numerical modelling in complex topography at sub-grid scales, Geosci. Model Dev., 5, 1245–1257, https://doi.org/10.5194/gmd-5-1245-2012, 2012.

Kristoffer Aalstad has a Matlab implementation: https://github.com/krisaalstad/TopoLAB

Contribution Workflow

Please follow these simple rules:

  1. a bug -> fix it!
  2. an idea or a bug you cannot fix? -> create a new issue if none doesn't already exist. If one exist, then add material to it.
  3. wanna develop a new feature/idea? -> create a new branch. Go wild. Merge with main branch when accomplished.
  4. Create release version when significant improvements and bug fixes have been done. Coordinate with others on Discussions

Create a new release: Follow procedure and conventions described in: https://www.youtube.com/watch?v=Ob9llA_QhQY

Our forum is now on Github Discussions. Come visit!

Design

  1. Inputs
    • Climate data from reanalysis (ERA5, etc)
    • Climate data from future projections (CORDEX) (TBD)
    • DEM from local source, or fetch from public repository: SRTM, ArcticDEM, ASTER
  2. Run TopoScale
    • compute derived values (from DEM)
    • toposcale (k-mean clustering)
    • interpolation (bilinear, inverse square dist.)
  3. Output
    • Cryogrid format
    • FSM format
    • CROCUS format
    • Snowmodel format
    • basic netcfd
    • For each method, have the choice to output either the abstract cluster points, or the gridded product after interpolation
  4. Validation toolset
    • validation to local observation timeseries
    • plotting
  5. Gap filling algorithm
    • random forest temporal gap filling (TBD)

Validation (4) and Gap filling (4) are future implementation.

Installation

We have now added an environments.yml file to handle versions of depencencies that are tested with the current codebase, to use this run:

conda env create -f environment.yml

Alternatively you can follow this method for dependencies (to be deprecated):

conda create -n downscaling python=3.9 ipython
conda activate downscaling

# Recomended way to install dependencies:
conda install -c conda-forge xarray matplotlib scikit-learn pandas numpy netcdf4 h5netcdf rasterio pyproj dask rioxarray

Then install the code:

# OPTION 1 (Pypi release):
pip install TopoPyScale

# OPTION 2 (development):
cd github  # navigate to where you want to clone TopoPyScale
git clone [email protected]:ArcticSnow/TopoPyScale.git
pip install -e TopoPyScale    #install a development version

#----------------------------------------------------------
#            OPTIONAL: if using jupyter lab
# add this new Python kernel to your jupyter lab PATH
python -m ipykernel install --user --name downscaling

# Tool for generating documentation from code docstring
pip install lazydocs

Then you need to setup your cdsapi with the Copernicus API key system. Follow this tutorial after creating an account with Copernicus. On Linux, create a file nano ~/.cdsapirc with inside:

url: https://cds.climate.copernicus.eu/api/v2
key: {uid}:{api-key}

Basic usage

  1. Setup your Python environment
  2. Create your project directory
  3. Configure the file config.ini to fit your problem (see config.yml for an example)
  4. Run TopoPyScale
import pandas as pd
from TopoPyScale import topoclass as tc
from matplotlib import pyplot as plt

# ========= STEP 1 ==========
# Load Configuration
config_file = './config.yml'
mp = tc.Topoclass(config_file)
# Compute parameters of the DEM (slope, aspect, sky view factor)
mp.compute_dem_param()

# ========== STEP 2 ===========
# Extract DEM parameters for points of interest (centroids or physical points)

mp.extract_topo_param()

# ----- Option 1:
# Compute clustering of the input DEM and extract cluster centroids
#mp.extract_dem_cluster_param()
# plot clusters
#mp.toposub.plot_clusters_map()
# plot sky view factor
#mp.toposub.plot_clusters_map(var='svf', cmap=plt.cm.viridis)

# ------ Option 2:
# inidicate in the config file the .csv file containing a list of point coordinates (!!! must same coordinate system as DEM !!!)
#mp.extract_pts_param(method='linear',index_col=0)

# ========= STEP 3 ==========
# compute solar geometry and horizon angles
mp.compute_solar_geometry()
mp.compute_horizon()

# ========= STEP 4 ==========
# Perform the downscaling
mp.downscale_climate()

# ========= STEP 5 ==========
# explore the downscaled dataset. For instance the temperature difference between each point and the first one
(mp.downscaled_pts.t-mp.downscaled_pts.t.isel(point_id=0)).plot()
plt.show()

# ========= STEP 6 ==========
# Export output to desired format
mp.to_netcdf()

TopoClass will create a file structure in the project folder (see below). TopoPyScale assumes you have a DEM in GeoTiFF, and a set of climate data in netcdf (following ERA5 variable conventions). TopoPyScale can easier segment the DEM using clustering (e.g. K-mean), or a list of predefined point coordinates in pts_list.csv can be provided. Make sure all parameters in config.ini are correct.

my_project/
    ├── inputs/
        ├── dem/ 
            ├── my_dem.tif
            └── pts_list.csv  (optional)
        └── climate/
            ├── PLEV*.nc
            └── SURF*.nc
    ├── outputs/
    └── config.ini

topopyscale's People

Contributors

arcticsnow avatar ealonsogzl avatar hugoledoux avatar joelfiddes avatar krisaalstad avatar nilick avatar paswyss avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

topopyscale's Issues

`topo_scale.py` skip for loop

write a new version of the code to avoid the for loop, and use the full potential of xarray. Add a new dimension called point_id along which to do the interpolation.

drop setting lat/lon extent from config file

Modify the code that we do not need to indicate the lat/lon extent to download. Instead grab directly the extent of the DEM. Why indicating twice something that is readily available in the DEM!!!

Shortwave partitiong bug

  1. Kt is unconstrained to interval 0-1
  2. Kd was temporally constant (np.max term) - should be vector of same length as time.
  3. Inverted logic in illumination angle: down_pt['cos_illumination'] = down_pt.cos_illumination_tmp * (down_pt.cos_illumination_tmp < 0) # remove selfdowing ccuring when |Solar.azi - aspect| > 90

unable to read some netcdf files with h5ncdf option

commit: ebf380c#diff-14a14030423355c28b612bef60be37e718aeed0e301499661d91f8f84a78c46f

ds_plev = xr.open_mfdataset(path_forcing + 'PLEV*.nc', engine='h5netcdf',parallel=True).sel(time=slice(start_date, end_date))
ds_surf = xr.open_mfdataset(path_forcing + 'SURF*.nc', engine='h5netcdf', parallel=True).sel(time=slice(start_date, end_date))

Option engine='h5netcdf' causes error in opening netcdf.

From docu:
https://xarray.pydata.org/en/stable/generated/xarray.open_mfdataset.html

engine ({"netcdf4", "scipy", "pydap", "h5netcdf", "pynio", "cfgrib", "zarr"}, optional) – Engine to use when reading files. If not provided, the default engine is chosen based on available dependencies, with a preference for “netcdf4”.

engine="netcdf4" is probably most useful for era5 data?
Or add an option to config if there are other use cases?

Wind: implement Liston and Elder method

Wind can be corrected using a simplistic correction method described in Liston and Elder (2006).
This can be implemented as a separate/optional method.
See Appendice C in Fiddes and Gruber (2014) for algorithm structure

#toposcale: How to deal with situation where the lowest pressure level

In the case the elevation of the point is lower than the geopotential elevation of the lowest pressure level (1000hPa for ERA5), toposcale currently breaks down.

Strategies:

  1. skip the point and return NaNs
  2. take values from closest pressure level without interpolation

Implementation

  • see line 134 of topo_scale.py

topo_scale.py downscaling fails on last itersation of SW/LW section

config:

split:
    IO: False       # Flag to split downscaling in time or not
    time: 2         # number of years to split timeline in
    space: None     # NOT IMPLEMENTED

fails at this line (l.327):

horizon = horizon_da.sel(x=row.x, y=row.y, azimuth=np.rad2deg(ds_solar.azimuth.isel(point_id=31)), method='nearest')

with:

IndexError: Index is not smaller than dimension 31 >= 31

Explanation:

The object ds_solar.azimuth is:

Out[2]: 
<xarray.DataArray 'azimuth' (point_id: 31, time: 50376)>
dask.array<open_dataset-6baaeed24838583c892469b5b4861c12azimuth, shape=(31, 50376), dtype=float32, chunksize=(31, 50376), chunktype=numpy.ndarray>
Coordinates:
  * point_id  (point_id) int64 1 2 3 4 5 6 7 8 9 ... 23 24 25 26 27 28 29 30 31
  * time      (time) datetime64[ns] 2016-09-01 ... 2022-05-31T23:00:00

point_id = 1:31

the iterator here is the issue (l.266):

for i, row in df_centroids.iterrows():

makes values 1:31 SHOULD be 0:30

this works:
ds_solar.azimuth.isel(point_id=0)

this causes fail:
ds_solar.azimuth.isel(point_id=31)

More info:

        print('Downscaling LW, SW for point: {} out of {}'.format(pt_id+1,
                                                                  df_centroids.point_id.max()+1))

this print statement assumes a python index as adds one. for this example print statemnts give iterrations 2:32, should be 1:31.

I guess this is connected to the last changes of making downscaling split in time?

Solution:

add -1 here:

horizon = horizon_da.sel(x=row.x, y=row.y, azimuth=np.rad2deg(ds_solar.azimuth.isel(point_id=pt_id-1)), method='nearest')

All other incidences of pt_id are correct with current indexing.

removed +1 here:

        print('Downscaling LW, SW for point: {} out of {}'.format(pt_id,
                                                                  df_centroids.point_id.max()))

to get correct index in print statement

rewrite `topo_scale.py` using barebone Dask

The topo_scale.py routine currently works well for small project, but quickly runs into trouble when used for larger projects. The internal computation of topo_scale could be improved and re-imprelemented using pure Dask syntax in order to limit the intermediate computation and memory usage.

TODO:

  • figure Dask usage
  • seek potential Dash user
  • identify bottleneck of current method

This will justify a new release version

Intra-comparison toposcale output

First comparison of this version with previous ones.

Temperature -> good but a biais of few degrees
SW -> good, but presence of an artefact in topopyscale close to 0
LW -> complete wack in topopyscale
ws -> look good
q -> good
precip -> ??
pressure -> ?? (not avail in Kris's file)

LW and SW, no contribution from surrounding terrain

Currently neither LW nor SW has a component coming from the surrunding terrain. In the case of snow I/O type of place, this radiation budget can be greatly affected.

idea: Terrain exposure of one pixel could be simplified by being a half sphere - sky view factor

Documentation

Documentation

  1. which format (RTH or other)
  2. how to include?
  3. list config variable possibilities with compatibilities
  4. example codes with TopoPyScale_examples
  5. how to prepare DEM

Sensitivity Analysis

There is a number of sensitivity analysis to perform:

  1. DEM resolution. How far does it make sense to push the DEM resolution. Does toposcale hit a plateau, stop adding values? Could there be an optimal resolution (given a certain landscape?)
  2. DEM segmentation sensitivity to the random seeding, and clustering parameters

extract_topo_param() from Topoclass fails, when running with the version in current main branch.

execution:
mp = tc.Topoclass(config_file)
mp.compute_dem_param()
mp.extract_topo_param()

-> Same would work correctly with environment with release version of topopyscale in it.

Terminal output
Traceback (most recent call last):
File "pipeline.py", line 9, in
mp.extract_topo_param()
File ".../TopoPyScale/TopoPyScale/topoclass.py", line 308, in extract_topo_param
self.extract_topo_cluster_param()
File ".../TopoPyScale/TopoPyScale/topoclass.py", line 261, in extract_topo_cluster_param
df_scaled, self.toposub.scaler = ts.scale_df(df_param, features=self.config.sampling.toposub.clustering_features)
File ".../TopoPyScale/TopoPyScale/topo_sub.py", line 56, in scale_df
df_scaled = pd.DataFrame(scaler.fit_transform(df_param[feature_list].values),
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
data_to_wrap = f(self, X, *args, **kwargs)
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/base.py", line 859, in fit_transform
return self.fit(X, **fit_params).transform(X)
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/preprocessing/_data.py", line 824, in fit
return self.partial_fit(X, y, sample_weight)
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/preprocessing/_data.py", line 861, in partial_fit
X = self._validate_data(
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/base.py", line 546, in _validate_data
X = check_array(X, input_name="X", **check_params)
File "...miniconda3/envs/TopoPyScaleClone/lib/python3.8/site-packages/sklearn/utils/validation.py", line 940, in check_array
raise ValueError(

Precipitation totals wrong at timesteps higher than 1h

precip in era5 is still hourly sums (m) even at timesteps more than 1h - we are just simply missing precip budget. eg 6h data could be:

0, 0, 0.002, 0.002 (m)

this doesnt mean daily sum of 0.004m but more likely 0.004*4 = 0.016m

here is the bug in topo_scale.py:

down_pt['tp'] = surf_interp.tp * 1 / tstep_dict.get(tstep) * 10**3 # Convert to mm/hr

do not divide by timestep.

Furthermore, a rough way to get the budget right is to multiply by timestep.

Current code produces precip budgets around 9 times too low for a 3h timestep - twice an error of factor three.

Frequency of precipitation

Problem: precipitation frequency too high

Probable reason: comes from the fact ( I think) that we do a 2d interpolation which will lead to more non zero events (even if very small) than just a single gridbox would contain. This can of course cause problems in terms of frequency related processes.

Solution:
(1) filter small precip events out (lose precip, conserve frequency)
(2) allocate small precip amounts to larger wet days (conserve annual total and frequency) - take frequency from the gridbox? Just apply a simple threshold of 1mm/day
(3) others?

image

Hi Joel,

thanks a lot for the data!

Quick analysis - toposcale temperature looks better than original era5, R^2 vs AWS measurements is respectively 0.87 and 0.81.

For precipitation, the toposcale predicts at least some precipitation 4.5x more frequently compared to era5 - 82 % of the hourly timesteps have some precipitation, vs 18 % for era5 (see attached plot). At daily aggregation (used by the mass balance model) toposcale has some precipitation on 96 % of days, vs 46 % for era5. This is not a problem for the model, but I'm just curious - have you observed a similar pattern at other locations?

Cheers,

Enrico

Feature Cryogrid for flat areas

Do simple horizontal interpolation of surface ERA5 and no vertical interp for large flat areas. This is relevant in places where there close to no discrepancy in between ERA5 topography and the actual topography.

export to MuSa

Add an export function to_musa()
Netcdf should contain:

  • variables: RH, Tair, Precip, WS
  • cluster mapping

Check all accumulated variables TP,ISWR, ILWR

In era5 accumulated variables treatment changed from accumulated since forecast start to accumulated over last timestep. As I understood it a 3h step precip value is still precip over 1h. To be honest docs confuse the hell out of me so we should double check this once eg:

https://confluence.ecmwf.int/pages/viewpage.action?pageId=197702790
https://confluence.ecmwf.int/pages/viewpage.action?pageId=74764925

Diffenernt issue buyt interesting about "rain bombs" in era5:
https://confluence.ecmwf.int/display/CUSF/ERA5+versus+ERA-Interim+Total+Precipitation

Example issue:

Topo_scale.py l.169
down_pt['tp'] = surf_interp.tp * 1 / tstep_dict.get(tstep) * 10**3 # Convert to mm/hr

should be, I think:
down_pt['tp'] = surf_interp.tp * 1 / 1* 10**3 # Convert to mm/hr

or just:
down_pt['tp'] = surf_interp.tp * 10**3 # Convert m/hr to mm/hr

Precip distribution in Toposub

Not really a bug as behaviour is I think as expected. Map below is monthly precip total over davos with range of c. 90-130mm. I have applied an elevation lapse rate. The distribution of precip is correct (mainly from NW). SE is drier. Tops are wetter valleys are drier. What I think is up though is that there is way too much slope detail. Aspect seems to be comming through here. I would expect to see mainly an elevation gradient with also the NW to SE Precip gradient. I think this is because elev and aspect are weighted the same in toposub. I guess for a variable like precip we just want elevation and x y as dimensions of variability. I think in my Toposub elevation was weighted much higher so if I have few samples the main variability is elevation. I had a weighting factor for each dimension before it went into Kmeans.
image

export netcdf function from Topoclass

When no variables defined, the to_netcdf() function of the Topoclass (not topoexport!) raises an error.

Suggestion how to fix it:
add the follow at the beginning of the function:

if variables is None:
    variables = list(self.downscaled_pts.keys())

This initializes the variables when not defined to all (which is described as default in the function description.)

Another option would be to remove the sub-setting of 'self.downscaled_pts[variables]' in the te.to_netcdf call and leave it to the topoexport class, which handles it already. But this I would not recommend, since it is prone to errors (when the function changes for example).

improve logic to check if forcing file exist

Try to remove the o.is_file() in the lambda line 71. The current implementatino checking if the file exist is slow when looping over more than 30 files.

One option is to load file name with glob and compare string instead

Error when getting to topo parameters extraction stage

Hi, I'm currently reviewing the software for JOSS here, and have got through the install ok, but am running into problems when getting to the extraction of topographic metrics stage. I'm not entirely sure from the error output message where exactly it is falling over. I'm using the Norway ex1 example dataset.

2023-02-21 21:56:22,639 INFO Download rate 2.5M/s
[USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/inputs/climate/PLEV_202107.nc complete
2023-02-21 22:09:21,200 INFO Request is completed
2023-02-21 22:09:21,200 INFO Downloading https://download-0008-clone.copernicus-climate.eu/cache-compute-0008/cache/data7/adaptor.mars.internal-1677017196.2052243-11103-8-173d15ea-9bf4-42a7-9dd6-1ff637b84745.nc to [USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/inputs/climate/PLEV_202108.nc (2M)
2023-02-21 22:09:21,862 INFO Download rate 3.1M/s
[USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/inputs/climate/PLEV_202108.nc complete

---> Extracting DEM parameters (slope, aspect, svf)
Computing slope and aspect ...
Computing svf ...
---> File [USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/outputs/ds_param.nc saved
---> Scaling data prior to clustering
Traceback (most recent call last):
  File "[USER_DIRS]/TopoPyScale_examples/ex1_norway_finse/pipeline.py", line 9, in <module>
    mp.extract_topo_param()
  File "[USER_DIRS]/TopoPyScale/TopoPyScale/topoclass.py", line 321, in extract_topo_param
    self.extract_topo_cluster_param()
  File "[USER_DIRS]/dev/TopoPyScale/TopoPyScale/topoclass.py", line 269, in extract_topo_cluster_param
    df_scaled, self.toposub.scaler = ts.scale_df(df_param,
  File "[USER_DIRS]/dev/TopoPyScale/TopoPyScale/topo_sub.py", line 57, in scale_df
    df_scaled = pd.DataFrame(scaler.fit_transform(df_param[feature_list].values),
  File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/utils/_set_output.py", line 142, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/base.py", line 859, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/preprocessing/_data.py", line 824, in fit
    return self.partial_fit(X, y, sample_weight)
  File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/preprocessing/_data.py", line 861, in partial_fit
    X = self._validate_data(
  File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/base.py", line 546, in _validate_data
    X = check_array(X, input_name="X", **check_params)
  File "[USER_DIRS]/miniconda3/envs/downscaling/lib/python3.9/site-packages/sklearn/utils/validation.py", line 940, in check_array
    raise ValueError(
ValueError: Found array with 0 feature(s) (shape=(40512, 0)) while a minimum of 1 is required by StandardScaler.

partition_snow() is provising negative value for rain

Check code of partition_snow() function as the rain fraction provides some negative values.

def partition_snow(precip, temp, tair_low_thresh=272.15, tair_high_thresh=274.15):

The current implementation follows the one in Cryogrid. there might be a bug there too:
https://github.com/CryoGrid/CryoGrid/blob/1436176a76ccdec9d9f0ddf34f153ffbc27ad51f/source/IO/FORCING/FORCING_slope_seb_readNc.m#L111

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.