sdtaylor / grasslandmodels

License: MIT License

Python 100.00%

grasslandmodels's Issues

fit using LEAP global optimization method instead of scipy.optimize

rumor has it LEAP talks nicely with dask

https://github.com/AureumChaos/LEAP

yep they totally do it

have metadata within models

Basically have an open ended dictionary to store whatever in. Will allow for things like fitting params, data, etc.

Add to Base() init

self.metadata = {}

Have an update_metadata method so new entries are merged in rather than overwriting the whole dictionary

def update_metadata(self, **kwargs):
    self.metadata.update(kwargs)

Make sure it's saved by adding it to get_model_info

def _get_model_info(self):
    return {'model_name': type(self).__name__,
            'parameters': self._fitted_params,
            'metadata': self.metadata}

Read it in from saved files in load_model_parameters


    else:
        # For all other ones just need to pass the parameters
        Model = load_model(model_info['model_name'])
        model = Model(parameters=model_info['parameters'])
        model.metadata = model_info['metadata']
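
The pieces above could fit together roughly as follows. This is a minimal sketch, not the package's code: `SketchModel` and `model_from_info` are hypothetical stand-ins for the real `Base` class and `load_model_parameters`, and the `.get` fallback for files saved before metadata existed is an assumption.

```python
class SketchModel:
    """Hypothetical stand-in for the package's Base class."""

    def __init__(self, parameters=None):
        self._fitted_params = dict(parameters or {})
        self.metadata = {}

    def update_metadata(self, **kwargs):
        # Merge new entries; keys not mentioned are left alone
        self.metadata.update(kwargs)

    def _get_model_info(self):
        return {'model_name': type(self).__name__,
                'parameters': self._fitted_params,
                'metadata': self.metadata}


def model_from_info(model_info):
    # Hypothetical loader; older saved files may lack a 'metadata' key
    model = SketchModel(parameters=model_info['parameters'])
    model.metadata = dict(model_info.get('metadata', {}))
    return model


m = SketchModel(parameters={'b1': 1.0})
m.update_metadata(optimizer='DE', maxiter=5)
m.update_metadata(notes='test run')        # earlier keys survive
restored = model_from_info(m._get_model_info())
```

Copying the dictionary on load keeps the restored model from sharing state with whatever object the file contents were parsed into.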

create test for numerous dimensions on prediction

Need to make sure the phenograss numpy methods can handle any number of dimensions as long as time is the first one, which is how it's designed.

Copy the test data onto several other dimensions and make sure the predictions match along them.
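
A test along those lines might look like the sketch below. `toy_model` is a hypothetical stand-in for the phenograss numpy methods (it only shares the property of stepping along axis 0 and broadcasting over the rest), and the random input replaces the real test data.

```python
import numpy as np


def toy_model(precip, b=0.1):
    # Hypothetical stand-in for a time-stepping model: loops over axis 0
    # (time) and relies on broadcasting for every other axis.
    V = np.zeros_like(precip, dtype=float)
    for i in range(1, precip.shape[0]):
        V[i] = V[i - 1] + b * precip[i - 1] * (1 - V[i - 1])
    return V


rng = np.random.default_rng(0)
precip_1d = rng.random(50)                       # shape (time,)
expected = toy_model(precip_1d)

# Copy the same series onto two extra axes: shape (time, 3, 4)
precip_3d = np.tile(precip_1d[:, None, None], (1, 3, 4))
predicted = toy_model(precip_3d)

# Every series along the extra axes should reproduce the 1-D result
matches = np.allclose(predicted, expected[:, None, None])
```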

loading pre-fit models

need a load_fitted_model util to load specific parameter sets.

ie. load_fitted_model('phenograss-original') for the Hufkens 2016 parameters
load_fitted_model('CholerPR1-original')

or maybe load_prefit_model
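
One possible shape for this is a plain lookup table, sketched below. `PREFIT_PARAMETERS` and `load_prefit_parameters` are hypothetical names, the function returns only the parameter dictionary rather than a model object, and the 'phenograss-original' values are copied from the "Generate paper figure" issue in this list on the assumption that they are the Hufkens 2016 set.

```python
# Hypothetical registry; only one entry is filled in here.
PREFIT_PARAMETERS = {
    'phenograss-original': {'b1': 124.502121, 'b2': 0.00227958267,
                            'b3': 0.0755224228, 'b4': 0.519348383,
                            'L': 2.4991734, 'Phmin': 8.14994431,
                            'h': 222.205673, 'Topt': 33.3597641,
                            'Phmax': 37.2918091},
}


def load_prefit_parameters(name):
    # Return a copy so callers cannot alter the stored set
    try:
        return dict(PREFIT_PARAMETERS[name])
    except KeyError:
        available = ', '.join(sorted(PREFIT_PARAMETERS))
        raise ValueError(
            f'unknown parameter set {name!r}; available: {available}') from None


params = load_prefit_parameters('phenograss-original')
```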

add Vmin, Vmax to NaiveMapCorrected

Otherwise they extrapolate to large f values

speed optimization

example outlined here looks very promising with cython

https://ipython-books.github.io/56-optimizing-cython-code-by-writing-less-python-and-more-c/

remove Wcap from Choler PR1

Soil water holding capacity is not actually used in the PR1 model, but it is used in PR2 and PR3.

set default Vmax to 0.9999

If it's set to 1 then V will potentially never decrease, due to the senescence part of eq. 2 cancelling out

d * b3 * V[i] * (1-V[i])

Will only happen if V reaches 1.0 to begin with, which is a huge outlier but still possible.
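
A quick check of the term quoted above shows the effect. The values of `d` and `b3` here are illustrative assumptions (`b3` is taken from the parameter set listed elsewhere in these issues), and only this one term of eq. 2 is evaluated.

```python
def senescence(V, d=1, b3=0.0755224228):
    # Senescence part of eq. 2 as quoted above
    return d * b3 * V * (1 - V)


at_one = senescence(1.0)        # (1 - V) is exactly zero, so the term vanishes
below_one = senescence(0.9999)  # small but non-zero, so V can still decline
```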

make phenograss b1 correct

see #2

Need to drop the phenograss parameter b4 and make b1 actually be used instead of it just being set to Wp.

Maybe keep the original phenograss intact so it matches the phenograss.f90 from the original paper? And make this adjustment just to PhenoGrassNDVI

take MAP and h out of phenograss_cython

They aren't needed as the scaling associated with these is done on the normal python end.

Generate paper figure

The Python code generates a csv with both the modelled and observed GCC. The R code recreates the timeseries and R2 figures from Hufkens 2016

Python:

from GrasslandModels import models, utils
import numpy as np

GCC, predictor_vars = utils.load_test_data()

original_phenograss_params = {'b1': 124.502121,
                              'b2': 0.00227958267,
                              'b3': 0.0755224228,
                              'b4': 0.519348383,
                              'L': 2.4991734,
                              'Phmin': 8.14994431,
                              'h': 222.205673,
                              'Topt': 33.3597641,
                              'Phmax': 37.2918091}

# pass the dictionary itself, not its name as a string
m = models.PhenoGrass(parameters=original_phenograss_params)
prediction = m.predict(predictor_vars)

available_sites = ['freemangrass_grass',
                       'ibp_grassland',
                       'kansas_grassland',
                       'lethbridge_grassland',
                       'marena_canopy',
                       'vaira_grass']

# put the modelled GCC back into site files
import pandas as pd

all_site_data = pd.read_csv('GrasslandModels/data/site_data.csv.gz')
all_site_data['modelled_gcc'] = np.nan

for site_i, site_name in enumerate(available_sites):
    all_site_data.loc[all_site_data.Site == site_name,'modelled_gcc'] = prediction[:,site_i]
    
    
all_site_data.to_csv('phenograss_test_run_cython.csv', index=False)

R code

library(tidyverse)


phenograss_output = read_csv('~/projects/GrasslandModels/phenograss_test_run_cython.csv',
                             col_types = cols(gcc=col_double()))

phenograss_output$Site = factor(phenograss_output$Site, 
                                levels = c("marena_canopy","freemangrass_grass","kansas_grassland","vaira_grass","lethbridge_grassland","ibp_grassland"),
                                labels = c('Marena',        'Freemangrass',     'Kansas',          'Vaira',      'Lethbridge',          'IBP'))

# Timeseries plots
phenograss_output %>%
  filter(year>=2012) %>%
  select(date, phenocam_gcc = gcc, modelled_gcc, year, site=Site) %>%
 # gather(gcc_source, gcc_value, phenocam_gcc, modelled_gcc) %>%
  ggplot(aes(x=date)) + 
  geom_point(aes(y=phenocam_gcc), color='grey40') + 
  geom_line(aes(y=modelled_gcc), color='red', size=1) +
  facet_wrap(~site, ncol=1)

# R2 plots

phenograss_output %>%
  filter(year>=2012) %>%
  select(date, phenocam_gcc = gcc, modelled_gcc, year, site=Site) %>%
  # gather(gcc_source, gcc_value, phenocam_gcc, modelled_gcc) %>%
  ggplot(aes(x=phenocam_gcc, y=modelled_gcc)) + 
  geom_point() + 
  geom_abline(intercept = 0, slope=1) + 
  facet_wrap(~site, ncol=1, scales='free')

declare phenograss defaults only once

right now they're declared 3 times, twice in phenograss.py and again in phenograss_cython.pyx

spin up option and state variable saving

Would be nice to have a spin-up option to save initial conditions. That way the entire timeseries doesn't have to be run on each model iteration.

on the other hand the entire timeseries would need to be run with each set of parameters anyway....

check that no extra predictor args are passed to fit

Causes very confusing fitting errors if, e.g., Tm is passed as a predictor when the model doesn't use Tm.
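
A check like this could run at the top of fit. The sketch below is standalone: `check_predictor_names` and the `required` list are hypothetical, and how a model would declare its required predictors is left open.

```python
def check_predictor_names(predictors, required):
    """Raise if predictors has names the model does not use, or lacks required ones."""
    given, required = set(predictors), set(required)
    extra = sorted(given - required)
    missing = sorted(required - given)
    if extra:
        raise ValueError(f'unused predictors passed to fit: {extra}')
    if missing:
        raise ValueError(f'missing predictors: {missing}')


# Hypothetical requirement list for a model that does not use Tm
required = ['precip', 'evap']
try:
    check_predictor_names({'precip': [], 'evap': [], 'Tm': []}, required)
    message = None
except ValueError as err:
    message = str(err)
```

Failing early with the offending names should be easier to debug than an error raised deep inside the optimizer.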

need phenograss model for NDVI

Just need to take out the scaling stuff I think.

have return=='all' inside base predict

Need to be able to get all state variables out of a model like W and Dt.

Hopefully there's some way to do it automatically.

maybe

model.state_variables = ['Dt','W','V'] etc.
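
One way this could work is sketched below with toy dynamics. Everything here is hypothetical: the class, the arithmetic inside `_apply_model`, and the argument name `return_vars` (chosen because `return` is a reserved word in Python and cannot be used as an argument name).

```python
class SketchModel:
    # Hypothetical: each model lists the state variables it tracks
    state_variables = ['V', 'W', 'Dt']

    def _apply_model(self, n_steps):
        # Toy dynamics standing in for the real model equations
        state = {name: [0.0] * n_steps for name in self.state_variables}
        for i in range(1, n_steps):
            state['W'][i] = state['W'][i - 1] + 1.0
            state['Dt'][i] = 0.5 * state['W'][i]
            state['V'][i] = 0.1 * state['Dt'][i]
        return state

    def predict(self, n_steps, return_vars='V'):
        state = self._apply_model(n_steps)
        if return_vars == 'all':
            return state            # dict with every declared variable
        return state[return_vars]   # a single named variable


m = SketchModel()
everything = m.predict(3, return_vars='all')
only_v = m.predict(3)
```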

fitting with dask

The following wrapper can be passed in the optimizer arguments to fit using whatever dask distributed cluster is set up.

from dask.distributed import Client

client = Client()

def dask_scipy_mapper(func, iterable, c=client):
    futures = c.map(func, iterable)
    return [f.result() for f in futures]

de_fitting_params = {'maxiter': 5,
                     'popsize': 10,
                     'mutation': (0.5, 1),
                     'recombination': 0.25,
                     'workers': dask_scipy_mapper,
                     'disp': True}

m = models.PhenoGrass()
m.fit(GCC, predictor_vars,
      optimizer_params=de_fitting_params,
      debug=True)

The workers argument in differential evolution (and other optimize functions) can take a map-like callable in the form map(func, iterable) and expects a list of results back (since scipy 1.2.0).

https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.differential_evolution.html

more robust predictor validation

need better validation of the shapes for when multiple sites are being used. ie. if site level predictors are shape (2,), then the timeseries ones need to be shape (xxx, 2).

also potentially check that axis 0 of timeseries predictors are actually the timeseries (generally should be the longest, but not always)
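
The first check could be sketched as below. `validate_predictor_shapes` is a hypothetical helper, and the predictor names in the usage lines are only illustrative. It does not attempt the axis-0 heuristic, since the longest axis is not always time.

```python
import numpy as np


def validate_predictor_shapes(timeseries, site_level):
    """Raise ValueError unless timeseries arrays are shaped (time, *site_shape)."""
    ts_shapes = {name: np.shape(a) for name, a in timeseries.items()}
    site_shapes = {name: np.shape(a) for name, a in site_level.items()}

    if len(set(ts_shapes.values())) > 1:
        raise ValueError(f'timeseries predictors disagree: {ts_shapes}')
    if len(set(site_shapes.values())) > 1:
        raise ValueError(f'site-level predictors disagree: {site_shapes}')

    for site_shape in set(site_shapes.values()):   # at most one entry here
        for name, shape in ts_shapes.items():
            # axis 0 is time; everything after it must equal the site shape
            if shape[1:] != site_shape:
                raise ValueError(
                    f'{name} has shape {shape}, expected (time, *{site_shape})')


ok = {'precip': np.zeros((365, 2)), 'evap': np.zeros((365, 2))}
sites = {'Wcap': np.zeros(2), 'Wp': np.zeros(2)}
validate_predictor_shapes(ok, sites)       # passes silently

try:
    validate_predictor_shapes({'precip': np.zeros((365, 3))}, sites)
    shape_error = None
except ValueError as err:
    shape_error = str(err)
```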

phenograss model things to clarify

Eq. 9 not represented anywhere in phenograss.f90 code

The phenograss.f90 b1 is actually set, and used for, Wp, while the phenograss.f90 params b2,b3,b4 correspond to the paper b1,b2,b3, respectively.

save scipy log output somewhere

like whether it stopped because it hit max iterations or because it found an optimal solution

Models within choler2010.py and choler2011.py were copy pasted from the original phenograss model (they're all really similar) and then adjusted. Thus there is a lot of commented out code that should be removed.

drop all unused phenograss args

Things like m and Sd, which were present in the example Fortran code but not actually used. Also things like d which, unlike in Fortran, don't need to be declared before they're assigned.

take the nan out of metric names

they're fine as just rmse instead of nan_rmse

fix warnings

/home/shawn/miniconda3/lib/python3.7/site-packages/scipy/fft/__init__.py:97
  /home/shawn/miniconda3/lib/python3.7/site-packages/scipy/fft/__init__.py:97: DeprecationWarning: The module numpy.dual is deprecated.  Instead of using dual, use the functions directly from numpy or scipy.
    from numpy.dual import register_func

/home/shawn/miniconda3/lib/python3.7/site-packages/scipy/sparse/sputils.py:17: 15 tests with warnings
  /home/shawn/miniconda3/lib/python3.7/site-packages/scipy/sparse/sputils.py:17: DeprecationWarning: `np.typeDict` is a deprecated alias for `np.sctypeDict`.
    supported_dtypes = [np.typeDict[x] for x in supported_dtypes]

/home/shawn/miniconda3/lib/python3.7/site-packages/scipy/special/orthogonal.py:81
/home/shawn/miniconda3/lib/python3.7/site-packages/scipy/special/orthogonal.py:81
  /home/shawn/miniconda3/lib/python3.7/site-packages/scipy/special/orthogonal.py:81: DeprecationWarning: `np.int` is a deprecated alias for the builtin `int`. To silence this warning, use `int` by itself. Doing this will not modify any behavior and is safe. When replacing `np.int`, you may wish to use e.g. `np.int64` or `np.int32` to specify the precision. If you wish to review your current use, check the release note link for additional information.
  Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
    from numpy import (exp, inf, pi, sqrt, floor, sin, cos, around, int,

/home/shawn/miniconda3/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py:339: 37 tests with warnings
test/test_core_models.py: 2 tests with warnings
  /home/shawn/miniconda3/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py:339: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
    task_str = task.tostring()

/home/shawn/miniconda3/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py:360: 16 tests with warnings
test/test_core_models.py: 1 test with warning
  /home/shawn/miniconda3/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py:360: DeprecationWarning: tostring() is deprecated. Use tobytes() instead.
    task_str = task.tostring().strip(b'\x00').strip()

test/test_core_models.py::test_internal_broadcasting[PhenoGrass-fitted_model0]
test/test_core_models.py::test_phenograss_internal_methods
  /home/shawn/projects/GrasslandModels/GrasslandModels/models/phenograss.py:325: RuntimeWarning: invalid value encountered in power
    g[:] = ((Tmax - Tm[i]) / (Tmax - Topt)) * (((Tm[i] - Tmin) / (Topt - Tmin)) ** (Topt/(Tmax-Topt)))

test/test_core_models.py::test_internal_broadcasting[PhenoGrassNDVI-fitted_model1]
  /home/shawn/projects/GrasslandModels/GrasslandModels/models/phenograss.py:594: RuntimeWarning: invalid value encountered in power
    g[:] = ((Tmax - Tm[i]) / (Tmax - Topt)) * (((Tm[i] - Tmin) / (Topt - Tmin)) ** (Topt/(Tmax-Topt)))

-- Docs: https://docs.pytest.org/en/latest/warnings.html
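
The two RuntimeWarnings from phenograss.py presumably come from raising a negative base to a fractional power when Tm falls below Tmin. A minimal sketch of one way to avoid that is to evaluate the expression only where Tm lies inside the valid range. It assumes g should be 0 outside [Tmin, Tmax]; `temperature_response` is a hypothetical standalone function, not the package's method.

```python
import warnings

import numpy as np


def temperature_response(Tm, Tmin, Topt, Tmax):
    # Assumption: g is 0 wherever Tm is outside [Tmin, Tmax]
    Tm = np.asarray(Tm, dtype=float)
    g = np.zeros_like(Tm)
    valid = (Tm >= Tmin) & (Tm <= Tmax)
    T = Tm[valid]
    # Same expression as in the warning above, applied to valid entries only
    g[valid] = ((Tmax - T) / (Tmax - Topt)) * (
        ((T - Tmin) / (Topt - Tmin)) ** (Topt / (Tmax - Topt)))
    return g


with warnings.catch_warnings():
    warnings.simplefilter('error')   # any RuntimeWarning would raise here
    g = temperature_response([-5.0, 10.0, 20.0, 45.0],
                             Tmin=0.0, Topt=20.0, Tmax=40.0)
```

Masking before the power, rather than zeroing the NaNs afterwards, means the invalid operation never runs, so there is nothing for numpy to warn about.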

have preloaded ndvi data

instead of just GCC from the phenograss paper.

tests to implement

make sure validation is working correctly, expect errors when site numbers don't match
some known value testing for parameter estimation

drop deepcopy

drop this from the base method, ie here

GrasslandModels/GrasslandModels/models/base.py

Line 358 in 5809d55

doy_estimates = self._apply_model(**deepcopy(self.fitting_predictors),

It's probably not needed since these are not phenology models and definitely slows things down.

add more complex model constraints

The phenograss model especially needs various constraints on the parameters. This is done to some extent in the fitting by specifying parameter ranges, but sometimes more is needed, i.e. see here 7787263

Potentially add a model method for this

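def check_constraints(self, Phmin, Phmax, **other_params):
    # return True only if every constraint holds
    constraints = [Phmax > Phmin,
                   # ... other constraints
                   ]
    return all(constraints)


# inside the model fitting
if not self.check_constraints(**params):
    warnings.warn('parameter constraints not met')
    V[:] = 1e10   # large value so the optimizer rejects this parameter set
    return V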
