twallema / pysodm Goto Github PK

Simulating and Optimising Dynamical Models in Python 3

License: Other

Python 100.00%

emcee gillespie-algorithm model-calibration modelling-framework ordinary-differential-equations xarray integration-flow simulation

pysodm's Introduction

pySODM

Simulating and Optimising Dynamical Models in Python 3

Resources: documentation, peer-reviewed paper, pyPI

Quick installation

pip install pySODM

Aim & Scope

All the simulation projects I've undertaken over the past six years required me to do most of the following,

Integrate a system dynamical model
Its states may be represented by n-dimensional numpy arrays, labeled using coordinates
Its parameters may have time-dependencies
Its parameters may have to be sampled from distributions
It may have to be calibrate to a dataset(s) by defining and optimising a cost function

all these features required me to wrap code around an ODE solver, typically scipy.solve_ivp, and I got tired of recycling the same code over and over again, so I packaged it into pySODM.

Does other simulation software exist in Python? Sure, but most of them hold your hand by having you define symbolic transitions, which places a limit on the attainable complexity of a model, making it unfit for academic research. I wanted a piece a software that nicely does all the nasty bookkeeping like keeping track of state sizes, time dependencies on parameters, aligning simulations with datasets etc. and does so in a generically applicable way so that I'd never hit a software wall mid-project.

Overview of features

Workflow	Features
Construct a dynamical model	Implement systems of coupled differential equations
	Labeled n-dimensional model states, states can have different sizes
	Leverages `xarray.Dataset` to store labeled n-dimensional simulation output
Simulating the model	Deterministic (ODE) or stochastic simulation (Jump process)
	Time-dependent model parameter functions to vary parameters during the course of a simulation
	Draw functions to vary model parameters during consecutive simulations.
Calibrate the model	Construct and maximize a posterior probability function
	Automatically aligns data and model forecast
	Nelder-Mead Simplex and Particle Swarm Optimization
	Bayesian inference with `emcee.EnsembleSampler`

Getting started

Detailed installation instructions.
The quistart tutorial teaches the basics of building and simulating models with n-dimensional labeled states in pySODM. It demonstrates the use of time-dependent parameter functions (TDPFs) to vary model parameters over the course of a simulation and draw functions to vary model parameters during consecutive simulations.
The workflow tutorial provides a step-by-step introduction to building a mathematical model and calibrating its parameters to a dataset. An SIR disease model is built and the basic reproduction number during an outbreak is determined by calibrating the model to the outbreak data.
The enzyme kinetics and influenza 17-18 case studies apply the workflow to more advanced, real-world problems. In the enzyme kinetics case study, a 1D packed-bed reactor model is implemented in pySODM by reducing the PDEs to a set of coupled ODEs by using the method-of-lines. In the Influenza 17-18 case study, a stochastic, age-structured model for influenza is developped and calibrated to the Influenza incidence data reported by the Belgian Federal Institute of Public Health. These case studies mainly serve to demonstrate pySODM's capabilities across scientific disciplines and highlight the arbitrarily complex nature of the models that can be built with pySODM. For an academic exposee of pySODM, the Enzyme Kinetics and Influenza 17-18 case studies, checkout our peer-reviewed paper.

Versions

Version 0.2.0 (2023-01-19, PR #25)

Introduction of dimensions_per_state in model declaration, allowing the user to define models where states can have different sizes.
- Version 0.2.1 (2023-01-27, PR #30)
  
  Fixed bugs encountered when incorporating pySODM into the SARS-CoV-2 Dynamic Transmission Models. More thorough checks on bounds in the log posterior probability function. Finished manuscript.
- Version 0.2.2 (2023-01-27, PR #31)
  
  Published to pyPI.
- Version 0.2.3 (2023-05-04, PR #46)
  
  Fixed minor bugs encountered when using pySODM for a dynamic input-output model of the Belgian economy. Published to pyPI.
- Version 0.2.4 (2023-12-04, PR #62)
  
  Validated the use of Python 3.11. Efficiency gains in simulation of jump processes. Ommitted dependency on Numba. All changes related to publishing our software manuscript in Journal of Computational Science. Improved nomenclature in model defenition.
- IN PROGRESS: 0.2.5
  
  Validated the use of Python 3.12. Validated pySODM on macOS Sonoma 14.5. 'draw functions' only have 'parameters' as mandatory input, followed by an arbitrary number of additional parameters (PR #75). Tutorial environment can now be found in tutorial_env.yml and was renamed PYSODM-TUTORIALS (PR #76).
Version 0.1 (2022-12-23, PR #14)

Application pySODM to three use cases. Documentation website. Unit tests for ODE, JumpProcess and calibration.
- Version 0.1.1 (2023-01-09, PR #20)
  
  Start of semantic versions: Major.Minor.Patch
- Version 0.1.2 (2023-01-11, PR #23)
  
  Calibration of 1-D model parameters generalized to n-dimensions. Added 'aggregation functions' to the log_posterior_probability class to perform custom aggregations of model output before matching with data. xarray.DataArray/xarray.Dataset can be used as datasets during calibration. Internally converted to pd.DataFrame.
Version 0.0 (2022-11-14)
- First pySODM version. Obtained by splitting the generally applicable parts from the ad-hoc parts in UGentBiomath/COVID19-Model. Without documentation website.
Pre-development (2020-05-01 - 2022-11-24)
- Code developped to model the spread of SARS-CoV-2 in Belgium (UGentBiomath/COVID19-Model).

pysodm's People

Contributors

Stargazers

Watchers

Forkers

woldmuyn ritavazverstraeten

pysodm's Issues

Omit "stratification" in favor of "dimension"

Just a random tought: perhaps omit the use of "stratification" in favor of "dimension"?
This may be more consistent with xarray.

Different states can have different stratifications: coupled models and submodels

The number one issue with pySODM's stratification concept is that all states must have the same number of stratifications (and thus be of the same size). However, this makes the coupling of completely different models, or models with submodels, impossible to simulate.

As an example, within the COVID-19 project, I want to couple the differentials of a macro-economic IO model, where every state is stratified into 64 economical sectors, to an epidemiological model where each age group is stratified into ten age groups. xarray allows to build datasets where the data_variables have different stratifications.

I think this issue can be fixed rather easily by defining an additional variable in the model declaration declaring the dimensions of every state. From this declaration, the size of every state can be deduced and saved. Then, these sizes can be used to adequately flatten/deflatten the model's states when they are sent to the integrator.

Defining observed model states and observation process in model declaration

Current implementation

# Import the ODEModel class
from pySODM.models.base import ODEModel

# Define the model equations
class ODE_SIR(ODEModel):
    """
    Simple SIR model without dimensions
    """
    
    state_names = ['S','I','R']
    parameter_names = ['beta','gamma']

    @staticmethod
    def integrate(t, S, I, R, beta, gamma):
        
        # Calculate total population
        N = S+I+R
        # Calculate differentials
        dS = -beta*S*I/N
        dI = beta*S*I/N - 1/gamma*I
        dR = 1/gamma*I

        return dS, dI, dR

# Define parameters and initial condition
params={'beta':0.35, 'gamma':5}
init_states = {'S': 1000, 'I': 1}

# Initialize model
model = ODE_SIR(states=init_states, parameters=params)

# Define dataset
data=[d, ]
states = ["I",]
log_likelihood_fnc = [ll_negative_binomial,]
log_likelihood_fnc_args = [alpha,]

# Calibated parameters and bounds
pars = ['beta',]
labels = ['$\\beta$',]
bounds = [(1e-6,1),]

# Setup objective function (no priors --> uniform priors based on bounds)
objective_function = log_posterior_probability(model, pars, bounds, data, states, log_likelihood_fnc, log_likelihood_fnc_args, labels=labels)

Proposed implementation

Define what states are observed, and what statistical process underpins the observations in the model declaration. Then provide the parameters of the observation process (if any) upon model declaration. When simulating the model, automatically add the observation noise to the observed states, thus avoiding the use of add_gaussian_noise() or add_poisson_noise(). This further simplifies the call to log_posterior_probability.

# Import the ODEModel class
from pySODM.models.base import ODEModel

# Define the model equations
class ODE_SIR(ODEModel):
    """
    Simple SIR model without dimensions
    """
    
    state_names = ['S','I','R']
    observed_state_names = ['I']
    observation_process = ['negative_binomial']
    parameter_names = ['beta','gamma']

    @staticmethod
    def integrate(t, S, I, R, beta, gamma):
        
        # Calculate total population
        N = S+I+R
        # Calculate differentials
        dS = -beta*S*I/N
        dI = beta*S*I/N - 1/gamma*I
        dR = 1/gamma*I

        return dS, dI, dR

# Define parameters and initial condition
params={'beta':0.35, 'gamma':5}
init_states = {'S': 1000, 'I': 1}

# Define the parameters of the observation process
observation_process_params = {'I': 0.04} # overdispersion parameter `alpha` 

# Initialize model
model = ODE_SIR(states=init_states, parameters=params, observation_parameters=observation_process_params)

# Define dataset
data=[d, ]
states=[I,]

# Calibated parameters and bounds
pars = ['beta',]
labels = ['$\\beta$',]
bounds = [(1e-6,1),]

# Setup objective function (no priors --> uniform priors based on bounds)
objective_function = log_posterior_probability(model, pars, bounds, data, states, labels=labels)

'samples_dict' in a draw function, 'samples' in simulation function

A draw function is defined as follows,

def draw_function(param_dict, samples_dict):
    return param_dict

The argument samples_dict is passed from the sim function, where it is named samples. I suggest renaming samples_dict to samples and param_dict to parameters for uniformity.

Order of non-stratified vs. stratified parameters in the `integrate` function shouldn't matter when initializing a model

Suggestion

Consider an SIR model stratified into two age groups (0-20 yo, 20-120 yo) with two parameters: beta, the per-contact chance of infection, and gamma, the recovery rate of the infected individuals. In the example beta is stratified and gamma is not.

class SIRstratified(ODEModel):

    # state variables and parameters
    state_names = ['S', 'I', 'R']
    parameter_names = ['gamma']
    parameters_stratified_names = ['beta']
    stratification_names = ['age_groups']

    @staticmethod
    def integrate(t, S, I, R, beta, gamma):
        """Basic SIR model"""
        # Model equations
        N = S + I + R
        dS = -beta*S*I/N
        dI = beta*S*I/N - gamma*I
        dR = gamma*I
        return dS, dI, dR

parameters = {"gamma": 0.2, "beta": np.array([0.8, 0.9])}
initial_states = {"S": [600_000 - 20, 400_000 - 10], "I": [20, 10], "R": [0, 0]}
coordinates = {'age_groups': ['0-20','20-120']}
model = SIRstratified(initial_states, parameters, coordinates=coordinates)

Providing the parameter beta before gamma yields the following error:

ValueError: The parameters in the 'integrate' function definition do not match the parameter_names + parameters_stratified_names + stratification: ['beta', 'gamma'] vs ['gamma', 'beta']

Ideally, the order in which the parameters are given should not matter when initializing the model.

Change `log_posterior_probability` function datasets from type `pd.DataFrame` to `xarray.Dataset`

In the log posterior probability function, which we intend to optimise using NM, PSO or MCMC, we must provide the datasets as type pd.DataFrame with a multiindex if the data are stratified. However, as we are already using xarray.Dataset to format simulation output, why not provide the data to the log posterior probability function as an xarray.Dataset as well?

Probability function returned NaN during Markov-Chain Monte-Carlo sampling

Technical questions

Python version: 3.10.6
Operating System: windows 10 / hpc ugent

Description

Sometimes in the middle of Markov-Chain Monte-Carlo sampling, I get following error: ValueError: Probability function returned NaN
This don't happen always (same script could run without the error)

What I Did

Traceback (most recent call last):
  File "D:\GitHub\COVID19-Model\notebooks\calibration\woldmuyn_postponed_healthcare_calibration.py", line 439, in <module>    fig_path=fig_path+model_name, samples_path=samples_path+model_name, print_n=print_n, backend=None, processes=processes, progress=True,
  File "d:\github\pysodm\src\pySODM\optimization\mcmc.py", line 63, in run_EnsembleSampler
    for sample in sampler.sample(pos, iterations=max_n, progress=progress, store=True, tune=False):
  File "C:\Users\wolfd\.conda\envs\COVID_MODEL\lib\site-packages\emcee\ensemble.py", line 402, in sample
    state, accepted = move.propose(model, state)
  File "C:\Users\wolfd\.conda\envs\COVID_MODEL\lib\site-packages\emcee\moves\red_blue.py", line 93, in propose
    new_log_probs, new_blobs = model.compute_log_prob_fn(q)
  File "C:\Users\wolfd\.conda\envs\COVID_MODEL\lib\site-packages\emcee\ensemble.py", line 535, in compute_log_prob        
    raise ValueError("Probability function returned NaN")
ValueError: Probability function returned NaN

Error performing `df.values` in `call()` method of class `log_posterior_probability()` when optimising a model with a timeseries of length 1

I accidently tried optimising a model to a timeseries with only one datapoint,

date
2020-03-15    2753.51636
Name: GDP, dtype: float64

Which resulted in an error on the following line in __call__() of class log_posterior_probability(),

ydata = np.expand_dims(df.squeeze().values,axis=1)

Specifically, the .values method of df.squeeze() does not work and results in the following error,

AttributeError: 'numpy.float64' object has no attribute 'values'

It's a specific situation but I may need to address this at some point.

The suffix '_names' in the model declaration seems redundant

Current implementation

They're all _names technically.

class ODE_influenza_model(ODEModel):
    """
    Simple SEIR model for influenza with undetected carriers
    """
    
    state_names = ['S','E','Ip','Iud','Id','R','Im_inc']
    parameter_names = ['alpha', 'beta', 'gamma', 'delta','N']
    parameter_stratified_names = ['f_ud']
    dimension_names = ['age_group']

    @staticmethod
    def integrate(t, S, E, Ip, Iud, Id, R, Im_inc, alpha, beta, gamma, delta, N, f_ud):
        
        # Calculate total population
        T = S+E+Ip+Iud+Id+R
        # Calculate differentials
        dS = -beta*N@((Ip+Iud+0.22*Id)*S/T)
        dE = beta*N@((Ip+Iud+0.22*Id)*S/T) - 1/alpha*E
        dIp = 1/alpha*E - 1/gamma*Ip
        dIud = f_ud/gamma*Ip - 1/delta*Iud
        dId = (1-f_ud)/gamma*Ip - 1/delta*Id
        dR = 1/delta*(Iud+Id)
        # Calculate incidence mild disease
        dIm_inc_new = (1-f_ud)/gamma*Ip - Im_inc

        return dS, dE, dIp, dIud, dId, dR, dIm_inc_new

Proposed implementation

Additionally, drop the stratified parameters?

class ODE_influenza_model(ODEModel):
    """
    Simple SEIR model for influenza with undetected carriers
    """
    
    states = ['S','E','Ip','Iud','Id','R','Im_inc']
    parameters = ['alpha', 'beta', 'gamma', 'delta','N']
    stratified_parameters = ['f_ud']
    dimensions = ['age_group']

    @staticmethod
    def integrate(t, S, E, Ip, Iud, Id, R, Im_inc, alpha, beta, gamma, delta, N, f_ud):
        
        # Calculate total population
        T = S+E+Ip+Iud+Id+R
        # Calculate differentials
        dS = -beta*N@((Ip+Iud+0.22*Id)*S/T)
        dE = beta*N@((Ip+Iud+0.22*Id)*S/T) - 1/alpha*E
        dIp = 1/alpha*E - 1/gamma*Ip
        dIud = f_ud/gamma*Ip - 1/delta*Iud
        dId = (1-f_ud)/gamma*Ip - 1/delta*Id
        dR = 1/delta*(Iud+Id)
        # Calculate incidence mild disease
        dIm_inc_new = (1-f_ud)/gamma*Ip - Im_inc

        return dS, dE, dIp, dIud, dId, dR, dIm_inc_new

`int too big to convert` while calibrating

Technical questions

Python version: 3.10.6
Operating System: windows / hpc ugent

Description

During MCMC calibration I sometimes get following error Python int too large to convert to C long.
If i set warmup to 1, the error does not occur

What I Did

calibration of PI model
MDC_included = ['01',]
start = '2020-08-01'
end = '2021-10-01'

pars = ['Kp', 'Ki', 'alpha', 'gamma']
labels = ['$K_p$', '$K_i$', '$\alpha$', '$\gamma$']
bounds = [(0,10),(0,10),(0,1),(0,1)]
theta = [6.75913708e-04, 9.93008121e-04, 7.32513491e-09, 9.72186282e-02]

 File "pandas/_libs/tslibs/timedeltas.pyx", line 372, in pandas._libs.tslibs.timedeltas._maybe_cast_from_unit
  File "pandas/_libs/tslibs/conversion.pyx", line 130, in pandas._libs.tslibs.conversion.cast_from_unit
OverflowError: Python int too large to convert to C long

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/data/gent/448/vsc44803/anaconda/envs/COVID_MODEL/lib/python3.10/site-packages/emcee/ensemble.py", line 624, in __call__
    return self.f(x, *self.args, **self.kwargs)
  File "/kyukon/data/gent/448/vsc44803/pySODM/src/pySODM/optimization/objective_functions.py", line 618, in __call__
    out = self.model.sim([self.start_sim,self.end_sim], **simulation_kwargs)
  File "/kyukon/data/gent/448/vsc44803/pySODM/src/pySODM/models/base.py", line 670, in sim
    output.append(self._mp_sim_single(dictionary, time, actual_start_date, method=method, rtol=rtol, l=l))
  File "/kyukon/data/gent/448/vsc44803/pySODM/src/pySODM/models/base.py", line 527, in _mp_sim_single
    out = self._sim_single(time, actual_start_date, method, rtol, l)
  File "/kyukon/data/gent/448/vsc44803/pySODM/src/pySODM/models/base.py", line 494, in _sim_single
    output = solve_ivp(fun, time, y0, args=[self.parameters], t_eval=t_eval, method=method, rtol=rtol)
  File "/data/gent/448/vsc44803/anaconda/envs/COVID_MODEL/lib/python3.10/site-packages/scipy/integrate/_ivp/ivp.py", line 555, in solve_ivp
    solver = method(fun, t0, y0, tf, vectorized=vectorized, **options)
  File "/data/gent/448/vsc44803/anaconda/envs/COVID_MODEL/lib/python3.10/site-packages/scipy/integrate/_ivp/rk.py", line 96, in __init__
    self.h_abs = select_initial_step(
  File "/data/gent/448/vsc44803/anaconda/envs/COVID_MODEL/lib/python3.10/site-packages/scipy/integrate/_ivp/common.py", line 113, in select_initial_step
    f1 = fun(t0 + h0 * direction, y1)
  File "/data/gent/448/vsc44803/anaconda/envs/COVID_MODEL/lib/python3.10/site-packages/scipy/integrate/_ivp/base.py", line 138, in fun
    return self.fun_single(t, y)
  File "/data/gent/448/vsc44803/anaconda/envs/COVID_MODEL/lib/python3.10/site-packages/scipy/integrate/_ivp/base.py", line 20, in fun_wrapped
    return np.asarray(fun(t, y), dtype=dtype)
  File "/data/gent/448/vsc44803/anaconda/envs/COVID_MODEL/lib/python3.10/site-packages/scipy/integrate/_ivp/ivp.py", line 527, in <lambda>
    fun = lambda t, x, fun=fun: fun(t, x, *args)
  File "/kyukon/data/gent/448/vsc44803/pySODM/src/pySODM/models/base.py", line 388, in func
    date = self.int_to_date(actual_start_date, t)
  File "/kyukon/data/gent/448/vsc44803/pySODM/src/pySODM/models/base.py", line 538, in int_to_date
    date = actual_start_date + pd.Timedelta(t, unit='D')
  File "pandas/_libs/tslibs/timedeltas.pyx", line 1695, in pandas._libs.tslibs.timedeltas.Timedelta.__new__
  File "pandas/_libs/tslibs/timedeltas.pyx", line 351, in pandas._libs.tslibs.timedeltas.convert_to_timedelta64
  File "pandas/_libs/tslibs/timedeltas.pyx", line 374, in pandas._libs.tslibs.timedeltas._maybe_cast_from_unit
pandas._libs.tslibs.np_datetime.OutOfBoundsTimedelta: Cannot cast 113763.54743547516 from D to 'ns' without overflow.

Split base.py in ODE and JumpProcess

Importing the ODE and JumpProcess modules is currently done as follows,

from pySODM.models.base import ODE, JumpProcess

But it would actually be more straightforward to move these classes from the base.py to a separate file ODE.py and JumpProcess.py so they can be loaded as follows,

from pySODM.models import ODE, JumpProcess

twallema / pysodm Goto Github PK

pysodm's Introduction

pySODM

Quick installation

Aim & Scope

Overview of features

Getting started

Versions

pysodm's People

Contributors

Stargazers

Watchers

Forkers

pysodm's Issues

Current implementation

Proposed implementation

Suggestion

Technical questions

Description

What I Did

Current implementation

Proposed implementation

Technical questions

Description

What I Did

Recommend Projects

Recommend Topics

Recommend Org