Giter Site home page Giter Site logo

risk-networks's People

Contributors

agarbuno avatar dburov190 avatar glwagner avatar jinlong83 avatar odunbar avatar

Stargazers

 avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

risk-networks's Issues

Move `examples/paper` to `/paper`

I think it'd be forward-looking to move the code that is meant to produce data and figures for the paper to move to its own directory.

The purpose of examples/ is to demonstrate how the code works. Its effectiveness is dependent on the smallness and simplicity of its contents. It is currently a bit bloated.

Consider that the code in paper/ will be frozen to a certain commit at the time of submission, even while the code continues to evolve. We will want to keep examples/ working for the current version of the code, but we will not want to change the code in paper/, as it is meant to reproduce results reported in the paper.

It could even make sense at some point to move paper/ to a separate repository (this will make reproducing the results a bit easier, since we can record the relevant commit hash of epiforecast). It doesn't really make sense to mix a code (which is a living, evolving thing) with the scripts that are needed to reproduce paper results (which are fixed in time and depend on a specific version of the code / git commit). We do not have to decide on this point now, however; this could be the last thing that's done before publication and opening up of the repo.

No more `DiurnalContactInceptionRate`?

I was doing some work with the DA and the kinetic model. When I merge into master I found out that the class DiurnalContactInceptionRate is no longer available. Is this reflected on the Readme.md?

The piece of the code that needs it is:

epidemic_simulator = EpidemicSimulator(
                 contact_network = contact_network,
           mean_contact_lifetime = mean_contact_lifetime,
          contact_inception_rate = DiurnalContactInceptionRate(minimum = 2, maximum = 22),
                transition_rates = transition_rates,
     community_transmission_rate = community_transmission_rate,
 hospital_transmission_reduction = hospital_transmission_reduction,
         static_contact_interval = static_contact_interval,
                  health_service = health_service,
                      start_time = start_time
                                      )

Exogenous infections

In a recent meeting it was discussed to include Exogenous infections, where users of the app can be infected by those without it.
This could be treated by

  1. Obtaining the number of external connections subgraphs produced in UserBase (this is already in a function there)
  2. Adding these weights to the subgraph network nodes.
  3. Adding an exogenous rate to the "E" equation of MasterEquationModelEnsemble

EpidemicSimulator faces reproducibility issues again

After merging the new "ContactSimulator", I realized that the reproducibility issues appeared again. I performed some simulations using "simulate_idealized_NYC_epidemic_nointervention.py" and always obtain different trajectories.

@glwagner Any ideas what causes these issues (dictionaries, random access elements, ...)?

I couldn't find any additional libraries in the new contact simulator function that could cause these reproducibility issues. I also checked different "seeding" protocols, but it didn't solve the issue.

Ease of use issues for `ContactGenerator`

  • It should be possible to initialize ContactGenerator with parameters such as:

    • edge_list a list that denotes which w_ji are non-zero
    • parameters associated with the contact generation model
  • We probably also want to have another class --- say DiurnalContactModulation which contains information about how mean rates of contact are modulated throughout the day. This will allow us to change this function, for example, from (1 - cos(t)^4)^4 to (1-cos(t)^2)^2. This class should have a function of the form mean_contact_duration(time_of_day).

  • The ContactGenerator should have a function --- something like contact_generator.simulate_contacts. This method should return a time-averaged contact network in some form, whether that be a dictionary corresponding to w_ji, or a sparse matrix. The inputs should be something like

    • start_time: the time of day to start, referenced to midnight
    • averaging_interval: the time-interval over which to run the contact network

Output times for temporal contacts simulation seem wrong

This code:

import numpy as np
import os, sys; sys.path.append(os.path.join(".."))

from epiforecast.contacts import ContactGenerator,StaticNetworkTimeSeries

np.random.seed(12345)
edge_list_file = os.path.join('..', 'data', 'networks', 'edge_list_SBM_1e3.txt')
active_edge_list_frac = 0.034
mean_contact_duration = 1.0 / 1920.0  # unit: days

# construct the generator
contact_generator= ContactGenerator(edge_list_file,
                                     active_edge_list_frac,
                                     mean_contact_duration)

# create the list of static networks (for 1 day)
static_intervals_per_day = int(2) #must be int
static_network_interval = 1.0/static_intervals_per_day
day_of_contact_networks = StaticNetworkTimeSeries()
contact_generator.generate_static_networks(static_network_list=day_of_contact_networks,
                                           dt_averaging=static_network_interval)

print("Start time:", contact_generator.t0)
print("Stop time:", contact_generator.t1)
print("Birth/death times:", contact_generator.times)

produces:

Start time: 0.0
Stop time: 1.0
Birth/death times: [0.         0.00401336 0.00812667 0.01216957 0.01609839 0.02003514
 0.02400519 0.02800519 0.03208136 0.03601827 0.0400094  0.04400589
 0.04806824 0.05201493 0.05603721 0.06003129 0.06403446 0.06802471
 0.07211888 0.07612433 0.08005874 0.08405014 0.0880175  0.09204642
 0.09621028 0.10030389 0.10403115 0.1080373  0.11202207 0.11601673
 0.12000081 0.1240108  0.12806288 0.13205449 0.13606364 0.14000924
 0.14401498 0.1480657  0.15206046 0.15608879 0.16025658 0.16402694
 0.16804135 0.17204249 0.17600942 0.18051468 0.18406326 0.18808714
 0.1920246  0.1961438  0.20000846 0.20432648 0.20812067 0.21206776
 0.21605961 0.22018659 0.22400221 0.22806346 0.23200533 0.23604351
 0.24037212 0.24402917 0.248025   0.25205581 0.25608147 0.26002858
 0.26400065 0.26802923 0.2720463  0.27604605 0.28002648 0.28402813
 0.28802523 0.29200641 0.29600096 0.30000615 0.30404507 0.30801393
 0.31200526 0.31615849 0.32002301 0.32401775 0.32807395 0.33204853
 0.33606264 0.34004273 0.34400716 0.34807622 0.35202952 0.35602722
 0.36002322 0.36406266 0.36804313 0.37211678 0.37605928 0.38013173
 0.38403772 0.38813364 0.39217248 0.39609233 0.4001655  0.40404885
 0.40806934 0.41232591 0.41610893 0.42074454 0.42400289 0.42821755
 0.43300812 0.43605162 0.44008399 0.44448888 0.44898761 0.45224854
 0.456072   0.46009569 0.4647293  0.46880973 0.4721274  0.47695183
 0.48005003 0.48533513 0.48837028 0.49304084 0.49931858 0.50176534
 0.50654111 0.51156591 0.51236821 0.51631232 0.53107713 0.53140987
 0.53188511 0.53345727 0.55193081 0.55922347 0.56213265 0.56228804
 0.56306499 0.56772741 0.57091388 0.58156196 0.60149484 0.61111747
 0.64823248]

It looks like the output times do not extent very close to the final time, t1=1.0. Why is this? Is this correct?

Is the Ensemble Kalman updating states correctly?

I input 2 uniform(0,1) states for 10 ensemble members each. No model parameters need updating, just the state.

We have true data of 1.0, variance of 1e-9 -> essentially certain data.

I expect therefore, that the output state to be an ensemble very close to 1 with small variance. Instead i get an ensemble almost identical to the input.

This snippet is the MWE (github doesn't allow me to attach .py file here):

import os, sys; sys.path.append(os.path.join(".."))
import numpy as np
import copy
from epiforecast.ensemble_adjusted_kalman_filter import EnsembleAdjustedKalmanFilter

np.random.seed(999111999)
ensemble_size = 10
#data
truth = np.array([1.0,1.0])
var = np.array([1e-9,1e-9])
cov=np.diag(var)
#generate ensemble_state
ensemble_state=np.random.random(size=(ensemble_size,2))
#pass in 'no' model parameters
ensemble_transition_rates = np.empty((ensemble_size, 0), dtype=float)
ensemble_transmission_rate = np.empty((ensemble_size, 0), dtype=float)
eakf=EnsembleAdjustedKalmanFilter()
prev_ensemble_state = copy.deepcopy(ensemble_state)
(ensemble_state[:,:],
new_ensemble_transition_rates,
new_ensemble_transmission_rate
) = eakf.update(ensemble_state[:,:],
ensemble_transition_rates,
ensemble_transmission_rate,
truth,
cov)
print(np.mean(prev_ensemble_state,axis=0))
print(" ")
print(np.mean(ensemble_state,axis=0))

Numba seed/reproducibility issue

Running the same epidemic simulation twice doesn't produce the same output. As an example, I used "super_simple_epidemic.py" and obtained

super_simple_epidemic

and (for the same parameters):

super_simple_epidemic1

The origin of this issue may be the numba parallelization routines in "ContactSimulator", as pointed out by @glwagner:

numba/numba#3249

`ContactSimulator` is too slow

@lubo93 said:

Probably we have to go back to my original implementation where I simulated only one diurnal cycle (it's useful for system sizes N > 1e3). The current (continuous) contact simulation seems to be relatively slow.

We need to know the requirements of this system. What is the size of the network that we want to run, and how quickly do we need it to run?

Uniquely identify people with nodes; use HealthService to manage hospitalization

@odunbar, @agarbuno, and I propose to make the size of the contact network equal to the size of the population, omitting placeholder nodes.

The ContactSimulator will simulate all possible contacts, which includes contacts between healthcare workers and hospital beds.

Given the hospitalization state of a population, some of these edges are "phantom edges", since they represent connections between community members and people who are hospitalized, or connections between heath care workers and empty hospital beds.

In consequence:

  • len(contact_network) = population
  • n_contacts = population + hospital_beds

The HealthService keeps track of who is in the hospital. This mapping determines the structure of the contact network, because when a node is hospitalized, their neighbors change. The HealthService will also manage setting the weights of the contact network, since the HealthService manages the mapping of contacts (which includes "phantom edges") to "actual" edges on the contact network.

Both the KineticModel and the MasterEquationModelEnsemble will simulate an epidemic on the "actual" contact network, which does not include placeholder nodes.

There are two advantages of this approach. One is that it corresponds more closely to the "true" system --- nodes represent people, and "hospitalization" changes the contact network, not the "identity" of a node.

The second advantage is corollary to the first: clinical information attached to nodes does not need to be "copied" to placeholder nodes when somebody is hospitalized (which requires extra lines of code for the correct simulation of an epidemic with correct rates). "Hospitalization" merely implies a change in the contact network, not a change in the clinical parameters of a node. This reduces the complexity of the code; the hospitalization procedure is the sole domain of the HealthService.

`update_beta_rates` is updating both transmission rate and average contact rates

This function

https://github.com/dburov190/risk-networks/blob/00c5ab689f8cd4bb8b1fe286c49c681390af327d/epiforecast/kinetic_model_simulator.py#L235

may be misleading. It looks like beta_dict is not a dictionary of transmission rates, but rather a dictionary of the conflated quantity

beta * w_ij

Is this correct?

In other words, the transmission rate (which we currently assume is constant across the network, and is a parameter that needs to be inferred in data assimilation) is multiplied by the average contact rate. I think it is incorrect to call this product of terms "beta".

cc @odunbar @tapios @dburov190 @lubo93

End time for Gillespie_simple_contagion is *not* tmax

We specify tmax in the call to Gillespie_simple_contagion:

https://github.com/dburov190/risk-networks/blob/3fb0f3a104cdcc705ef201dd25a0cea80b17332e/epiforecast/kinetic_model_simulator.py#L128

Note, however, that this is not the final time of the simulation. Similar to ContactSimulator, the Gillespie_simple_contagion "overshoots" the end of the interval. In other words, self.time in

https://github.com/dburov190/risk-networks/blob/3fb0f3a104cdcc705ef201dd25a0cea80b17332e/epiforecast/kinetic_model_simulator.py#L130

is not equal to tmax (which I think is assumed in this function).

We need to take this into account when specifying interval-wise simulations. We can probably reuse the same method / algorithm developed for the same purpose in ContactSimulator.

Cycle_contacts is an unnecessary feature

@lubo93 @glwagner It seems we were having some conceptual issues with the how the cycle contacts would work with the full epidemic simulator with DA.

I feel this feature is actually unnecessary as it does not work with the changing contact network such as the HealthService and thus should be removed rather than accommodated.

Greg's new simulator means that setting the contact durations is going to be very fast, and not the bottleneck in the system so the cycling provides no benefit.

Issues raised in (June 7th)

A Possibly incomplete list of tasks from the meeting
Bugs that need fixing/testing:

  • Probabilities sum - to - one testing after DA (data assimilation) @odunbar
  • Remove capacity of health service (then either remove wait list entirely or remove the reset to 'I' if on wait list) @glwagner

Imminently required features:

  • Truncated EAKF to replace EAKF @jinlong83
  • Learnable exogenous infection rate on user network and Master equations. @odunbar
  • Decouple the DA and observations, so DA is less frequent (this will require storage of the static contact networks)
  • Accuracy plot, User network size (fraction) vs number of tests. Plotting the correct classification of I and E. @agarbuno

Other features:

  • Interventions, for example targeted isolations @dburov190
  • Smoothing / Backwards Forwards DA @jinlong83 @dburov190
    Interesting features
  • Observations - budget
  • Island populations accuracy plot,

Avoid assumption that nodes are a simple range

The issues discussed in #58 and #60 arise because the node id's generated from 'edge_list_SBM_1e4.txt' are not a range from 0 to population-1.

In contrast, many initialization functions --- populate_ages, sample_distribution, and TransitionRates assume that the nodes are a range from 0 to population-1, and therefore correspond 1-1 with the node id's.

To relax the assumption that node id's are a range, we need to either use dictionaries, or the contact network itself to assign ages and clinical statistics to the population.

I propose that we use the contact network itself. In this abstraction, we will assign ages to the contact network according to a distribution. Thus populate_ages will be replaced with assign_ages(contact_network, distribution). The generation of distributions for the clinical parameters, as well as the constructor for TransitionRates will have to be modified accordingly.

I don't think this will take long. @lubo93, please advise.

'do_step_full' fails in master equations ensemble

Creating MasterEquationModelEnsemble with reduced_system=False will result in a crash when simulate method is executed:

Traceback (most recent call last):
  File "./DA_interventions.py", line 197, in <module>
    states_ensemble = master_eqn_ensemble.simulate(day)
  File "../../epiforecast/risk_simulator.py", line 274, in simulate
    self.y0[mm] += self.dt * self.do_step_full(t, self.y0[mm], member, closure = closure)
  File "../../epiforecast/risk_simulator.py", line 226, in do_step_full
    member.y_dot = member.coeffs.dot(member.y0)
  File "/Library/Python/3.7/site-packages/scipy/sparse/base.py", line 363, in dot
    return self * other
  File "/Library/Python/3.7/site-packages/scipy/sparse/base.py", line 502, in __mul__
    raise ValueError('dimension mismatch')
ValueError: dimension mismatch

Refactor ContactSimulator / KineticModel / MasterEquationEnsembleModel interaction

Let's implement a forward simulation example where we

  1. Obtain a list of edges.
  2. Diagnose the "hospitalization" state of the network from KineticModel.
  3. Simulate contacts for all nodes over some short interval, such as 1/8 of a day.
  4. Pass both the hospitalization state and the contact duration to correctly determine the contact network in KineticModel.
  5. Pass both the hospitalization state and the contact duration to correctly determine the contact network in MasterEquationEnsembleModel.
  6. Run the KineticModel forward.
  7. Run the MasterEquationEnsembleModel forward.

We should also update simulate_kinetic_model.py to do steps 1-4 and 6 separately, without the MasterEquationModelEnsemble.

Once this is done, we can nuke contacts.py, ContactGenerator, and StaticNetworkTimeSeries.

cc @odunbar @lubo93

Two classes "Observations"

There are currently two classes named Observations defined in:

  • observations.py
  • measurements.py

In the latest simulation examples (like the ones with idealized NYC data) it seems that the latter is used; can we get rid of the former? Or merge? or rename? If we need both, what is the purpose of each one of them?

`ContactGenerator` produces NaNs if `dt_averaging` is too small

This code:

import numpy as np
from epiforecast.contacts import ContactGenerator, StaticNetworkTimeSeries

np.random.seed(12345)

second = 1 / 60 / 60 / 24 # in units of days

contact_generator = ContactGenerator(os.path.join('..', 'data', 'networks', 'edge_list_SBM_1e3.txt'), 0.034, 45 * second)
    
networks_time_series = StaticNetworkTimeSeries()

for dt_averaging in (1, 1/2, 1/4, 1/8):
    contact_generator.generate_static_networks(static_network_list = networks_time_series,
                                                      dt_averaging = dt_averaging)
    
    print('dt_averaging:', dt_averaging)
    print(networks_time_series.community_networks_dict[-1][(0, 1)])

produces

dt_averaging: 1
0.9793103448275862
dt_averaging: 0.5
1.0
/Users/gregorywagner/opt/anaconda3/envs/risknet/lib/python3.6/site-packages/numpy/core/fromnumeric.py:3335: RuntimeWarning: Mean of empty slice.
  out=out, **kwargs)
/Users/gregorywagner/opt/anaconda3/envs/risknet/lib/python3.6/site-packages/numpy/core/_methods.py:154: RuntimeWarning: invalid value encountered in true_divide
  ret, rcount, out=ret, casting='unsafe', subok=False)
dt_averaging: 0.25
nan
dt_averaging: 0.125
1.0

Thus when dt_averaging = 1/4, there are NaNs in the weights. This can also occur for dt_averaging = 1/8, depending on the seed, I think.

(Running the above requires epiforecast to be on one's path, eg import os, sys; sys.path.append(os.path.join("..")) from /examples.)

cc @odunbar

simulate_idealized_NYC_epidemic.py does not run

It produces this error:

(risknet) ~/Projects/risk-networks/examples$ python3 simulate_idealized_NYC_epidemic.py 
Traceback (most recent call last):
  File "simulate_idealized_NYC_epidemic.py", line 138, in <module>
    static_population_network = contact_network)
TypeError: __init__() got an unexpected keyword argument 'patient_capacity'

I suggest that we nuke this example, as there are several similar examples for an NYC epidemic.

Package name "epiforecast" may be a misnomer

As discussed elsewhere, the name epiforecast may be a misnomer --- the main functionality of our package does not provide forecasting, due to the fact that the contact network cannot be inferred.

It's relative easy to change the name so this is not urgent. But, good ideas as they come up can be left here.

It's been mentioned that the name should evoke the concept of "risk".

Possible starting points: epirisk, epitrace, risktracer, safecontact...

README is out of date

README and examples need to be updated to reflect new abstractions: ContactSimulator, KineticModel.set_mean_contact_duration, etc.

Soliciting tests

I'm going to put together some tests using pytest.

It's fairly simple: we'll just have a folder called /test, and files named test_* that implement tests according to the pytest style.

I'll put in basic unit tests that ensure that critical functions runs, and some "regression" style tests that ensure output is reproducible.

However, we will also need tests that ensure behavior is correct. These are more difficult to come up with. We can discuss ideas on this issue. A few ideas:

  • known solutions to the master equation model for simple networks
  • statistical tests that make sure the samplers work as intended

Set initial fraction of active edges to equilibrium solution

@lubo93 has pointed out that the initial fraction of active edges set, for example, here:

https://github.com/dburov190/risk-networks/blob/17835e95279ef47a76b86c07ac55655e38cc1c35/examples/generate_static_contact_networks.py#L11

should be set to the equilibrium solution

initial_fraction_active_edges = lambda / (1 / mean_contact_duration + lambda)

where lambda is the current number of mean contacts.

This means that the function outputting the mean contact rate, which is used by the ContactSimulator here:

https://github.com/dburov190/risk-networks/blob/17835e95279ef47a76b86c07ac55655e38cc1c35/epiforecast/contact_simulator.py#L73

needs to be provided during instantiation of ContactSimulator.

Master Equation and Kinetic model inconsistent with same parameters

Myself and @agarbuno were looking at efficacy of different tests, and noticed that we seem to get very different paced epidemics when running master equations and kinetic model.

I have run the master and kinetic equations with:

  • same ICs,
  • same transition_rates (set all ensembles to be the transition_rates chosen for
  • same community_transmission_rate

This is achieved by setting
transition_rates_ensemble[i] = transition_rates
transmission_rate_community_ensemble[i] = transmission_rate_community

da_ric_tprobs_

The lines are from the master equation solution, the 'X's are totals for infected, hospitalized and deceased from the kinetic model.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.