yannickoswald / covpol

COVPOL: An agent-based model of the international COVID-19 policy response

Python 6.22% Jupyter Notebook 93.78%

agent-based-modeling data-assimilation data-science pandemic-analysis pandemic-data particle-filter policy-analysis covid19

covpol's People

nickmalleson

covpol's Issues

Use random seed for reproducibility

Would it be worth us setting a random seed at the top of the script for producing figures so that every time someone runs the script they get reproducible results?
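A minimal sketch of what that could look like. It assumes the scripts draw random numbers from Python's `random` module and/or NumPy's global generator; `SEED` and `set_seeds` are hypothetical names. If the model keeps its own generator instance (Mesa models, for instance, carry a `random` attribute on the model object), that generator would need to be seeded as well.

```python
import random

import numpy as np

SEED = 42  # hypothetical: any fixed integer gives repeatable runs


def set_seeds(seed=SEED):
    """Seed the global generators that the scripts might draw from."""
    random.seed(seed)     # Python's built-in random module
    np.random.seed(seed)  # NumPy's legacy global generator


set_seeds()
first_run = (random.random(), np.random.rand())

set_seeds()
second_run = (random.random(), np.random.rand())

print(first_run == second_run)  # True: same seed, same draws
```

Calling `set_seeds()` once at the top of the figure script would then give the same numbers on every run.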

Change branch name: `master` -> `main`

A while back, GitHub (along with many other version control platforms) changed the naming convention for the default branch on repositories from master to main (https://www.theregister.com/2020/06/15/github_replaces_master_with_main).

I'd suggest making that change for the repository (https://www.git-tower.com/learn/git/faq/git-rename-master-to-main/).

Submission

I will submit today or latest by Monday.

I think the replication should work now (or soon, at least with minor changes). I have reproduced everything and tested all the scripts many times, so that should be fine :).

Thanks for all your help with this paper. I learned so much about coding, version control and collaborative development. Looking forward to the next one. @ksuchak1990 @nickmalleson

Add brief description to "About" section

The "About" section for the repository should contain a brief description of the project to let people know what it is about.

A collection of minor comments ...

Variable and class names

The convention in Python is for variable names to be lower case (so n in the following), with words separated by underscores (which you do :-)).

https://github.com/eeyouol/Covid_policy_response_abm/blob/da92205d3542eac18b6a499d9cc7bfb1aadcf04f/code/run_base_model_and_filter_with_plotting.py#L54

Classes should be in camel case (each word capitalised, with no underscores), which I think you do throughout.
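For illustration, the two conventions (as described in PEP 8) side by side; the names below are hypothetical examples, not code from the repository.

```python
# Variables and functions: lower case, words separated by underscores
number_of_particles = 10


def run_base_model(number_of_steps):
    """Hypothetical function: returns the list of time steps it would run."""
    return list(range(number_of_steps))


# Classes: each word capitalised, no underscores (PEP 8 calls this "CapWords")
class ParticleFilter:
    def __init__(self, number_of_particles):
        self.number_of_particles = number_of_particles


particle_filter = ParticleFilter(number_of_particles)
```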

Parallelisation

You can find out how many cores the computer has with `multiprocessing.cpu_count()`, so you don't need to hard-code the number of cores (not that this matters particularly):

https://github.com/eeyouol/Covid_policy_response_abm/blob/da92205d3542eac18b6a499d9cc7bfb1aadcf04f/code/run_base_model_only_parallelized.py#L66
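A minimal sketch, assuming each model run can be wrapped in a top-level function (`run_model` here is a hypothetical placeholder) whose result can be passed back between processes. The `if __name__ == "__main__":` guard matters on platforms where child processes are started by re-importing the main script (e.g. Windows).

```python
import multiprocessing

# Ask the operating system how many CPU cores there are instead of hard-coding it
N_CORES = multiprocessing.cpu_count()


def run_model(seed):
    """Hypothetical stand-in for one model run; returns a dummy result."""
    return seed * seed


if __name__ == "__main__":
    # One worker process per core; pool.map splits the seeds between them
    with multiprocessing.Pool(processes=N_CORES) as pool:
        results = pool.map(run_model, range(8))
    print(f"Ran {len(results)} models on {N_CORES} cores")
```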

`else: pass`

I guess you know that this kind of thing is unnecessary:

https://github.com/eeyouol/Covid_policy_response_abm/blob/da92205d3542eac18b6a499d9cc7bfb1aadcf04f/code/agent_class.py#L226

as it doesn't do anything. But it's fine to leave it in; in some ways it may help readability.
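A quick check that an `else: pass` branch makes no difference; both functions below are hypothetical and return the same result for every input.

```python
def label_with_pass(value, threshold=5):
    label = "low"
    if value > threshold:
        label = "high"
    else:
        pass  # does nothing: execution simply carries on
    return label


def label_without_pass(value, threshold=5):
    label = "low"
    if value > threshold:
        label = "high"
    return label


print(all(label_with_pass(v) == label_without_pass(v) for v in range(10)))  # True
```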

List comprehensions

... are really useful, but sometimes they can make things more complicated than they need to be. E.g. with the following, I'm not sure whether the list comprehension is better than the more verbose version:

https://github.com/eeyouol/Covid_policy_response_abm/blob/da92205d3542eac18b6a499d9cc7bfb1aadcf04f/code/agent_class.py#L210

Original:

```
total_differences_array = np.sort(np.array(
    [1/3 * abs(y1 - x.income) / range_income
     + 1/3 * abs(y2 - x.politicalregime) / range_politicalregime
     + 1/3 * (geo_distance(y3_1, x.latitude,
                           y3_2, x.longitude) / max_distance_on_earth)
     for x in self.model.schedule.agents if x.state == 1]
))
```

Without list comprehension (which, by the way, I got ChatGPT to write for me :-), but I think it's right...):

```
total_differences_array = []
for x in self.model.schedule.agents:
    if x.state == 1:
        total_differences = (1/3 * abs(y1 - x.income) / range_income
                             + 1/3 * abs(y2 - x.politicalregime) / range_politicalregime
                             + 1/3 * (geo_distance(y3_1, x.latitude, y3_2, x.longitude)
                                      / max_distance_on_earth))
        total_differences_array.append(total_differences)
total_differences_array = np.sort(np.array(total_differences_array))
```
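One possible middle way, sketched under some assumptions: move the arithmetic into a small named helper so that the list comprehension only has to say "one value per active agent". `Agent`, `combined_difference` and the haversine `geo_distance` below are hypothetical stand-ins (the argument order of the real `geo_distance` is assumed from the call above); the point is only that the comprehension and the loop build the same sorted array.

```python
import math
from dataclasses import dataclass

import numpy as np

EARTH_RADIUS_KM = 6371.0
MAX_DISTANCE_ON_EARTH = math.pi * EARTH_RADIUS_KM  # half the circumference


@dataclass
class Agent:
    # Hypothetical stand-in for a CountryAgent
    income: float
    politicalregime: float
    latitude: float
    longitude: float
    state: int


def geo_distance(lat1, lat2, lon1, lon2):
    """Haversine great-circle distance in km (argument order assumed from the call above)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def combined_difference(agent, y1, y2, y3_1, y3_2, range_income, range_politicalregime):
    """Equally weighted average of the three normalised differences."""
    return (1/3 * abs(y1 - agent.income) / range_income
            + 1/3 * abs(y2 - agent.politicalregime) / range_politicalregime
            + 1/3 * (geo_distance(y3_1, agent.latitude, y3_2, agent.longitude)
                     / MAX_DISTANCE_ON_EARTH))


agents = [
    Agent(30000, 5, 51.5, -0.1, state=1),
    Agent(45000, 8, 48.9, 2.35, state=1),
    Agent(20000, 2, 40.7, -74.0, state=0),  # state 0, so it is skipped
]
params = dict(y1=35000, y2=6, y3_1=52.5, y3_2=13.4,
              range_income=50000, range_politicalregime=10)

# List comprehension version: short, because the arithmetic lives in the helper
from_comprehension = np.sort(np.array(
    [combined_difference(x, **params) for x in agents if x.state == 1]))

# Explicit loop version
differences = []
for x in agents:
    if x.state == 1:
        differences.append(combined_difference(x, **params))
from_loop = np.sort(np.array(differences))

print(np.array_equal(from_comprehension, from_loop))  # True
```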

Can't find `number_of_particles_experiment_MSE.py`

I can't find the number_of_particles_experiment_MSE.py file, which is mentioned in the README.

Building the environment failed

I couldn't build the environment because Anaconda couldn't find some of the packages. However, many of these are dependencies, so their precise versions might not matter. One thing you could try is building an environment file that only includes the packages you explicitly asked for, and then letting Anaconda find the dependencies itself. You can do this with the `--from-history` flag (`conda env export --from-history > env.yml`). See:
Exporting an environment file across platforms

If you do that, you'll see that the env.yml file is much shorter, because it only lists the packages you have explicitly installed rather than all of their dependencies.

```
(base) macbook:Covid_policy_response_abm nick$ conda env create -f env.yml
Collecting package metadata (repodata.json): done
Solving environment: failed

ResolvePackageNotFound:
  - libgcc-ng=11.2.0
  - click=8.1.3
  - python-slugify=6.1.2
  - libxcb=1.15
  - _openmp_mutex=5.1
  - cookiecutter=2.1.1
  - chardet=5.0.0
  - charset-normalizer=2.1.1
  - libgomp=11.2.0
  - networkx=2.8.7
  - libxkbcommon=1.0.1
  - libclang=10.0.1
  - python_abi=3.10
  - unidecode=1.3.6
  - ld_impl_linux-64=2.38
  - libstdcxx-ng=11.2.0
  - mesa=1.1.1
```

Code at the beginning of class source files

In https://github.com/eeyouol/Covid_policy_response_abm/blob/master/code/agent_class.py there is a lot of code at the beginning of the script that reads data and sets some variables. @ksuchak1990 may know more about this, but I think it may be bad practice, because that code will run whenever Python imports this script, even if you don't want it to. E.g. just writing `import agent_class` will cause all of that code to run, but you might not actually want that (maybe you want to use another function in agent_class and don't care about loading the data first). Or, if you start a parallel process, each child process may re-read that file and run the code again (we came across something similar before).

Instead, you could put that code into the CountryAgent class, so that it is only run the first time an object of type CountryAgent is created (I think). Or you could put it into a specific function in CountryAgent and call it directly. Let's discuss.
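A minimal sketch of the second option, assuming the data is a CSV file: the reading moves into a class method that caches its result on the class, so nothing happens at import time and the file is read at most once per process. `load_data` and `_data` are hypothetical names, and the `__main__` guard shows where script-only code could go instead.

```python
import csv


class CountryAgent:
    """Sketch only: data is loaded on demand, not when the module is imported."""

    # Hypothetical class-level cache shared by every CountryAgent
    _data = None

    @classmethod
    def load_data(cls, path):
        # Read the file the first time only; later calls reuse the cached rows
        if cls._data is None:
            with open(path, newline="") as f:
                cls._data = list(csv.DictReader(f))
        return cls._data


if __name__ == "__main__":
    # Runs when the file is executed as a script, but not on `import agent_class`
    print("agent_class.py run directly")
```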

Great readme!

This isn't an issue as such, but I thought it worth pointing out where things are impressive. Otherwise you only get my suggestions for changes, not the more positive comments :-)

Beautiful documentation :-)

https://github.com/eeyouol/Covid_policy_response_abm/blob/master/code/agent_class.py#L114

https://github.com/eeyouol/Covid_policy_response_abm/blob/da92205d3542eac18b6a499d9cc7bfb1aadcf04f/code/agent_class.py#L114

Full replication/code run by external? e.g. Keiran?

I just updated the Jupyter notebook filepaths also. Hope that works.

Can we now run a complete test of whether the environment installs and all the file paths work? Or is something still missing?

`classmethod`

In the `CountryAgent.reset()` method in agent_class.py:

https://github.com/eeyouol/Covid_policy_response_abm/blob/da92205d3542eac18b6a499d9cc7bfb1aadcf04f/code/agent_class.py#L269

you should explicitly tell python that it is a class method, not an object method.

Currently it is:

```
def reset(cls):
    CountryAgent.instances = []
```

But I think it should be:

```
@classmethod
def reset(cls):
    cls.instances = []
```

The behaviour is the same for both versions, but with the first one you need an agent object to call it (e.g. `agent.reset()`), whereas with the second you can call it without a particular agent object (e.g. `CountryAgent.reset()`).
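A small runnable illustration, assuming (as the snippet above suggests) that `CountryAgent` keeps a list of its instances in a class attribute; the constructor here is a simplified, hypothetical one.

```python
class CountryAgent:
    # Class attribute shared by all agents: every new agent registers itself here
    instances = []

    def __init__(self, name):
        self.name = name
        CountryAgent.instances.append(self)

    @classmethod
    def reset(cls):
        # cls is the class itself, so no agent object is needed to call this
        cls.instances = []


uk = CountryAgent("United Kingdom")
de = CountryAgent("Germany")
print(len(CountryAgent.instances))  # 2

CountryAgent.reset()                # called on the class, not on an agent
print(len(CountryAgent.instances))  # 0

uk.reset()                          # calling it via an agent still works
```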

Comments on Yannick's code

Hi Yannick,

Here are some comments on your code. It's all minor stuff, very impressive how much the code has come on.

I'm not sure if using a github issue is the best way to do this, but at least it makes it easy to reference the source directly.

First general comment: now that you're using GitHub, you don't need to save different file versions (model1.py, model2.py, etc.). Every time you commit, git saves the old and new versions of the files, so you can go back to old file versions if you need to. You can also 'tag' the code to label it and make it easy to find an old version.

You should add a .gitignore file to your repository so that it only contains the things needed to run the code (mainly source and documentation), not temporary files that are created when you run it. Here's an example of one I have used: https://github.com/Urban-Analytics/dust/blob/main/.gitignore

You should ignore things that are specific to your local installation, like temporary Python compiled files (e.g. the __pycache__ directory https://github.com/eeyouol/Covid_policy_response_abm/tree/master/__pycache__ ) and possibly your Spyder project files (https://github.com/eeyouol/Covid_policy_response_abm/tree/master/.spyproject/config) (although maybe google 'spyder project files github' to see what should and shouldn't be included in a repository). These don't need to be in the repo because they will be created automatically when someone runs the code.

Some specific comments:

https://github.com/eeyouol/Covid_policy_response_abm/blob/f138e56bdae73af12edf47b75d54bcb6e72a6403/model_class2.py#L19 be aware that this means the code will only work for you on that laptop (no big deal but in the longer term we can look at how to make sure it runs for everyone by using relative rather than absolute paths)
https://github.com/eeyouol/Covid_policy_response_abm/blob/f138e56bdae73af12edf47b75d54bcb6e72a6403/country_model_24_data_version.py#L59 best to do all importing at the top of the file
https://github.com/eeyouol/Covid_policy_response_abm/blob/f138e56bdae73af12edf47b75d54bcb6e72a6403/country_model_24_data_version.py#L323 the indentation here is horrible! :-) I think Spyder messed it up trying to be helpful? Python has conventions for how code should be formatted, but my understanding is that their main rule is that code should be easy to read, so if a convention doesn't work in a particular case, ignore it and format the code however is best.
https://github.com/eeyouol/Covid_policy_response_abm/blob/f138e56bdae73af12edf47b75d54bcb6e72a6403/country_model_24_data_version.py#L393 Not sure what's going on here, but I think I may be reading an old version of the file which has subsequently been split into classes.
https://github.com/eeyouol/Covid_policy_response_abm/blob/f138e56bdae73af12edf47b75d54bcb6e72a6403/country_model_24_data_version.py#L436 lovely documentation :-)
https://github.com/eeyouol/Covid_policy_response_abm/blob/f138e56bdae73af12edf47b75d54bcb6e72a6403/country_model_24_data_version.py#L487 I don't think you need the copy here. The current_model variable will definitely represent a new model, not a model created in a previous loop iteration (if that's what you were worried about). The line: https://github.com/eeyouol/Covid_policy_response_abm/blob/f138e56bdae73af12edf47b75d54bcb6e72a6403/country_model_24_data_version.py#L478 creates a new object and assigns it to the variable current_model (whatever object current_model used to point to, it doesn't point there now).
https://github.com/eeyouol/Covid_policy_response_abm/blob/f138e56bdae73af12edf47b75d54bcb6e72a6403/country_model_24_data_version.py#L525 Nice use of a list comprehension here. This is what they're great for, clear and concise ways to create a list.
https://github.com/eeyouol/Covid_policy_response_abm/blob/f138e56bdae73af12edf47b75d54bcb6e72a6403/country_model_24_data_version.py#L550 Not sure why you do this. You now have two variables (weights and weights_arg) that point to the same thing. I think (unless I'm missing something) that you could re-define your function as follows, and then you wouldn't need the variable assignments on lines 550 and 551: `def resample_particles(cls, list_of_particles, weights):`
https://github.com/eeyouol/Covid_policy_response_abm/blob/f138e56bdae73af12edf47b75d54bcb6e72a6403/country_model_24_data_version.py#L567 Let's talk about the need (or not) for copying at some point. Languages handle passing around objects and variables in different ways. Here's the top Google hit, but it might be worth reading a little more about 'passing by reference' and 'passing by value': https://www.geeksforgeeks.org/is-python-call-by-reference-or-call-by-value/
https://github.com/eeyouol/Covid_policy_response_abm/blob/f138e56bdae73af12edf47b75d54bcb6e72a6403/country_model_24_data_version.py#L607 I don't like the list comprehension here because python makes a new list for you, full of particles that have just been stepped, but you don't actually ever use that list. A neater way would be to just step the particles directly and keep the old list:
```
for x in list_of_particles:
    ParticleFilter.advance_particle(x)
```
Or, you could get rid of the advance_particle function entirely:
```
for x in list_of_particles:
    x.step()
```
will do the same thing.
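To make the copying discussion concrete, here is a small sketch with a hypothetical `Particle` class. It shows that (1) a function receives the caller's objects, not copies, so stepping particles inside it changes them in place; (2) assigning a new object to `current_model` leaves the previous object untouched; and (3) one place where a copy does matter: resampling with replacement can pick the same particle twice. The `resample_particles(list_of_particles, weights)` signature follows the suggestion above, but the multinomial scheme using `random.choices` is an assumption, not necessarily what the repository does.

```python
import copy
import random


class Particle:
    """Hypothetical stand-in for a model particle with mutable state."""

    def __init__(self, value):
        self.value = value

    def step(self):
        self.value += 1


def advance_particles(list_of_particles):
    # The list holds references to the caller's particles, so no copy is made
    for x in list_of_particles:
        x.step()


def resample_particles(list_of_particles, weights):
    # Draw with replacement in proportion to the weights; the same particle can be
    # drawn more than once, so each draw is deep-copied into an independent object
    picks = random.choices(list_of_particles, weights=weights, k=len(list_of_particles))
    return [copy.deepcopy(p) for p in picks]


# (1) Stepping inside a function changes the caller's particles
particles = [Particle(0), Particle(10)]
advance_particles(particles)
print([p.value for p in particles])  # [1, 11]

# (2) Re-assigning a variable does not alter the object it used to point to
current_model = Particle(0)
previous_model = current_model
current_model = Particle(100)
print(previous_model.value, current_model.value)  # 0 100

# (3) All the weight on the first particle: it is drawn twice, as two separate copies
resampled = resample_particles(particles, weights=[1, 0])
resampled[0].step()
print([p.value for p in resampled])  # [2, 1]
```

Without the `deepcopy`, both entries in `resampled` would be the same object, and stepping one would step the other too.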

File paths only work on Yannick's laptop

File paths point to a specific location on Yannick's laptop, e.g. `os.chdir("C:/Users/earyo/Dropbox/Arbeit/postdoc_leeds/ABM_python_first_steps/implement_own_covid_policy_model")`. This means that other people won't be able to run the code. You should use relative paths instead.

In a notebook this is quite straightforward, as I think the notebook's working directory is always set to the place where the notebook is stored. E.g. if I open run_base_model_and_filter_with_plotting_jupyter.ipynb and run %pwd (a notebook command that shows the present working directory), I get /Users/nick/gp/Covid_policy_response_abm, which is the directory on my machine where the notebook is stored. So any other files can be referenced relative to that directory.

This means that lines like this:

```
with open('C:/Users/earyo/Dropbox/Arbeit/postdoc_leeds/ABM_python_first_steps/implement_own_covid_policy_model/data/correlation_between_policies_and_metrics.csv') as f:
```

can be replaced with simply:

```
with open('./data/correlation_between_policies_and_metrics.csv') as f:
```

(the . is unnecessary but I like it because it says explicitly that the directory to start from is the current directory)

Sometimes it is slightly trickier with code and packages, but we can come on to that if it's a problem...
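For scripts (as opposed to notebooks), one common approach is to build paths from the location of the script file itself using `pathlib`. A minimal sketch, assuming the scripts sit one folder below the repository root (e.g. in `code/`) and the data lives in `data/` at the root. Inside a notebook `__file__` is not defined, so the sketch falls back to the current working directory, which, as described above, should be the notebook's own folder.

```python
from pathlib import Path

try:
    # Folder holding this script, e.g. <repo>/code; its parent is the repository root
    REPO_ROOT = Path(__file__).resolve().parent.parent
except NameError:
    # __file__ is not defined in a Jupyter notebook; assume it runs from the repo root
    REPO_ROOT = Path.cwd()

# Builds <repo>/data/... with the right separators on Windows, macOS and Linux
DATA_FILE = REPO_ROOT / "data" / "correlation_between_policies_and_metrics.csv"

print(DATA_FILE)
```

With that in place, `with open(DATA_FILE) as f:` should find the file regardless of which directory Python was started from.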

Change `./code/` to `./src/`

When I try to run the Jupyter notebook with the updated environment, I get errors saying that the notebook cannot connect to the kernel. This seems to arise from the Python scripts being in a directory called code, which can clash with Python's standard-library module of the same name. I've tried changing the directory name on my machine, and the notebook can now connect to the kernel.

Can we change the name of this directory to something like src?
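If it helps to confirm the cause, this sketch checks which file Python would load for the module name `code`; run it from the repository root (e.g. in a notebook cell). If the printed path points into the repository's `code` folder rather than into the Python installation, the local folder is shadowing the standard-library module.

```python
import importlib.util

# Where would `import code` come from, given the current sys.path?
spec = importlib.util.find_spec("code")
print(spec.origin)
```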