CAN-NPI: A Curated Open Dataset of Canadian Non-Pharmaceutical Interventions in Response to the Global COVID-19 Pandemic

License: CC BY 4.0

Non-pharmaceutical interventions (NPIs) have been the primary tool used by governments and organizations to mitigate the spread of the ongoing COVID-19 pandemic. Natural experiments on the impact of these interventions are currently under way, but most occur at the subnational level, for which data were not available in early global datasets. We describe the rapid development of the first comprehensive, labelled dataset of NPIs implemented at federal, provincial/territorial and municipal levels in Canada to guide COVID-19 research. For each intervention, we provide: a) information on timing to aid in longitudinal evaluation, b) location to allow for robust spatial analyses, and c) classification based on intervention type and target population, including classification aligned with a previously developed measure of government response stringency.

A paper describing the dataset can be read here.

This dataset covers the early period of the pandemic, starting in January 2020; further data updates are planned for the duration of the pandemic. This novel dataset enables robust inter-jurisdictional comparisons of pandemic response, can serve as a model for other jurisdictions, and can be linked with other information about case counts, transmission dynamics, health care utilization, mobility data and economic indicators to derive important insights regarding NPI impact.

The figure below shows the count of recorded interventions over time in the dataset:

[Figure: Dataset Intervention Count]

We also provide a list of announcements taken directly from provincial government sources in the sources/ folder. These announcements include articles that are not related to COVID-19. They are updated twice a day, and each one includes a model-estimated probability that it concerns a COVID-19 intervention.

Get the Data

You can use this direct link to get the data, which is stored in CSV format in this repository.

| Name | Content | Rows | Size | Link |
|---|---|---|---|---|
| npi_canada.csv | All Canadian NPIs | 4,390 | 17 MB | Download |
| sources/ | All Canadian Provincial Announcements during Period | 13 Files | 30 MB | View Files |

Alternatively you can clone this GitHub repository, where the dataset is named npi_canada.csv. The repository also contains notebooks for visualizations and demonstrations with the data.

git clone git@github.com:jajsmith/COVID19NonPharmaceuticalInterventions.git
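Once the file is downloaded or cloned, a count of interventions over time can be reproduced roughly as sketched below. The sketch uses only the Python standard library. The column name `start_date` and its `YYYY-MM-DD` format are assumptions made for illustration; check the codebook for the actual field names.

```python
import csv
from collections import Counter


def monthly_counts(path, date_column="start_date"):
    """Count rows per calendar month (YYYY-MM), skipping rows with no date.

    `date_column` is an assumed field name; consult the codebook.
    """
    counts = Counter()
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            value = (row.get(date_column) or "").strip()
            if len(value) >= 7:
                counts[value[:7]] += 1  # "2020-03-17" -> "2020-03"
    return dict(sorted(counts.items()))
```

Calling `monthly_counts("npi_canada.csv")` from the repository root would return a dictionary mapping each month to the number of interventions recorded in it, provided the assumed column exists.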

Access and Details

The codebook and additional details can be found at https://docs.google.com/spreadsheets/d/1NSRyeY7XUjwUO8KICJCsOd2YKwuYaSAuM_yEnXMUbOY/edit?usp=sharing

Time Period: January 1, 2020 to September 1, 2020.

Methods and Citations

If you find CAN-NPI helpful and use it in a scientific publication, we would appreciate you referencing the following paper:

Characterizing early Canadian federal, provincial, territorial and municipal nonpharmaceutical interventions in response to COVID-19: a descriptive analysis. Liam G McCoy, Jonathan Smith, Kavya Anchuri, Isha Berry, Joanna Pineda, Vinyas Harish, Andrew T Lam, Seung Eun Yi, Sophie Hu, Laura Rosella, Benjamin Fine, COVID-19 Canada Open Data Working Group: Non-Pharmaceutical Interventions. CMAJ Open. 2020; 8(3):E545-E553. Published 2020 Aug 31. doi:10.9778/cmajo.20200100

Interested in Contributing?

If you have a correction or addition, please open a GitHub issue.

Join the team or contact us at howsmyflattening.ca

covid19nonpharmaceuticalinterventions's Issues

Automate Source Retrieval for all provinces, territories.

Currently sources, links, titles and published dates are all added by hand. It would be great to build a proper search function for each source that we are interested in (starting with 13+20 government websites) that can retrieve all new articles related to COVID-19 and extract important features.

Impact: Reduce time needed for labelers to find new articles and label simple parts of the source. This could help us expand our coverage on municipality sources as well.

Method: I've done this for Ontario with the BeautifulSoup package. The approach can be abstracted, with a structure then defined for each of the other provinces, territories and municipalities. See the autosource branch.
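One way the per-source abstraction could look is sketched here. This is not the code on the autosource branch: it swaps BeautifulSoup for the standard library's `html.parser` so that the example is self-contained, and the `Article` record, `LinkCollector` class, `extract_articles` function and `url_must_contain` parameter are all hypothetical names.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from urllib.parse import urljoin


@dataclass
class Article:
    source: str  # e.g. the province or territory name
    title: str
    url: str


class LinkCollector(HTMLParser):
    """Collects (href, text) pairs for every <a href=...> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None


def extract_articles(source, base_url, html, url_must_contain=""):
    """Turn a listing page into Article records.

    Each source passes its own `url_must_contain` filter (hypothetical).
    """
    parser = LinkCollector()
    parser.feed(html)
    return [
        Article(source, title, urljoin(base_url, href))
        for href, title in parser.links
        if title and url_must_contain in href
    ]
```

Under this design, supporting a new province would mean supplying its listing URL and link filter rather than writing a new scraper; fetching the page and extracting published dates are left out of the sketch.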

Identifying the relevance of new articles to COVID-19 NPIs

With the introduction of automated article retrieval, we would like to make it easier to identify which articles are relevant to people doing intervention labeling. As more and more articles have nothing to do with COVID-19, we would like to filter these out, or at least predict the most relevant ones.

To do this, let's try two different approaches:

  1. Baseline approach - just filter articles that contain a few relevant keywords (e.g. "COVID-19", "Coronavirus", "public health").
  2. Use topic modeling on intervention articles and predict the likelihood of a new article being about COVID-19.
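The baseline approach could be as small as the sketch below. The keyword list comes from the examples given in the first approach; the function name `is_relevant` and the choice of case-insensitive substring matching are assumptions.

```python
import re

# Keywords taken from the examples in the text; extend as needed.
KEYWORDS = ("covid-19", "coronavirus", "public health")
_PATTERN = re.compile("|".join(re.escape(k) for k in KEYWORDS), re.IGNORECASE)


def is_relevant(text):
    """Return True if the article text mentions any baseline keyword."""
    return _PATTERN.search(text) is not None
```

A filter like this is cheap to run on every retrieved article and gives a reference point against which a learned model can be compared.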

Some more description of the second approach:

  1. Build a dataset that mixes the labeled sources (from the latest CAN-NPI release) with the retrieved articles from the autosource branch. Label articles with interventions as being about COVID-19 and the rest as not. It may be possible to retrieve more historical articles and label them as not COVID-19 (e.g. from the same time frame in 2019) to balance the data.
  2. Split the data into a training set and a test set (by article and by time).
  3. Run topic modeling on the training set and use the topic weightings as inputs to the model.
  4. Train several models (e.g. logistic regression, decision tree and SVM from scikit-learn) to learn the articles' COVID-19 relevance.
  5. Evaluate on the test set. Determine good evaluation metrics for this task.
  6. Compare results with the baseline filtering.
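The splitting, evaluation and comparison steps above can be sketched without committing to a particular model. In the sketch below a classifier is any callable that maps article text to True or False, so the baseline filter and a trained model can be scored the same way. The dictionary keys `published`, `text` and `is_covid`, and the choice of precision, recall and F1 as metrics, are assumptions; topic modeling and model training are deliberately left out.

```python
def split_by_time(articles, cutoff):
    """Train on articles published before `cutoff`, test on the rest."""
    train = [a for a in articles if a["published"] < cutoff]
    test = [a for a in articles if a["published"] >= cutoff]
    return train, test


def precision_recall_f1(y_true, y_pred):
    """Compute precision, recall and F1 for boolean labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fp = sum(1 for t, p in zip(y_true, y_pred) if not t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def evaluate(classifier, test_articles):
    """Score any text -> bool classifier on the held-out articles."""
    y_true = [a["is_covid"] for a in test_articles]
    y_pred = [classifier(a["text"]) for a in test_articles]
    return precision_recall_f1(y_true, y_pred)
```

Splitting on a publication date rather than at random keeps later articles out of training, which is closer to how the filter would be used on newly retrieved sources.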

Notes:

  • will need to treat French and English language articles separately - compare performance on these two languages
  • get baseline filtering working first and report accuracy
  • report on whether this could be extended to determine the content of documents as it relates to specific interventions (e.g. start with Oxford financial labels E1-E4)
