covid19_forecasting's Introduction

MIT License

Forecasting COVID-19 cases for the next 7 days and beyond

Accurate forecasting of COVID-19 cases is critical for epidemiological, economic, and personal coping strategies, which makes it an important challenge for data scientists working on time series analysis and forecasting. Here we built four different kinds of models to predict the number of COVID-19 cases for the next seven days in 23 countries. To maximize the usability of our models, we trained them on readily available features: Google mobility, weather, vaccination, previous cases, and temporal data (e.g. year, month, day). We compared the performance of the models with cross-validation. We conclude that the neural network model with LSTM layers outperforms the others; however, the XGBoost regressor model might be considered for a faster outcome with comparable performance.

Motivation

covid19 cases in 5 countries

This plot illustrates the dynamic, time-varying patterns of COVID-19 cases (cases per million) in 5 different countries. Our goal is to construct a variety of models that predict future cases from prior cases and other relevant features such as weather, datetime, and mobility in various domains (e.g. parks, groceries, workplaces). We built four different types of models:
  1. SARIMAX
  2. XGBoost regression
  3. Multi-layer perceptron
  4. Long Short Term Memory networks (LSTM)

Each model was designed to predict cases for the 7 days following a given day. Walk-forward validation for time series data was used to test model performance on unseen data. Because a gap of weeks to months separated the training dataset from the validation and test datasets, we could see how the models perform when predicting cases far into the future.
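The following is a minimal sketch of how a 7-day-ahead target and walk-forward splits could be set up, assuming a single 1-D series of daily cases. The function names (`make_windows`, `walk_forward_splits`) and the parameters `n_lags`, `n_folds`, `test_size`, and `gap` are illustrative assumptions, not names or settings taken from this repository.

```python
import numpy as np


def make_windows(series, n_lags=14, horizon=7):
    """Pair each block of `n_lags` past values with the next `horizon` values."""
    series = np.asarray(series, dtype=float)
    n = len(series) - n_lags - horizon + 1
    if n <= 0:
        raise ValueError("series is too short for the requested window sizes")
    X = np.stack([series[i:i + n_lags] for i in range(n)])
    Y = np.stack([series[i + n_lags:i + n_lags + horizon] for i in range(n)])
    return X, Y


def walk_forward_splits(n_samples, n_folds=3, test_size=30, gap=0):
    """Yield (train_idx, test_idx) pairs in chronological order.

    Training indices always precede the test block, and `gap` samples
    between them are skipped to mimic forecasting further into the future.
    """
    for k in range(n_folds, 0, -1):
        test_start = n_samples - k * test_size
        train_end = test_start - gap
        if train_end <= 0:
            continue  # not enough history for this fold
        yield np.arange(train_end), np.arange(test_start, test_start + test_size)


# Toy series used only to show the shapes; it is not real case data.
toy_series = np.arange(100)
X, Y = make_windows(toy_series, n_lags=14, horizon=7)
for train_idx, test_idx in walk_forward_splits(len(X), n_folds=3, test_size=10, gap=5):
    print(len(train_idx), test_idx[0], test_idx[-1])
```

Each fold trains only on samples that come before its test block, so no future information leaks into training. A larger `gap` gives a rough way to probe how forecasts degrade as the prediction date moves further from the training data.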

Model performance


Once each model was trained, its performance was tested on validation and test sets that the model had not seen during training. The figure above illustrates the actual number of cases (blue) in Ireland and the cases predicted by the LSTM model (orange) on the validation set.

One major motivation of this project is to compare the predictive performance of different models, so we compared how the four models performed on the validation and test sets. The plot above indicates that the LSTM (red) outperformed the other model variants. Note also that the XGBoost regression model (orange) showed comparable performance, so it could be a good alternative at a lower computational cost than the LSTM model.
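A comparison like this can be sketched by scoring every model's predictions against the same held-out targets. The error metric used in this project is not stated here, so root-mean-square error (RMSE) is an assumption in the sketch below, and `rmse` and `rank_models` are hypothetical helper names.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-square error between two arrays of the same shape."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def rank_models(y_true, predictions):
    """Return (model_name, rmse) pairs sorted from lowest to highest error.

    `predictions` maps a model name to that model's predictions on the
    same held-out targets `y_true`.
    """
    scores = {name: rmse(y_true, pred) for name, pred in predictions.items()}
    return sorted(scores.items(), key=lambda item: item[1])
```

Calling `rank_models(y_val, {"LSTM": lstm_pred, "XGBoost": xgb_pred})` with each model's validation predictions would list the models from best to worst under this metric; the variable names in that call are placeholders.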

Get Started

  1. Download the data:
    Data from 23 countries, comprising the features and target (daily cases per million per country) of our models, are preprocessed, saved in a dictionary, and pickled as 'covid_country_data.pickle'. Please follow this link to download the preprocessed data: link_to_preprocessed_data. It is a handy starting point for building on our code.
  2. To install, clone the repo:
git clone git@github.com:parkjlearning/covid19_forecasting.git
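Once the file from step 1 is downloaded, it can be opened with Python's standard `pickle` module. The sketch below assumes the file sits in the current working directory. The layout of the dictionary (for example, whether it is keyed by country name) is not documented here, so inspect the keys before relying on any structure. `load_country_data` is a hypothetical helper, not a function from this repository.

```python
import pickle
from pathlib import Path


def load_country_data(path="covid_country_data.pickle"):
    """Load the preprocessed data dictionary from a pickle file.

    Only unpickle files from a source you trust: unpickling can
    execute arbitrary code.
    """
    path = Path(path)
    with path.open("rb") as f:
        data = pickle.load(f)
    if not isinstance(data, dict):
        raise TypeError(f"expected a dict, got {type(data).__name__}")
    return data
```

A first look at the data could be `data = load_country_data()` followed by `print(list(data.keys()))`.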

Additional info.

📄 Please find the final report of this project here: Final report
💻 Please find the final presentation of this project here: Final presentation
