jaeyow / f1-predictor

End-to-End Data Science Project using multiple ML models and trying out GitHub Actions as a cheap (and free) MLOps tool alternative

License: MIT License

Jupyter Notebook 87.14% SCSS 1.86% HTML 3.82% JavaScript 6.55% CSS 0.04% Python 0.58%
machine-learning mlops

f1-predictor's Introduction

F1 Prediction MLOps

Capstone Project


Using GitHub Actions as a cheap (and free) MLOps tool alternative:

  • invoke MLOps workflow on-demand (or with a cron schedule)
  • get latest source
  • set up Python build/MLOps environment
  • data retrieval and preparation
  • feature engineering
  • preparation for model training (including dummy-encoding categorical features)
  • feature selection
  • model building and scoring
  • set up serverless (Lambda) API in AWS
  • deploy model to serverless API
  • profit!
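
The categorical-encoding step in the list above (turning categorical features into dummy columns) can be sketched in plain Python. This is only an illustration under assumptions: the `dummify` helper and the sample rows are hypothetical and are not taken from the project's notebooks.

```python
def dummify(rows, column):
    """One-hot encode `column` in a list of dict rows (pure-Python sketch)."""
    # Collect the distinct categories in a stable, sorted order.
    categories = sorted({row[column] for row in rows})
    encoded = []
    for row in rows:
        # Copy every other field unchanged.
        new_row = {k: v for k, v in row.items() if k != column}
        # Add one 0/1 indicator column per category.
        for cat in categories:
            new_row[f"{column}_{cat}"] = 1 if row[column] == cat else 0
        encoded.append(new_row)
    return encoded


# Made-up sample data, for illustration only.
rows = [
    {"grid": 1, "constructor": "mercedes"},
    {"grid": 2, "constructor": "red_bull"},
    {"grid": 3, "constructor": "mercedes"},
]
encoded = dummify(rows, "constructor")
```

In a notebook, `pandas.get_dummies` does the same job on a DataFrame; the hand-rolled version here just makes the transformation explicit.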

Problem Statement

Ever since the first season of Drive to Survive, I've been captivated by the drama and excitement that is Formula 1. I've been consuming a public Formula 1 API in some of my past blog posts, and I thought it would be fun to continue this trend and explore the insights and predictions that can be gleaned from past race data:

  • predict the podium placers (1st, 2nd, 3rd) in a race
  • predict who takes pole position in qualifying
  • predict who sets the fastest lap in the race
  • predict who wins the Constructors' Championship at the end of the year
  • explore the effect of factors such as constructor/team membership, weather, home circuit advantage, driver age and years of experience, qualifying position, etc. on the outcome of the race

Hypotheses and Assumptions

  1. Some Formula 1 constructors (teams) have a tendency to win championships more than others

  2. The weather is a crucial factor in the result of a race. Some drivers drive their best in inclement weather, while others excel in perfect weather conditions.

  3. Starting the race towards the front of the grid (ideally from pole position) increases the odds of winning the race.

  4. Home circuit advantage increases the odds of winning the race.

  5. A driver's age and years of experience in F1 affect the chances of winning races. There is an optimum age for winning, so the relationship is not simply linear.
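
Hypotheses 4 and 5 can be turned into model features. The sketch below is one possible encoding, not necessarily what the notebooks do: a 0/1 home-race flag, plus age and age squared so that even a linear model can represent a peak at some optimum age. The `NATIONALITY_TO_COUNTRY` table, the function names, and the sample values are all hypothetical.

```python
from datetime import date

# Hypothetical, incomplete mapping from driver nationality to circuit country.
# A real dataset would need a full table.
NATIONALITY_TO_COUNTRY = {
    "British": "UK",
    "Dutch": "Netherlands",
    "Monegasque": "Monaco",
}


def age_in_years(date_of_birth, race_date):
    """Driver age on race day, as a float number of years."""
    return (race_date - date_of_birth).days / 365.25


def engineer_features(nationality, circuit_country, date_of_birth, race_date, grid):
    """Build a feature dict for one driver in one race."""
    age = age_in_years(date_of_birth, race_date)
    home = NATIONALITY_TO_COUNTRY.get(nationality) == circuit_country
    return {
        "grid": grid,                # hypothesis 3: starting position
        "is_home_race": int(home),   # hypothesis 4: home circuit advantage
        "age": age,                  # hypothesis 5: linear age term
        "age_squared": age ** 2,     # hypothesis 5: allows a non-linear optimum
    }


# Made-up example values, for illustration only.
features = engineer_features(
    "British", "UK", date(1985, 1, 7), date(2021, 7, 18), grid=2
)
```

Tree-based models can pick up the non-linear age effect without the squared term, so `age_squared` mainly matters for linear or logistic models.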

Goals and success metrics

The main goal is to predict the winner of a 2021 season race, based on past racing data.

There are a few other things I would also like to explore, as specified in the hypotheses above.

I am also curious how a different machine learning algorithm would compare on the same problem. Would it perform better or worse?
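
One way to make "predict the winner" measurable, and to compare two algorithms on equal terms, is to score each model by the fraction of races where its top-ranked driver actually won. The following is a minimal sketch of that metric, assuming each model outputs a win probability per driver per race; the function names and the sample data are hypothetical.

```python
def predicted_winners(predictions):
    """Pick, per race, the driver with the highest predicted win probability.

    `predictions` is an iterable of (race_id, driver_id, probability) tuples.
    """
    best = {}
    for race_id, driver_id, prob in predictions:
        # Keep the highest-probability driver seen so far for this race.
        if race_id not in best or prob > best[race_id][1]:
            best[race_id] = (driver_id, prob)
    return {race_id: driver for race_id, (driver, _) in best.items()}


def winner_accuracy(predictions, actual_winners):
    """Fraction of races whose predicted winner matches the actual winner."""
    picks = predicted_winners(predictions)
    hits = sum(
        1 for race_id, winner in actual_winners.items()
        if picks.get(race_id) == winner
    )
    return hits / len(actual_winners)


# Made-up model output for two races, for illustration only.
predictions = [
    ("2021-01", "driver_a", 0.6),
    ("2021-01", "driver_b", 0.3),
    ("2021-02", "driver_a", 0.4),
    ("2021-02", "driver_b", 0.5),
]
actual_winners = {"2021-01": "driver_a", "2021-02": "driver_a"}
score = winner_accuracy(predictions, actual_winners)
```

Running the same function over the predictions of two different models, on the same held-out races, gives a direct like-for-like comparison.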

Risks or limitations

The goal of this Capstone Project is to apply what I learned in this course to build a model using an appropriate machine learning algorithm.

Since I am a software developer by profession, I would really like to productionise this project:

  • to expose the prediction functionality through an API or, if possible, to run predictions without needing an internet connection (inference on the client)

  • to be able to show the functionality through a website or a mobile app

  • to create a data pipeline so that when new results are ready (there are races all throughout the year), the system can collect the latest race data and update the model used for future predictions.
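
The data pipeline idea in the last bullet needs some rule for deciding when to retrain. A minimal sketch, assuming races are identified by a `(season, round)` pair and that the pipeline remembers the last race it trained on; the `needs_retraining` helper is hypothetical.

```python
def needs_retraining(last_trained, latest_available):
    """Return True when a newer race than the last trained one is available.

    Both arguments are (season, round) tuples, e.g. (2021, 5). Python compares
    tuples element by element, so a later season or a later round in the same
    season both count as newer.
    """
    return latest_available > last_trained
```

A scheduled workflow could call this check first and skip the expensive training steps when nothing has changed.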

Dataset

Ergast Motor Racing has been publishing Formula 1 results from 1950 up to the present. The majority of my dataset will come from this API.
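
As an illustration of working with this data, the sketch below flattens a results response into one row per driver per race. It rests on assumptions: the endpoint URL and the nested `MRData` / `RaceTable` / `Races` / `Results` layout reflect my understanding of the Ergast JSON format and should be checked against the live API, the sample payload is made up, and `flatten_results` is a hypothetical helper. No network call is made.

```python
import json

# Assumed endpoint for the most recent race; not called in this sketch.
LATEST_RESULTS_URL = "http://ergast.com/api/f1/current/last/results.json"

# Made-up payload in the assumed Ergast response shape.
SAMPLE_RESPONSE = """
{"MRData": {"RaceTable": {"Races": [{
  "season": "2021", "round": "1", "raceName": "Example Grand Prix",
  "Results": [
    {"position": "1", "grid": "2",
     "Driver": {"driverId": "driver_a"},
     "Constructor": {"constructorId": "team_x"}},
    {"position": "2", "grid": "1",
     "Driver": {"driverId": "driver_b"},
     "Constructor": {"constructorId": "team_y"}}
  ]}]}}}
"""


def flatten_results(payload):
    """Turn the nested response into a flat list of per-driver result rows."""
    rows = []
    for race in payload["MRData"]["RaceTable"]["Races"]:
        for result in race["Results"]:
            rows.append({
                # Numeric fields arrive as strings in the payload, so convert them.
                "season": int(race["season"]),
                "round": int(race["round"]),
                "driver": result["Driver"]["driverId"],
                "constructor": result["Constructor"]["constructorId"],
                "grid": int(result["grid"]),
                "position": int(result["position"]),
            })
    return rows


rows = flatten_results(json.loads(SAMPLE_RESPONSE))
```

Flat rows like these load directly into a table, where weather and other scraped fields can be joined on season and round.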

I will also be scraping some data from the following sites:

  • Chicane F1 - Since 1997 this website has been publishing F1 Race statistics and may have some data that is missing in the Ergast API
  • Wikipedia - Weather information is missing in the Ergast data set and this can be scraped from Wikipedia
  • World Weather Online - some weather information is also missing from Wikipedia, so we can use WWO as a backup
  • F1 Metrics - I am not using any dataset from F1 Metrics. However, the author of this blog has published so many past predictions and analyses that I felt it important to consider his domain knowledge as I develop the machine learning models in this project. It is a shame that his blog updates are few and far between, but when he does post, it's gold.
