mvp-horse-racing-prediction's Introduction

Hong Kong Horse Racing Prediction

The aim of this project is to predict the outcome of horse racing using machine learning algorithms.

[Image: horse_racing, from RaceBets]

Dataset

The dataset comes from Kaggle and covers races in HK from 1997 to 2005.
The data consists of 6,349 races with 4,405 runners.
The 5,878 races run before January 2005 are used to develop the forecasting models, whereas the remaining 471 races run after January 2005 are held out for out-of-sample testing.
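
The date-based split described above can be sketched as follows. This is a minimal, dependency-free illustration and not the code in create_dataset.py: the record layout, the `date` key, the `split_by_date` helper, and the exact cutoff of 1 January 2005 are all assumptions made for the example.

```python
from datetime import date

# Assumed cutoff: races before January 2005 train the models,
# later races are held out for out-of-sample testing.
CUTOFF = date(2005, 1, 1)


def split_by_date(races, cutoff=CUTOFF):
    """Split race records into (train, test) by race date.

    `races` is a list of dicts with a `date` key (hypothetical schema).
    """
    train = [r for r in races if r["date"] < cutoff]
    test = [r for r in races if r["date"] >= cutoff]
    return train, test


# Toy records for illustration only; not taken from the dataset.
races = [
    {"race_id": 1, "date": date(1999, 3, 14)},
    {"race_id": 2, "date": date(2004, 12, 27)},
    {"race_id": 3, "date": date(2005, 1, 1)},
    {"race_id": 4, "date": date(2005, 6, 5)},
]

train, test = split_by_date(races)
print(len(train), len(test))  # 2 2
```

Splitting on time rather than at random keeps every test race strictly later than every training race, which mirrors how the models would be used for betting.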

We have an article explaining our journey through this process. You can find a link below:

Documentation

  • requirements.txt: list of requirements needed to run this project
  • baseline_models.ipynb: notebook containing information for part 1 on baseline models
  • quick_eda_horse_racing.ipynb: notebook with a quick EDA on our dataset
  • create_dataset.py and config.py are both used to split our initial data into train and test sets depending on the date of races
  • extract_features.py is used to perform feature engineering
  • winner/: folder containing all notebooks and ML models to bet on the winner
  • placed/: folder containing all notebooks and ML models to bet on placed horses (the Top 3)

winner folder 🏆

Let's have a look at the winner files

  • winner_01_lgbm_optim: runs the hyperoptimization for LGBM
  • winner_02_train: runs all training processes for both LGBM and deep learning, then saves the results
  • winner_03_show_result: helps us verify our information and dig deeper into our predictions for a specific month
  • winner_04_all_results: consolidates all months with an ensemble model and shows final results
  • winner_functions.py: contains the functions required to run the 4 notebooks above
  • model/: contains all saved models from winner_02_train
  • result_hyperopt.csv: file with all our optimization steps
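
The README does not say how the ensemble in winner_04_all_results combines the LGBM and deep learning outputs. As one possible reading, here is a minimal sketch that blends two models' win probabilities with a weighted average and picks the top horse. The `ensemble_winner` function, the `weight` parameter, and the probabilities are hypothetical and are not taken from winner_functions.py.

```python
def ensemble_winner(lgbm_probs, dl_probs, weight=0.5):
    """Blend two models' win probabilities for one race and pick a horse.

    Both inputs map a horse identifier to that model's predicted win
    probability. The weighted average is an assumption for illustration.
    """
    blended = {
        horse: weight * lgbm_probs[horse] + (1 - weight) * dl_probs[horse]
        for horse in lgbm_probs
    }
    # The horse with the highest blended probability is the pick.
    pick = max(blended, key=blended.get)
    return pick, blended


# Made-up probabilities for a three-horse race.
lgbm_probs = {"A": 0.50, "B": 0.30, "C": 0.20}
dl_probs = {"A": 0.30, "B": 0.45, "C": 0.25}

pick, blended = ensemble_winner(lgbm_probs, dl_probs)
print(pick)  # A
```

With equal weights the two models disagree on the favourite, and the blend settles on horse A (0.40 against 0.375 for B).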

placed folder 🥇🥈🥉

Let's have a look at the placed files

  • placed_01_train: runs all training processes for deep learning then saves results
  • placed_02_show_result: helps us verify our information and dig deeper into our predictions for a specific month
  • placed_03_consolidated: consolidates all months with an ensemble model and shows final results
  • placed_functions.py: contains the functions required to run the 3 notebooks above
  • model/: contains all saved models from placed_01_train and LGBM models from the winner folder

mvp-horse-racing-prediction's People

Contributors

corkof, idrissan

mvp-horse-racing-prediction's Issues

WHERE IS THE STARTER DATA??

WHAT IS THE CREATE DATA FOR...AND HOW TO RUN THE CONFIG AND CREATE DATASET FILES...IN VSCODE..I HAVE PYTHON INSTALLED ADDED TO ENV..AND MY PACKAGES LIKE PANDAS ARE ALSO INSTALLED VIA PIP..CAN SOMEONE HELP PLEASE??

'place' feature

Hi, first and foremost, thank you for the Medium article. It has helped me get started with a horse-racing prediction model.

df['place'] = list(map(is_place, df['result']))

I have a question regarding your code. From what I understand (on the line above), this 'place' feature is a boolean to track if the horse finished within top 3 in the current race. Wouldn't this potentially be looking forward on the data if included in the dataset?
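
Whether this is look-ahead leakage depends on how `place` is used. As a target label for the placed models it would be fine; as an input feature it would leak the race outcome. The thread does not confirm which applies in the repository, so the following is only a sketch of the safe pattern. The `is_place` body, the toy rows, and the `POST_RACE` set are assumptions for illustration.

```python
def is_place(result):
    """Return 1 if the finishing position is in the top 3, else 0.

    Hypothetical reimplementation; the repository's version may differ.
    """
    return 1 if result <= 3 else 0


# Toy rows: `result` is the finishing position, known only after the race.
rows = [
    {"horse": "A", "draw": 4, "weight": 126, "result": 1},
    {"horse": "B", "draw": 1, "weight": 120, "result": 5},
    {"horse": "C", "draw": 7, "weight": 131, "result": 3},
]

# Same pattern as the line quoted above, applied to a plain list.
place = list(map(is_place, [r["result"] for r in rows]))

# Columns derived from the outcome stay out of the model inputs.
POST_RACE = {"result", "place"}
X = [{k: v for k, v in r.items() if k not in POST_RACE} for r in rows]
y = place

print(y)  # [1, 0, 1]
```

Here `y` holds the label to predict and `X` holds only information available before the race starts.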
