pranshu-raj-211 / ml-template

This project forked from eddiepease/ml-template

General purpose framework for building and tracking local machine learning models

License: MIT License

Shell 1.45% Python 98.55%

ml-template's Introduction

Machine Learning Template

This repository helps to solve two problems in local model development for data scientists:

  • A lot of time is often spent setting up the coding framework to run your machine learning model (e.g. splitting train/test data, training and evaluating the model). This repo provides an object-oriented, easy-to-understand code base which can be adapted to your specific problem.
  • Versioning your model and tracking your model performance with different inputs can be challenging. This repo uses dvc (to version the data) and mlflow (to track model performance).

Getting started

To try out the package, follow the steps below:

  • Clone this repository to your local machine
  • cd into the repository root
  • chmod +x setup.sh to make the bash script executable
  • ./setup.sh to run the script - this sets up a virtualenv and downloads the relevant data

Then add your data!

Starting to use dvc

DVC stands for 'data version control' - it is the git for data. To start using dvc, follow the steps below (for advanced usage, see the website):

  1. cd to the repository root
  2. source venv/bin/activate - activates the virtual environment
  3. dvc init - creates a '.dvc' directory to initialise dvc
  4. dvc add data/train.csv - creates data/train.csv.dvc, a small, human-readable text file that acts as a placeholder for the original data
  5. git add data/train.csv.dvc && git commit -m "Add raw data" - this commits the placeholder file to git

Whenever the data changes, run dvc add [file] again and commit the updated .dvc file to save the new version into version control.
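
The placeholder idea can be illustrated with a short sketch. A .dvc file records a content checksum (MD5) of the tracked file, so git only needs to version that small record rather than the data itself. The helper file_md5 and the demo function below are hypothetical names used for illustration and are not part of dvc; the sketch only shows that the checksum changes when the data changes.

```python
import hashlib
import os
import tempfile


def file_md5(path, chunk_size=1 << 20):
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def demo():
    """Hash a small CSV, append a row, and hash it again."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "train.csv")  # illustrative file, not real data
        with open(path, "w") as handle:
            handle.write("id,label\n1,0\n")
        before = file_md5(path)
        with open(path, "a") as handle:
            handle.write("2,1\n")
        after = file_md5(path)
    return before, after
```

Calling demo() returns two different digests, which is why re-running dvc add after a data change produces a modified .dvc file for git to commit.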

Starting to use mlflow

MLflow helps to track and log machine learning experiments. To start using mlflow, follow the steps below:

  1. Adjust the code as necessary to fit your data/problem (e.g. you might need to do some data transformation / want to use a different ML model / use a different metric)
  2. Run mlflow ui in the root directory of the project to start the tracking UI
  3. In main.py, select the evaluation method (run_single or run_cv)
  4. When ready to record a run, set mlflow_record=True and enter an experiment name (each experiment can have multiple runs associated with it)
  5. Navigate to the UI (localhost:5000) and see the run recorded
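
As a rough sketch of how the mlflow_record flag could gate tracking, the function below logs parameters and metrics only when recording is switched on. The function name record_run and its signature are assumptions made for illustration and are not taken from main.py; the mlflow calls used (set_experiment, start_run, log_params, log_metrics) are part of the MLflow tracking API.

```python
def record_run(params, metrics, mlflow_record=False, experiment_name="default"):
    """Log params and metrics to MLflow only when mlflow_record is True.

    Returns True if a run was recorded, False otherwise.
    """
    if not mlflow_record:
        # Debugging run: skip tracking entirely.
        return False

    # Imported lazily so that debugging runs do not need MLflow available.
    import mlflow

    mlflow.set_experiment(experiment_name)  # groups runs under one experiment
    with mlflow.start_run():
        mlflow.log_params(params)    # dict of parameter name -> value
        mlflow.log_metrics(metrics)  # dict of metric name -> float
    return True
```

For example, record_run({"n_estimators": 100}, {"accuracy": 0.91}, mlflow_record=True, experiment_name="baseline") would record one run under an experiment called "baseline"; the parameter and metric values here are made up for illustration.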

Here are some top tips for tracking your model using MLflow, based on this blog post:

  • If you are debugging, set the variable mlflow_record to False. This ensures that you only record the runs which are significant, aiding model analysis
  • Before you record a run in MLflow, make sure you commit the code (via git) and the data (via dvc). This ensures you have an accurate record in MLflow of which code and data produced the model.
  • When experimenting with new data or a new approach, create a fresh branch in git. This makes it easy to compare against, or switch back to, the existing setup
  • Use experiments in MLflow to group together runs
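
One way to act on the "commit before recording" tip is to check the state of the git working tree before a run is logged. The two helpers below are a minimal sketch with hypothetical names (current_git_commit, working_tree_is_clean); they are not part of this repository and simply shell out to standard git commands.

```python
import subprocess


def current_git_commit():
    """Return the current commit hash, or None if it cannot be determined."""
    try:
        result = subprocess.run(
            ["git", "rev-parse", "HEAD"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        # git is not installed, or this is not a git repository.
        return None
    return result.stdout.strip()


def working_tree_is_clean():
    """Return True if git reports no uncommitted changes, None if unknown."""
    try:
        result = subprocess.run(
            ["git", "status", "--porcelain"],
            capture_output=True, text=True, check=True,
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() == ""
```

A recording step could refuse to log when working_tree_is_clean() is not True, and could attach the hash to the run with mlflow.set_tag("git_commit", current_git_commit()); the tag name "git_commit" is an arbitrary choice for this example.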

Files

This repo contains the following files:

  • src/read_data.py - a file which reads in the data. The code currently works for data in the format detailed in data/data_structure.txt
  • src/transform.py - a file for all the data transformations needed to get the data into a form that can be consumed by the ML model. This might involve filling in missing data, encoding categorical variables etc
  • src/ml.py - a file that contains the machine learning code, including classes which train the model, evaluate the model and predict on the test data
  • main.py - the file from which to run the code
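
To make the division of labour between these files concrete, here is a minimal, self-contained sketch of the split / train / evaluate flow. It is not the repository's code: split_train_test and MajorityClassModel are hypothetical names, the baseline model is a stand-in for whatever src/ml.py actually trains, and the signature of run_single is assumed.

```python
import random


def split_train_test(rows, test_fraction=0.25, seed=0):
    """Shuffle rows reproducibly and split them into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


class MajorityClassModel:
    """Baseline that always predicts the most common training label."""

    def __init__(self):
        self.label = None

    def train(self, rows):
        """Fit on (features, label) rows by remembering the most common label."""
        labels = [label for _, label in rows]
        self.label = max(set(labels), key=labels.count)
        return self

    def predict(self, features):
        """Predict the remembered label for every feature row."""
        return [self.label for _ in features]

    def evaluate(self, rows):
        """Return accuracy on (features, label) rows."""
        predictions = self.predict([features for features, _ in rows])
        correct = sum(p == label for p, (_, label) in zip(predictions, rows))
        return correct / len(rows)


def run_single(rows):
    """Single train/test evaluation: split once, train, report test accuracy."""
    train, test = split_train_test(rows)
    model = MajorityClassModel().train(train)
    return model.evaluate(test)
```

Swapping in a different model then only requires another class exposing the same train, predict and evaluate methods.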

Contributing

Contributions to improve the repository are welcome. If you have a problem with the current code or documentation, please open an issue here
