gracengu / multiclass_classification

[Completed] A complete framework for multi-class classification, covering EDA using x-charts and Principal Component Analysis; machine learning algorithms (LightGBM, Random Forest, Logistic Regression and Support Vector Classifier); and a Bayesian optimizer with L1 and L2 regularization for hyperparameter tuning.

License: MIT License

Multi-class Classification

This repo comprises the code developed for multi-class classification. The goal of this problem is to use the input variables to correctly classify, or predict, the target variable, which is a multi-class categorical variable. There are a total of 150 input variables in the data.

This repo is hosted on GitHub to demonstrate the use of agile practices and git throughout the exercise.

To see the final report, download 'Final_Report.html'.

Table of Contents

  • 1. About the Project
  • 2. Getting Started
  • 3. Set up your environment
  • 4. Open your Jupyter notebook

Structuring a repository

An integral part of having reusable code is having a sensible repository structure, that is, deciding which files we have and how we organise them.

  • Folder layout:
multiclass_classification
├── docs
├── data
├── models
│   └── models_new
├── results
├── src
│   ├── analysis
│   │   ├── __init__.py
│   │   └── analysis.py
│   ├── train
│   │   ├── __init__.py
│   │   └── train.py
│   └── Config.py
├── .gitignore
├── README.md
└── requirements.txt
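If you want to reproduce this layout in a fresh project, the tree above can be scaffolded with the standard library. This is a minimal sketch, not part of the repo: the helper name `scaffold` and the decision to create empty placeholder files are assumptions made for illustration.

```python
from pathlib import Path
from typing import List

# Directories and files taken from the folder layout above.
LAYOUT_DIRS = [
    "docs",
    "data",
    "models/models_new",
    "results",
    "src/analysis",
    "src/train",
]
LAYOUT_FILES = [
    "src/analysis/__init__.py",
    "src/analysis/analysis.py",
    "src/train/__init__.py",
    "src/train/train.py",
    "src/Config.py",
    ".gitignore",
    "README.md",
    "requirements.txt",
]


def scaffold(root: Path) -> List[Path]:
    """Hypothetical helper: create the layout under `root`.

    Existing directories are reused and existing files keep their contents
    (touch only creates missing files or updates timestamps).
    """
    created = []
    for rel in LAYOUT_DIRS:
        path = root / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    for rel in LAYOUT_FILES:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)
        created.append(path)
    return created
```

For example, `scaffold(Path("multiclass_classification"))` creates the tree under the current folder; running it again is safe because every step tolerates existing paths.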

1. About the Project

The following is a summary of the data science/ML/deep learning approaches used in this project:

  • Data import and Pre-EDA
  • Data cleaning / Missing data imputation
  • EDA: Outlier analysis
  • EDA: Statistical/Distribution analysis
  • EDA: Feature Engineering and Selection
  • Modelling - Baseline Model
  • Modelling - Hyperparameter Tuning
  • Modelling - Regularization
  • Model Evaluation: Selection of Metrics
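The notebooks themselves are not reproduced here. As a hedged illustration of the PCA part of the EDA, the sketch below computes explained-variance ratios with plain NumPy on synthetic data that has 150 features, the same width as this dataset. The synthetic data and the function names are assumptions; the project may well use scikit-learn's `PCA` instead.

```python
import numpy as np


def pca_explained_variance(X: np.ndarray) -> np.ndarray:
    """Fraction of total variance explained by each principal component, largest first."""
    # Standardise each column; constant columns get a scale of 1 to avoid dividing by zero.
    std = X.std(axis=0)
    std[std == 0] = 1.0
    X_std = (X - X.mean(axis=0)) / std
    # Squared singular values of the centred matrix are proportional to the
    # variance along each principal axis (NumPy returns them in descending order).
    s = np.linalg.svd(X_std, compute_uv=False)
    variances = s ** 2
    return variances / variances.sum()


def n_components_for(ratio: np.ndarray, threshold: float = 0.95) -> int:
    """Smallest number of components whose cumulative explained variance reaches `threshold`."""
    k = int(np.searchsorted(np.cumsum(ratio), threshold)) + 1
    return min(k, len(ratio))


# Synthetic stand-in for the real data: 500 rows, 150 features driven by 10 latent factors.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 10))
X = latent @ rng.normal(size=(10, 150)) + 0.1 * rng.normal(size=(500, 150))

ratio = pca_explained_variance(X)
print("components for 95% variance:", n_components_for(ratio))
```

Standardising first matters when the input variables are on different scales; without it, the components mostly track whichever raw features have the largest variance.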

2. Getting Started - Clone the repository locally

Open Git Bash in any folder you prefer and run the following command:

git clone https://github.com/gracengu/multiclass_classification.git

Alternatively, although not recommended, you can download the repository as a zip file from the top of its main page. If you prefer not to use git or don't have experience with it, this is a good option.

3. Set up your environment

Note: the following instructions are for pip users only.

The best practice is to create an isolated environment to avoid dependency conflicts in Python. If this is the first time you're setting up your compute environment, first install pip and virtualenv:

python get-pip.py
pip install virtualenv

After installation, set up the virtual environment. If you have multiple Python versions installed on your PC, you may want to specify which one to use:

virtualenv --python=<path of python.exe with a specific version> python3.7_multiclass

To activate your environment, run the activate script from the virtualenv you have created:

python3.7_multiclass\Scripts\activate
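To confirm the activation worked before installing anything, a quick check like the following can be run with `python` from the activated prompt. This is a standard-library sketch, not a script shipped with the repo; it relies on the usual detection rule that inside a venv/virtualenv `sys.prefix` differs from the base interpreter's prefix.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # venv and virtualenv 20+ point sys.base_prefix at the parent interpreter,
    # while legacy virtualenv (<20) sets sys.real_prefix instead.
    base = getattr(sys, "real_prefix", None) or getattr(sys, "base_prefix", sys.prefix)
    return sys.prefix != base


print("virtualenv active:", in_virtualenv())
print("Python version:", sys.version.split()[0])  # expect 3.7.x for this environment
```

If the first line prints `False`, the `activate` script has not taken effect in the current shell.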

In requirements.txt, change the path of the TensorFlow wheel so that it points to the wheel's location on your machine:

tensorflow-cpu @ file:///C:/Projects/2021/multiclass_classification/support/tensorflow_cpu-2.6.0-cp37-cp37m-win_amd64.whl

Install all of the packages listed in requirements.txt using the following command:

pip install -r requirements.txt

If TensorFlow fails to install, comment it out in requirements.txt and run the following commands:

pip install .\package\tensorflow_cpu-2.6.0-cp37-cp37m-win_amd64.whl
pip install -r requirements.txt
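After either route, you can check which requirements are still missing from the active environment. The sketch below is an assumption-laden helper, not part of the repo: it uses a deliberately simplified requirements parser (not pip's full grammar) and `importlib.metadata`, which ships with Python 3.8+; on Python 3.7 it falls back to the `importlib_metadata` backport, which must be installed separately.

```python
import re
from typing import Iterable, List

try:
    from importlib import metadata  # standard library in Python 3.8+
except ImportError:  # Python 3.7 needs the importlib_metadata backport from PyPI
    import importlib_metadata as metadata


def requirement_names(lines: Iterable[str]) -> List[str]:
    """Extract distribution names from requirements.txt lines (simplified parser)."""
    names = []
    for raw in lines:
        line = raw.split("#", 1)[0].strip()  # drop comments, including commented-out packages
        if not line or line.startswith("-"):  # skip blank lines and pip options such as -r
            continue
        # The name ends at the first version specifier, extras bracket, "@" or space.
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_packages(lines: Iterable[str]) -> List[str]:
    """Return requirement names with no installed distribution in the active environment."""
    missing = []
    for name in requirement_names(lines):
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

For example, `with open("requirements.txt") as f: print(missing_packages(f))` lists anything pip skipped; a commented-out TensorFlow line is ignored, so a separately installed wheel is not reported twice.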

4. Open your Jupyter notebook

  1. Install a new IPython kernelspec so that Jupyter can run the notebooks in the isolated environment:
ipython kernel install --name python3.7_multiclass --user

You can change the --name to anything you want.

  2. In the terminal, run jupyter notebook.

Navigate to the notebooks directory and open the relevant notebook:

  • Baseline Model: Baseline Model.ipynb
  • Hyperparameter Tuning: Hyperparameter Tuning.ipynb

5. Final Report

The final report is written in the Jupyter notebook 'Final report.ipynb'. Run the notebook from start to end to generate the HTML report.
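As a hedged alternative to running the notebook by hand, Jupyter's nbconvert command can execute it top to bottom and export HTML in one step. The sketch assumes nbconvert is installed in the active environment; the helper names and the output name `Final_Report` (chosen to match the HTML file mentioned at the top of this README) are assumptions.

```python
import shutil
import subprocess
from typing import List


def nbconvert_command(notebook: str, output_name: str = "Final_Report") -> List[str]:
    """Build a `jupyter nbconvert` call that executes a notebook and saves it as HTML."""
    return [
        "jupyter", "nbconvert",
        "--to", "html",
        "--execute",  # run every cell from start to end before converting
        "--output", output_name,  # nbconvert adds the .html extension
        notebook,
    ]


def build_report(notebook: str = "Final report.ipynb") -> None:
    """Run the conversion; requires the `jupyter` executable on PATH."""
    if shutil.which("jupyter") is None:
        raise RuntimeError("jupyter not found; activate the virtual environment first")
    subprocess.run(nbconvert_command(notebook), check=True)
```

Passing the arguments as a list means the space in 'Final report.ipynb' needs no shell quoting, and `check=True` raises if any cell fails during execution.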