classification-models-comparison's Introduction

Scikit-learn models comparison and autotune

The project began as a way to study the Scikit-learn library and grew into an idea: automate the search for the best estimator for a given dataset and problem. It loops over 10 classification models, tunes each one to find the best estimator in its best configuration, then fits and serializes it in one run. This is an alternative to doing the same work manually in a Jupyter Notebook. So, this is what the project is about.

Models

  • LogisticRegression()
  • SGDClassifier()
  • DecisionTreeClassifier()
  • RandomForestClassifier()
  • GradientBoostingClassifier()
  • ExtraTreesClassifier()
  • AdaBoostClassifier()
  • SVC()
  • GaussianNB()
  • MLPClassifier()
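
As a rough illustration, the ten estimators above could be kept in a single registry so that a tuning loop can iterate over them. This is only a sketch: the `MODELS` dictionary, its string keys, and the `get_model` helper are hypothetical names, not taken from the project's code.

```python
from sklearn.base import clone
from sklearn.ensemble import (
    AdaBoostClassifier,
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neural_network import MLPClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical registry: the keys are illustrative, the project may name things differently.
MODELS = {
    "LogisticRegression": LogisticRegression(),
    "SGDClassifier": SGDClassifier(),
    "DecisionTreeClassifier": DecisionTreeClassifier(),
    "RandomForestClassifier": RandomForestClassifier(),
    "GradientBoostingClassifier": GradientBoostingClassifier(),
    "ExtraTreesClassifier": ExtraTreesClassifier(),
    "AdaBoostClassifier": AdaBoostClassifier(),
    "SVC": SVC(),
    "GaussianNB": GaussianNB(),
    "MLPClassifier": MLPClassifier(),
}


def get_model(name):
    """Return a fresh, unfitted copy of a registered estimator."""
    return clone(MODELS[name])
```

Cloning on lookup keeps the registry entries unfitted, so repeated runs do not share state.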

EDA.ipynb

Exploratory data analysis. An almost clean dataset describes several features correlated with the target variable "income". The "income" variable takes one of two values, over 50K or not, which makes predicting it a binary classification problem. The notebook uses ProfileReport, a powerful tool for EDA: https://github.com/ydataai/ydata-profiling. Feature standardization and encoding are also done here, and the resulting 'feed.csv' file is the fully prepared input for the models.
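
The standardization and encoding step might look roughly like the sketch below. It is not the notebook's actual code: the `prepare_feed` helper, the toy column names ("age", "workclass"), and the ">50K" label string are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler


def prepare_feed(df, target="income", positive_label=">50K"):
    """Standardize numeric features, one-hot encode the rest, binarize the target.

    `positive_label` is an assumed spelling of the "over 50K" class.
    """
    features = df.drop(columns=[target])
    numeric_cols = features.select_dtypes(include="number").columns
    categorical_cols = features.columns.difference(numeric_cols)

    # Zero mean and unit variance for every numeric column.
    scaled = pd.DataFrame(
        StandardScaler().fit_transform(features[numeric_cols]),
        columns=numeric_cols,
        index=features.index,
    )
    # One 0/1 indicator column per category value.
    encoded = pd.get_dummies(features[categorical_cols], dtype=int)
    # 1 for the positive class, 0 otherwise.
    y = (df[target] == positive_label).astype(int).rename(target)
    return pd.concat([scaled, encoded, y], axis=1)


# Toy rows standing in for the raw data (hypothetical values).
raw = pd.DataFrame(
    {
        "age": [25, 38, 52, 41],
        "workclass": ["Private", "Private", "Self-emp", "Gov"],
        "income": ["<=50K", "<=50K", ">50K", ">50K"],
    }
)
feed = prepare_feed(raw)
```

Calling `feed.to_csv("feed.csv", index=False)` afterwards would produce a file in the spirit of the one described above.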

grid_search.ipynb

This notebook builds a dictionary of hyperparameter grids for tuning the models. The resulting dictionary is exported as 'grid_search_params.json'.
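
A minimal sketch of what such a dictionary and its JSON export could look like follows. The grids shown are illustrative guesses, not the project's actual search space, and `dump_grids` / `load_grids` are hypothetical helper names.

```python
import json
import os
import tempfile

# Illustrative grids keyed by model name; the real values live in the notebook.
param_grids = {
    "LogisticRegression": {"C": [0.1, 1.0, 10.0]},
    "DecisionTreeClassifier": {
        "criterion": ["gini", "entropy"],
        "max_depth": [3, 5, None],
    },
    "SVC": {"C": [0.1, 1.0], "kernel": ["linear", "rbf"]},
}


def dump_grids(grids, path):
    """Write the grids as JSON; Python None becomes JSON null."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(grids, fh, indent=2)


def load_grids(path):
    """Read the grids back; JSON null becomes Python None again."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


# Round trip through a temporary file to show that nothing is lost.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "grid_search_params.json")
    dump_grids(param_grids, path)
    restored = load_grids(path)
```

JSON only holds plain values, so grids that contain Python objects (for example callables or tuples such as `hidden_layer_sizes`) would need converting when the file is loaded.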

data/

  • best_model_trained.pkl - the output of main.py
  • feed.csv - the output of EDA.ipynb
  • grid_search_logs.csv - for looking at parameters beyond the best accuracy score (e.g. fit_time) and deciding for yourself which configuration is best.
  • grid_search_params.json - the output of grid_search.ipynb
  • info.log - the best score reached by each model, the very best model, its final score and configuration
  • raw_data.csv - this is where it all started.
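
One plausible way to produce a log like grid_search_logs.csv is to tabulate the `cv_results_` attribute of a fitted `GridSearchCV`, which holds fit times alongside scores. The sketch below runs on synthetic data; the selected columns and the added "model" column are assumptions, not the project's actual log layout.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for feed.csv.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

search = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    {"max_depth": [2, 4, 6]},
    scoring="accuracy",
    cv=3,
)
search.fit(X, y)

# cv_results_ is a dict of arrays with one entry per tried configuration.
columns = ["params", "mean_test_score", "mean_fit_time", "rank_test_score"]
logs = pd.DataFrame(search.cv_results_)[columns].copy()
logs.insert(0, "model", "DecisionTreeClassifier")  # assumed extra column
logs = logs.sort_values("rank_test_score").reset_index(drop=True)

# logs.to_csv("data/grid_search_logs.csv", index=False)  # left commented out in this sketch
```

Sorting by `rank_test_score` puts the most accurate configuration first, while `mean_fit_time` stays visible for trading accuracy against training cost.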

main.py

Run it and behold the magic
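
For readers curious what such a run involves, here is a minimal sketch of the tune, compare, fit, and serialize loop described in the introduction. It is not the contents of main.py: it uses synthetic data, three of the ten models, made-up grids, and the hypothetical name `candidates`.

```python
import pickle

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for feed.csv.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical (estimator, grid) pairs; the real grids come from grid_search_params.json.
candidates = {
    "LogisticRegression": (LogisticRegression(max_iter=1000), {"C": [0.1, 1.0, 10.0]}),
    "DecisionTreeClassifier": (DecisionTreeClassifier(random_state=0), {"max_depth": [3, 5]}),
    "GaussianNB": (GaussianNB(), {"var_smoothing": [1e-9, 1e-7]}),
}

best_name, best_search = None, None
for name, (estimator, grid) in candidates.items():
    search = GridSearchCV(estimator, grid, scoring="accuracy", cv=3)
    search.fit(X, y)
    # Keep whichever model reaches the highest cross-validated accuracy.
    if best_search is None or search.best_score_ > best_search.best_score_:
        best_name, best_search = name, search

# With the default refit=True, best_estimator_ is already refit on all of X, y.
payload = pickle.dumps(best_search.best_estimator_)
restored = pickle.loads(payload)
```

Writing `payload` to a file opened in binary mode would give something comparable to best_model_trained.pkl. A pickle should be loaded with the same scikit-learn version that produced it.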

classification-models-comparison's People

Contributors

ilydkin
