tabular-automl's Introduction

Tabular-AutoML

AutoML Package for tabular datasets

Tabular dataset tuning is now hassle-free!

Run a single one-line command to get the best-tuned model and the processed dataset in one go.


Python libraries used: lightgbm, numpy

Installation & Usage


  1. Create a virtual environment: Tutorial
  2. Clone the repository.
  3. Open the cloned directory in a terminal (cmd).
  4. Run this command in the terminal to install the dependencies:
pip install -r requirements.txt
     Note: installing from requirements.txt may fail because of an outdated MS Visual C++ Build Tools installation. You can fix this problem using this.
  5. Check the parser arguments, which carry all of the customizations:
>>> python -m tab_automl.main --help
usage: main.py [-h] -d  -t  -tf  [-p] [-f] [-spd] [-sfd] [-sm]

automl hyper parameters

optional arguments:
  -h, --help            show this help message and exit
  -d , --data-source    File path
  -t , --problem-type   Problem Type , currently supporting *regression* or *classification*
  -tf , --target-feature
                        Target feature inside the data
  -p , --pre-proc       If data processing is required
  -f , --fet-eng        If feature engineering is required
  -spd , --save-proc-data
                        Save the processed data
  -sfd , --save-fet-data
                        Save the feature engineered data
  -sm , --save-model    Save the best trained model
  6. Run the command with your own data, problem type and target feature:
>>> # For Regression Problem
>>> python -m tab_automl.main -d "your custom data source\custom_data.csv" -t "regression" -tf "your_custom_target_feature" -spd "true" -sfd "true" -sm "true"

>>> # For Classification Problem
>>> python -m tab_automl.main -d "your custom data source\custom_data.csv" -t "classification" -tf "your_custom_target_feature" -spd "true" -sfd "true" -sm "true"

Contributing Guidelines


  1. Comment on the issue you want to work on.
  2. If you get assigned, fork the repository.
  3. Create a new branch named after your GitHub user ID, e.g. sagnik1511.
  4. Commit your changes to that branch.
  5. Open a pull request (PR) to the JWOC branch of the parent repository.
  6. Title the PR in this form: [Issue {Issue Number}] Heading of the issue.
  7. Describe the changes you made and the reasons for them.

This branch holds all the JWOC updates.

Contributors


  1. Sagnik Roy : sagnik1511

If you like the project, give it a ⭐

Also follow me on GitHub, Kaggle and LinkedIn.

Thank You for Visiting :)

tabular-automl's People

Contributors

ayushmorbar, kunalchhabra37, palavenkireddy, sagnik1511, sherlock-221bbs, snega16, tihsrah, vishnubhaarath

tabular-automl's Issues

Find and fix bugs

Run the code and look for bugs.
When you find one, comment on this issue describing it, then fix it.

Follow contributing guidelines on README.md

Add new loss functions on training

  1. Add three loss functions for each of the regression and classification problem types.
  2. Store them the same way the model scores are stored. See here
  3. Add proper comments.
  4. If the loss functions need helper functions, put them in tab_automl.utils.training.
  5. Update the requirements if new libraries are being used.
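A minimal plain-Python sketch of three losses per problem type, registered in a dict keyed by problem type. The names `LOSS_FUNCTIONS` and `compute_losses`, and the dict layout itself, are hypothetical: the way model scores are actually stored in the codebase is not shown here, so a real PR should mirror that structure instead. The classification losses assume binary 0/1 labels and predicted probabilities for class 1.

```python
import math
from typing import Callable, Dict, Sequence

# --- Regression losses: y_true and y_pred are equal-length sequences of floats ---

def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return math.sqrt(mean_squared_error(y_true, y_pred))


# --- Binary classification losses: y_true holds 0/1 labels,
# --- y_prob holds the predicted probability of class 1 ---

def log_loss(y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-15) -> float:
    total = 0.0
    for t, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)  # clip so log() never receives 0
        total -= t * math.log(p) + (1 - t) * math.log(1 - p)
    return total / len(y_true)


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    return sum((p - t) ** 2 for t, p in zip(y_true, y_prob)) / len(y_true)


def zero_one_loss(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> float:
    # Fraction of samples whose thresholded prediction disagrees with the label.
    wrong = sum((p >= threshold) != bool(t) for t, p in zip(y_true, y_prob))
    return wrong / len(y_true)


# Hypothetical registry: problem type -> {loss name -> loss function}.
LOSS_FUNCTIONS: Dict[str, Dict[str, Callable[..., float]]] = {
    "regression": {
        "mae": mean_absolute_error,
        "mse": mean_squared_error,
        "rmse": root_mean_squared_error,
    },
    "classification": {
        "log_loss": log_loss,
        "brier": brier_score,
        "zero_one": zero_one_loss,
    },
}


def compute_losses(problem_type: str, y_true, y_pred) -> Dict[str, float]:
    """Evaluate every registered loss for the given problem type."""
    return {name: fn(y_true, y_pred) for name, fn in LOSS_FUNCTIONS[problem_type].items()}
```

Keeping the losses in a dict keyed by problem type means the training loop can look them up the same way for every model, without `if`/`else` branches per problem type.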

Follow contributing guidelines on README.md

Add new models for regression training

  1. Add new regression models to single_model_dict, which is defined here.
  2. Update the requirements if a new model's library is missing from them.
  3. Explain in the PR why those models were added.

Follow contributing guidelines on README.md

Load data from different file formats.

The codebase currently loads datasets only from .csv files; for an example, see here.

Add loaders for other formats, such as .txt and .sqlite.
Add the required comments to the code.

Follow contributing guidelines on README.md

Add badges

I want to add badges to the README.
Please assign this to me under JWOC.

Add new models for classification training

  1. Add new classification models to single_model_dict, which is defined here.
  2. Update the requirements if a new model's library is missing from them.
  3. Explain in the PR why those models were added.

Follow contributing guidelines on README.md

Add a new class "Scaling" under processing.

  1. Create the new class in the processing module.
  2. Implement its functions with a clear design and add appropriate comments.
  3. Add a function "run" to "Scaling" that goes through every feature, e.g. link.
  4. Hook this function into the Preprocessing class.
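One possible shape for the class, sketched under several assumptions: the data is a dict mapping feature names to lists of values (the real processing module likely works on a pandas DataFrame), the scaling technique is z-score standardization (chosen for illustration; min-max scaling would fit the same `run` loop), and non-numeric features and the target feature are left untouched.

```python
from statistics import mean, pstdev
from typing import Dict, List, Optional, Tuple


class Scaling:
    """Hypothetical sketch: z-score standardization of numeric features."""

    def __init__(self, data: Dict[str, list], target_feature: Optional[str] = None):
        self.data = data                      # feature name -> column values
        self.target_feature = target_feature  # never scaled
        self.params: Dict[str, Tuple[float, float]] = {}  # feature -> (mean, std)

    @staticmethod
    def is_numeric(values: list) -> bool:
        # bool is a subclass of int, so exclude it explicitly.
        return bool(values) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
        )

    def standardize(self, feature: str) -> List[float]:
        values = self.data[feature]
        mu, sigma = mean(values), pstdev(values)
        self.params[feature] = (mu, sigma)  # kept so new data can be scaled the same way
        if sigma == 0:
            # Constant column: centring is the only meaningful transform.
            return [0.0 for _ in values]
        return [(v - mu) / sigma for v in values]

    def run(self) -> Dict[str, list]:
        """Go through every feature and scale the numeric, non-target ones in place."""
        for feature in self.data:
            if feature == self.target_feature or not self.is_numeric(self.data[feature]):
                continue
            self.data[feature] = self.standardize(feature)
        return self.data
```

Storing the fitted mean and standard deviation in `params` matters in practice: the same values should be reused to scale validation or inference data rather than refitting on it.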

Follow contributing guidelines on README.md

Update every print statement to an f-string

Convert all the print statements in a single file to f-strings.

Example:

>>> my_name = "Sagnik"
>>> # Not updated
>>> print("Hi, my name is ",my_name)
>>> # Process 1
>>> print("Hi, my name is {}".format(my_name))
>>> # Process 2
>>> print(f"Hi, my name is {my_name}")
>>> # Print statements without any variables must be updated too.
>>> print("Good Morning!")
>>> # Updated f-string
>>> print(f"Good Morning!")
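To find the print statements in a file that still need converting, a small helper built on Python's standard `ast` module can list every `print(...)` call whose first argument is not an f-string (f-strings parse to `ast.JoinedStr`, even when they contain no placeholders). `find_non_fstring_prints` is a hypothetical helper, not part of the repository; note that it also flags calls such as `print(x)` whose first argument is not a string literal at all.

```python
import ast
from typing import List, Tuple


def find_non_fstring_prints(source: str) -> List[Tuple[int, str]]:
    """Return (line number, call source) for each print() whose first argument is not an f-string."""
    calls = []
    for node in ast.walk(ast.parse(source)):
        is_print = (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "print"
        )
        # f"..." literals are ast.JoinedStr nodes; anything else still needs updating.
        if is_print and node.args and not isinstance(node.args[0], ast.JoinedStr):
            calls.append(node)
    # ast.walk is breadth-first, so sort into source order.
    calls.sort(key=lambda n: (n.lineno, n.col_offset))
    return [(n.lineno, ast.get_source_segment(source, n) or "") for n in calls]
```

Usage: `find_non_fstring_prints(Path("some_file.py").read_text())` returns the lines still to convert; an empty list means the file is done.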

Follow contributing guidelines on README.md

Update the parser with the new problem type "Clustering"

What you have to do:
1. Update the parser's problem type definitions.
2. Update tab_automl.utils.misc.validate_parse_variable, which currently checks only the classification and regression problem types.
3. Give the target feature argument a default value of None, since clustering does not take a target variable. However, when the problem type is supervised (regression or classification), the target feature must still be checked inside the tab_automl.utils.misc.validate_parse_variable function.
4. Also update the part of README.md that lists the problem types.
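A sketch of the parser change using `argparse`, covering only the three flags this issue touches; their names are taken from the `--help` output in the README. `build_parser`, `validate_args` and `parse` are hypothetical stand-ins: the repository's own check lives in `tab_automl.utils.misc.validate_parse_variable`, whose signature is not shown here. Rejecting a target feature for clustering follows the issue's statement that clustering won't allow one.

```python
import argparse
from typing import List, Optional

SUPERVISED_TYPES = {"regression", "classification"}
PROBLEM_TYPES = SUPERVISED_TYPES | {"clustering"}


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical subset of the real parser: only the flags this issue changes.
    parser = argparse.ArgumentParser(description="automl hyper parameters")
    parser.add_argument("-d", "--data-source", required=True, help="File path")
    parser.add_argument(
        "-t", "--problem-type", required=True,
        help="Problem type: regression, classification or clustering",
    )
    # No longer required: clustering has no target feature, so default to None.
    parser.add_argument(
        "-tf", "--target-feature", default=None,
        help="Target feature inside the data (omit for clustering)",
    )
    return parser


def validate_args(args: argparse.Namespace) -> argparse.Namespace:
    """Stand-in for the checks validate_parse_variable would need to perform."""
    args.problem_type = args.problem_type.lower()
    if args.problem_type not in PROBLEM_TYPES:
        raise ValueError(
            f"problem type must be one of {sorted(PROBLEM_TYPES)}, got {args.problem_type!r}"
        )
    # Supervised problems still need a target feature.
    if args.problem_type in SUPERVISED_TYPES and args.target_feature is None:
        raise ValueError(f"a target feature is required for {args.problem_type}")
    if args.problem_type == "clustering" and args.target_feature is not None:
        raise ValueError("clustering does not take a target feature")
    return args


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    return validate_args(build_parser().parse_args(argv))
```

Keeping the supervised check in the validation function rather than marking `-tf` as `required=True` is what lets the same parser serve both supervised and clustering runs.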

Follow contributing guidelines on README.md
