sagnik1511 / tabular-automl

Python Auto-ML Package for Tabular Datasets

License: MIT License

Python 100.00%

python3 automl modular-architecture scikit-learn numpy pandas python-package

tabular-automl's Introduction

Tabular-AutoML

AutoML Package for tabular datasets

Tabular dataset tuning is now hassle-free!

Run a one-line command and get the best tuning and the processed dataset in one go.

Used Python Libraries :

Installation & Usage

Create a virtual environment: Tutorial
Clone the repository.
Open the repository directory in a terminal (cmd).
Run this command in the terminal to install the dependencies.

pip install -r requirements.txt

Installing from requirements.txt may fail with an error if the MS Visual C++ Build Tools are outdated. You can fix this problem using this.
First, check the parser arguments that you can pass to customize a run.

>>> python -m tab_automl.main --help
usage: main.py [-h] -d  -t  -tf  [-p] [-f] [-spd] [-sfd] [-sm]

automl hyper parameters

optional arguments:
  -h, --help            show this help message and exit
  -d , --data-source    File path
  -t , --problem-type   Problem Type , currently supporting *regression* or *classification*
  -tf , --target-feature
                        Target feature inside the data
  -p , --pre-proc       If data processing is required
  -f , --fet-eng        If feature engineering is required
  -spd , --save-proc-data
                        Save the processed data
  -sfd , --save-fet-data
                        Save the feature engineered data
  -sm , --save-model    Save the best trained model
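For orientation, the help text above is the kind of output Python's argparse prints for a parser shaped roughly like the sketch below. The flag names and help strings are taken from that output; the empty metavar, the string-valued optional flags, and their "false" defaults are assumptions, not the project's actual code.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of a parser whose flags mirror the help output above."""
    parser = argparse.ArgumentParser(prog="main.py", description="automl hyper parameters")
    # Required arguments: where the data lives and what to predict.
    parser.add_argument("-d", "--data-source", required=True, metavar="", help="File path")
    parser.add_argument("-t", "--problem-type", required=True, metavar="",
                        help="Problem type, regression or classification")
    parser.add_argument("-tf", "--target-feature", required=True, metavar="",
                        help="Target feature inside the data")
    # Optional switches, passed as strings such as "true" (defaults are assumed).
    optional_flags = [
        ("-p", "--pre-proc", "If data processing is required"),
        ("-f", "--fet-eng", "If feature engineering is required"),
        ("-spd", "--save-proc-data", "Save the processed data"),
        ("-sfd", "--save-fet-data", "Save the feature engineered data"),
        ("-sm", "--save-model", "Save the best trained model"),
    ]
    for short_flag, long_flag, text in optional_flags:
        parser.add_argument(short_flag, long_flag, default="false", metavar="", help=text)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["-d", "custom_data.csv", "-t", "regression", "-tf", "price", "-sm", "true"]
)
print(args.data_source, args.problem_type, args.target_feature, args.save_model)
```

argparse turns each long flag into an attribute name by replacing dashes with underscores, so --data-source is read back as args.data_source.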

Now run the command with your own data, problem type, and target feature.

>>> # For Regression Problem
>>> python -m tab_automl.main -d "your custom data source\custom_data.csv" -t "regression" -tf "your_custom_target_feature" -spd "true" -sfd "true" -sm "true"

>>> # For Classification Problem
>>> python -m tab_automl.main -d "your custom data source\custom_data.csv" -t "classification" -tf "your_custom_target_feature" -spd "true" -sfd "true" -sm "true"

Contributing Guidelines

Comment on the issue on which you want to work.
If you get assigned, fork the repository.
Create a new branch named after your GitHub user ID, e.g. sagnik1511.
Make your changes on that branch.
Create a PR (Pull request) to the JWOC branch of the parent repository.
Title the PR in this format: [Issue {Issue Number}] Heading of the issue.
Describe the changes you have made and the reasons for them.

The JWOC branch will hold all JWOC updates.

Contributors

Sagnik Roy : sagnik1511

If you like the project, do ⭐

Also follow me on GitHub, Kaggle, LinkedIn

Thank You for Visiting :)


tabular-automl's Issues

Find and fix bugs

Run the code and look for bugs.
When you find a bug, comment on this issue and then fix it.

Follow contributing guidelines on README.md

Add new loss functions on training

Add 3 loss functions for both regression and classification problem types.
Store them the same way the model scores are stored. See here.
Add proper comments.
If the loss functions need new helper functions, put them in tab_automl.utils.training.
Update the requirements if new libraries are being used.
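As a starting point for this issue, here is a minimal sketch of three loss functions per problem type, written with NumPy only. The function names and the loss_dict layout are hypothetical; the way the project actually stores model scores is not shown here, so adapt the structure to match it.

```python
import numpy as np


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))))


def mse(y_true, y_pred):
    """Mean squared error."""
    return float(np.mean((np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2))


def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(mse(y_true, y_pred)))


def zero_one_loss(y_true, y_pred):
    """Fraction of misclassified samples."""
    return float(np.mean(np.asarray(y_true) != np.asarray(y_pred)))


def binary_log_loss(y_true, prob, eps=1e-15):
    """Binary cross-entropy; prob is the predicted probability of class 1."""
    y = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(prob, dtype=float), eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def hinge_loss(y_true, decision):
    """Hinge loss for 0/1 labels and real-valued decision scores."""
    signs = np.where(np.asarray(y_true) == 1, 1.0, -1.0)
    margins = 1.0 - signs * np.asarray(decision, dtype=float)
    return float(np.mean(np.maximum(0.0, margins)))


# Hypothetical registry: problem type -> {loss name: function}.
loss_dict = {
    "regression": {"mae": mae, "mse": mse, "rmse": rmse},
    "classification": {
        "zero_one": zero_one_loss,
        "log_loss": binary_log_loss,
        "hinge": hinge_loss,
    },
}

for name, loss in loss_dict["regression"].items():
    print(f"{name}: {loss([1, 2, 3], [1, 2, 5]):.4f}")
```

scikit-learn ships equivalents of most of these in sklearn.metrics, which may be preferable since it is already a project dependency.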

Follow contributing guidelines on README.md

Add new models for regression training

Add new regression models to single_model_dict, which is defined here.
Update the requirements if the new model's library is missing from them.
Explain in the PR why those models were added.
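To make the request concrete, the sketch below registers a few scikit-learn regressors in a dictionary and fits each one on synthetic data. The layout of single_model_dict used here (problem type mapped to named, unfitted estimators) is an assumption for illustration; check the real object before copying this structure.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import ElasticNet, Ridge

# Hypothetical layout: problem type -> {model name: unfitted estimator}.
single_model_dict = {
    "regression": {
        "Ridge": Ridge(alpha=1.0),
        "ElasticNet": ElasticNet(alpha=0.1),
        "RandomForestRegressor": RandomForestRegressor(n_estimators=50, random_state=0),
        "GradientBoostingRegressor": GradientBoostingRegressor(random_state=0),
    }
}

# Synthetic data, split by position into train and hold-out parts.
X, y = make_regression(n_samples=200, n_features=5, noise=0.1, random_state=0)
X_train, X_test, y_train, y_test = X[:150], X[150:], y[:150], y[150:]

scores = {}
for name, model in single_model_dict["regression"].items():
    model.fit(X_train, y_train)
    scores[name] = model.score(X_test, y_test)  # R^2 on the hold-out part
    print(f"{name}: R^2 = {scores[name]:.3f}")
```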

Follow contributing guidelines on README.md

Load data from different file formats.

Datasets are currently loaded from .csv files only in the codebase; for an example, see here.

Add data loaders for other formats such as .txt, .sqlite, etc.
Add required comments in the code.
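One possible shape for such a loader is a single function that dispatches on the file extension. This is only a sketch: load_data, its table and sep parameters, and the tab-separated default for .txt files are all assumptions, not existing project code.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

import pandas as pd


def load_data(path, table=None, sep="\t"):
    """Load a tabular file into a DataFrame based on its extension."""
    suffix = Path(path).suffix.lower()
    if suffix == ".csv":
        return pd.read_csv(path)
    if suffix == ".txt":
        # Assumes a delimited text file; tab-separated unless told otherwise.
        return pd.read_csv(path, sep=sep)
    if suffix in {".sqlite", ".db"}:
        if table is None:
            raise ValueError("A table name is required for SQLite files.")
        # closing() makes sure the connection is released after the query.
        # The table name is interpolated, so it must come from a trusted source.
        with closing(sqlite3.connect(path)) as connection:
            return pd.read_sql_query(f'SELECT * FROM "{table}"', connection)
    raise ValueError(f"Unsupported file format: {suffix}")
```

A call such as load_data("custom_data.sqlite", table="train") would then return the whole table as a DataFrame, while unknown extensions fail early with a clear error.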

Follow contributing guidelines on README.md

Add clustering models inside single_model_dict

What you have to do:
1. Add 5 new clustering models to the single_model_dict object in the tab_automl.automl.models file.
2. Follow the same code conventions as the existing entries.

Follow contributing guidelines on README.md

Add badges

I want to add badges to the README.
Please assign this to me under JWOC.

Add new models for classification training

Add new classification models to single_model_dict, which is defined here.
Update the requirements if the new model's library is missing from them.
Explain in the PR why those models were added.

Follow contributing guidelines on README.md

Add new dataset for clustering problems inside datasets folder

Add a new tabular dataset for clustering problems inside tab_automl/datasets and also add the dataset class inside tab_automl/automl/datasets.py.

Add proper comments and keep the code quality high.

Follow contributing guidelines on README.md

Add a new single_model_trainer function for training clustering problems.

What you have to do:
1. Add a new function inside the tab_automl.automl.training.Trainer class for training clustering problems.
2. The main function of tab_automl.main will also need changes so that it handles the clustering problem type.
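A clustering trainer has no target to score against, so it needs an internal metric. The sketch below fits each candidate model and ranks them by silhouette score. It is a standalone function rather than a Trainer method, and the function name, the model dictionary, and the choice of silhouette score are assumptions made for illustration.

```python
from sklearn.cluster import AgglomerativeClustering, Birch, KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Hypothetical clustering entries, keyed by model name.
clustering_models = {
    "KMeans": KMeans(n_clusters=3, n_init=10, random_state=0),
    "AgglomerativeClustering": AgglomerativeClustering(n_clusters=3),
    "Birch": Birch(n_clusters=3),
}


def train_clustering_models(models, X):
    """Fit every model on X and return the best name plus all silhouette scores."""
    scores = {}
    for name, model in models.items():
        labels = model.fit_predict(X)
        # Silhouette needs at least 2 clusters and fewer clusters than samples.
        if 1 < len(set(labels)) < len(X):
            scores[name] = float(silhouette_score(X, labels))
    if not scores:
        raise ValueError("No model produced a valid clustering.")
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Three well-separated synthetic blobs, so every model should score highly.
X, _ = make_blobs(n_samples=150, centers=[[0, 0], [10, 10], [-10, 10]],
                  cluster_std=0.5, random_state=0)
best_name, scores = train_clustering_models(clustering_models, X)
print(f"Best clustering model: {best_name} ({scores[best_name]:.3f})")
```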

Follow contributing guidelines on README.md

Add a new class "OutlierProcessing" under processing

Create a new class under the processing module.
Design its functions with a clear purpose and add appropriate comments.
Add a "run" function to "OutlierProcessing" that goes through every feature, e.g. link.
Add this step to the Preprocessing class.
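As one way to approach this, the sketch below clips each numeric feature to the usual interquartile-range fences. The class and its run method are named as the issue asks; the IQR rule, the factor parameter, and the clip_feature helper are assumptions, and the issue does not prescribe a specific outlier strategy.

```python
import pandas as pd


class OutlierProcessing:
    """Clip numeric features to [Q1 - factor*IQR, Q3 + factor*IQR]."""

    def __init__(self, factor=1.5):
        self.factor = factor

    def clip_feature(self, series):
        """Clip one numeric feature to its IQR fences."""
        q1, q3 = series.quantile(0.25), series.quantile(0.75)
        iqr = q3 - q1
        return series.clip(lower=q1 - self.factor * iqr, upper=q3 + self.factor * iqr)

    def run(self, data):
        """Go through every numeric feature and return a processed copy."""
        processed = data.copy()
        for column in processed.select_dtypes(include="number").columns:
            processed[column] = self.clip_feature(processed[column])
        return processed


frame = pd.DataFrame({"x": [1, 2, 3, 4, 100], "name": list("abcde")})
print(OutlierProcessing().run(frame))
```

Clipping keeps the row count unchanged, which is convenient when the target column lives in the same frame; dropping outlier rows instead is an equally valid design.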

Follow contributing guidelines on README.md

Add a parameter of k-fold validation inside training

Add k-fold validation for chosen datasets.
Add appropriate print statements and comments inside the code.
Add all utilities to tab_automl.utils.training.
Optionally, add a parser argument -kf, --k-fold that takes the number of folds.
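A minimal sketch of what this could look like with scikit-learn's KFold and cross_val_score follows. The k_fold_score helper is hypothetical, and the -kf parser argument is shown on a fresh parser rather than the project's own.

```python
import argparse

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score


def k_fold_score(model, X, y, k_fold=5, seed=0):
    """Return the per-fold scores of model under shuffled k-fold validation."""
    folds = KFold(n_splits=k_fold, shuffle=True, random_state=seed)
    scores = cross_val_score(model, X, y, cv=folds)
    print(f"{type(model).__name__}: mean score over {k_fold} folds = {scores.mean():.3f}")
    return scores


# Optional parser argument for the number of folds (default is an assumption).
parser = argparse.ArgumentParser()
parser.add_argument("-kf", "--k-fold", type=int, default=5, metavar="",
                    help="Number of folds for k-fold validation")
args = parser.parse_args(["-kf", "5"])

X, y = make_classification(n_samples=200, n_features=8, random_state=0)
fold_scores = k_fold_score(LogisticRegression(max_iter=1000), X, y, k_fold=args.k_fold)
```

With no scoring argument, cross_val_score uses the estimator's own score method, which is accuracy for classifiers and R^2 for regressors.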

Follow contributing guidelines on README.md

Add a new class "Scaling" under processing.

Create a new class under the processing module.
Design its functions with a clear purpose and add appropriate comments.
Add a "run" function to "Scaling" that goes through every feature, e.g. link.
Add this step to the Preprocessing class.
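The sketch below shows one possible Scaling class that supports standard and min-max scaling of numeric features. Only the class name and the run function come from the issue; the method parameter, the scale_feature helper, and the handling of constant columns are assumptions.

```python
import pandas as pd


class Scaling:
    """Scale numeric features with either standard or min-max scaling."""

    def __init__(self, method="standard"):
        if method not in {"standard", "minmax"}:
            raise ValueError(f"Unknown scaling method: {method}")
        self.method = method

    def scale_feature(self, series):
        """Scale one numeric feature."""
        if self.method == "standard":
            centre, spread = series.mean(), series.std(ddof=0)
        else:
            centre, spread = series.min(), series.max() - series.min()
        # A constant column has zero spread; return zeros instead of dividing by zero.
        if spread == 0:
            return series - centre
        return (series - centre) / spread

    def run(self, data):
        """Go through every numeric feature and return a scaled copy."""
        scaled = data.copy()
        for column in scaled.select_dtypes(include="number").columns:
            scaled[column] = self.scale_feature(scaled[column])
        return scaled


frame = pd.DataFrame({"x": [0, 5, 10], "c": [3, 3, 3], "name": ["a", "b", "c"]})
print(Scaling(method="minmax").run(frame))
```

In a real pipeline the scaling statistics should be computed on the training split only and reused on validation data; this sketch skips that step for brevity.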

Follow contributing guidelines on README.md

Add contributors section

I want to add a contributors section to the README.

Update every print statement to an f-string

Update all the print statements in a single file to f-strings.

Example:

>>> my_name = "Sagnik"
>>> # Not updated
>>> print("Hi, my name is ",my_name)
>>> # Process 1
>>> print("Hi, my name is {}".format(my_name))
>>> # Process 2
>>> print(f"Hi, my name is {my_name}")
>>> # The print statement may not have any variables, still you have to update those.
>>> print("Good Morning!")
>>> # Updated f-string
>>> print(f"Good Morning!")

Follow contributing guidelines on README.md

Update the parser with the new problem type "Clustering"

What you have to do:
1. Update the parser's problem type definitions.
2. Update tab_automl.utils.misc.validate_parse_variable, which currently checks only the classification and regression problem types.
3. Give the target variable parser argument a default value of None, because a clustering problem does not take a target variable. For supervised problem types, the target_feature must still be checked inside the tab_automl.utils.misc.validate_parse_variable function.
4. Also update the README.md where it specifies the problem types.
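The validation rule in step 3 can be sketched as below. The function validate_problem_args is a hypothetical stand-in; the real signature of validate_parse_variable is not shown here, so treat the names and the returned tuple as assumptions.

```python
SUPERVISED_TYPES = {"regression", "classification"}
UNSUPERVISED_TYPES = {"clustering"}


def validate_problem_args(problem_type, target_feature=None):
    """Check that the problem type is known and the target matches it."""
    problem_type = problem_type.strip().lower()
    if problem_type in UNSUPERVISED_TYPES:
        # Clustering is unsupervised, so a target feature makes no sense.
        if target_feature is not None:
            raise ValueError("Clustering does not take a target feature.")
        return problem_type, None
    if problem_type in SUPERVISED_TYPES:
        # Supervised problems cannot run without a target feature.
        if not target_feature:
            raise ValueError(f"A target feature is required for {problem_type}.")
        return problem_type, target_feature
    supported = sorted(SUPERVISED_TYPES | UNSUPERVISED_TYPES)
    raise ValueError(f"Unsupported problem type {problem_type!r}; expected one of {supported}.")


print(validate_problem_args("clustering"))
print(validate_problem_args("regression", "price"))
```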

Follow contributing guidelines on README.md
