employee-retention-detection's Introduction

Employee-Retention-Detection

Flow of the Project:

  1. Obtain the dataset from Kaggle.
  2. Analyse the data and filter the dataset.
  3. Identify N/A values, if any (if N/A values are present, apply imputation methods).
  4. Perform class imbalance handling methods if required.
  5. Split/Partition the data into Training and Testing Sets.
  6. Implement the training algorithm.
  7. (Optional) Use Train Control for the above training algorithm.
  8. Perform hyper-parameter tuning.
  9. Implement feature selection for more optimised results.
  10. Predict the results from the testing set.
  11. Calculate the model evaluation parameters, such as accuracy, precision, specificity and sensitivity.
  12. Present the results in tabular format.
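
Steps 10 to 12 of the flow can be illustrated with a small sketch. This is a Python example using only the standard library, not the project's own scripts; the function names and the sample labels below are assumptions made up for illustration. It counts the confusion-matrix cells for a "yes"/"no" target and derives the four parameters from them.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Return (tp, fp, tn, fn) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def evaluation_metrics(actual, predicted, positive="yes"):
    """Compute accuracy, precision, sensitivity and specificity."""
    tp, fp, tn, fn = confusion_counts(actual, predicted, positive)

    def ratio(numerator, denominator):
        # Guard against empty classes so the sketch never divides by zero.
        return numerator / denominator if denominator else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
        "precision": ratio(tp, tp + fp),
        "sensitivity": ratio(tp, tp + fn),   # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
    }


# Hypothetical labels, made up purely for illustration.
actual = ["yes", "no", "no", "yes", "no", "no", "yes", "no"]
predicted = ["yes", "no", "yes", "no", "no", "no", "yes", "no"]

metrics = evaluation_metrics(actual, predicted)

# Print the results as a simple two-column table.
for name, value in metrics.items():
    print(f"{name:<12}{value:>8.3f}")
```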

Class imbalance handling methods used in the project are:

  1. Random over sampling
  2. Random under sampling
  3. Ovun over sampling (ovun.sample from the ROSE package)
  4. Ovun under sampling (ovun.sample from the ROSE package)
  5. ROSE
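
As a rough illustration of the first two methods in this list, the following Python sketch (standard library only) balances a toy dataset by random over sampling and random under sampling. It is not the project's code; the helper names, the `label_of` parameter and the toy rows are assumptions. The ROSE-based methods generate synthetic rows and are not reproduced here.

```python
import random
from collections import Counter


def group_by_class(rows, label_of):
    """Group rows into a dict keyed by class label."""
    groups = {}
    for row in rows:
        groups.setdefault(label_of(row), []).append(row)
    return groups


def random_over_sample(rows, label_of, seed=0):
    """Duplicate random minority rows until every class matches the largest."""
    rng = random.Random(seed)
    groups = group_by_class(rows, label_of)
    target = max(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap up to the target size.
        balanced.extend(rng.choices(group, k=target - len(group)))
    return balanced


def random_under_sample(rows, label_of, seed=0):
    """Keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    groups = group_by_class(rows, label_of)
    target = min(len(group) for group in groups.values())
    balanced = []
    for group in groups.values():
        # Sample without replacement down to the target size.
        balanced.extend(rng.sample(group, target))
    return balanced


# Toy data: 8 "no" rows and 2 "yes" rows (hypothetical employee ids).
rows = [(f"e{i}", "no") for i in range(8)] + [("e8", "yes"), ("e9", "yes")]


def label(row):
    return row[1]


over = random_over_sample(rows, label)
under = random_under_sample(rows, label)

print(Counter(label(r) for r in over))
print(Counter(label(r) for r in under))
```

Over sampling keeps every original row but repeats minority rows, while under sampling discards majority rows, so the first trades duplication for the information loss of the second.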

Algorithms Implemented in the project are:

  1. Decision Tree
  2. Logistic Regression
  3. Random Forest
  4. SVM
  5. kNN
  6. Naive Bayes

Data splitting methods used for each dataset are:

  1. 75% training and 25% testing
  2. 80% training and 20% testing
  3. 70% training and 30% testing
  4. K-Fold
  5. K-Fold within 75% training and 25% testing
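
The splitting schemes above can be sketched as follows. This is a minimal Python illustration with the standard library, not the project's implementation; the function names, the value k = 5 and the 100-row toy dataset are assumptions. The last part shows one reading of "K-Fold within 75% training and 25% testing": hold out 25% for the final test and run K-Fold on the remaining 75%.

```python
import random


def train_test_split(rows, train_fraction=0.75, seed=0):
    """Shuffle the rows and cut them into a training and a testing part."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def k_fold_indices(n_rows, k=5, seed=0):
    """Yield (train_indices, test_indices) pairs, one pair per fold."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


rows = list(range(100))  # stand-in for 100 dataset rows

# Plain hold-out splits: 75/25, 80/20 and 70/30.
for fraction in (0.75, 0.80, 0.70):
    train_rows, test_rows = train_test_split(rows, fraction)
    print(fraction, len(train_rows), len(test_rows))

# K-Fold inside the 75% training part; the 25% part stays untouched.
train_rows, test_rows = train_test_split(rows, 0.75)
for fold_train, fold_test in k_fold_indices(len(train_rows), k=5):
    print(len(fold_train), len(fold_test))
```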

Feature selection methods implemented for each of the algorithms mentioned above are:

  1. Correlation-based Feature Selection (CFS)
  2. Information Gain
  3. Random Forest based importance score
  4. Backward Elimination
  5. Least Absolute Shrinkage and Selection Operator (LASSO)
  6. Recursive Feature Elimination (RFE)
  7. Chi Square
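
To give a feel for how one of these methods ranks features, here is a minimal Python sketch of information gain for categorical features, using only the standard library. It is an illustration under assumptions, not the project's code: the feature names `overtime` and `department` and all values are made up.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum(
        (count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after splitting."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(
        len(group) / total * entropy(group) for group in groups.values()
    )
    return entropy(labels) - remainder


# Hypothetical toy data: "overtime" separates the classes, "department" does not.
labels = ["yes", "yes", "no", "no"]
features = {
    "overtime": ["high", "high", "low", "low"],
    "department": ["a", "b", "a", "b"],
}

# Rank the features from most to least informative.
ranking = sorted(
    features,
    key=lambda name: information_gain(features[name], labels),
    reverse=True,
)
for name in ranking:
    print(name, round(information_gain(features[name], labels), 3))
```

Features with a gain close to zero tell the model little about the target and are candidates for removal.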

File Structure

  1. The over-sampled and both-sampled folders contain all the algorithms applied in this project, organised in sub-folders named after the algorithm implemented in the files they hold.

"DetailedReport.docx" and "Detailed Report.pdf" contain the detailed results (parameters: Accuracy, Precision, Specificity and Sensitivity) obtained after applying the various algorithms (such as logistic regression, decision tree, etc.) and feature selection techniques in combination with the various data splits.

It is important to read the "Dataset Info.txt" file, as it contains the most valuable results drawn from the original dataset and from the subsequent datasets obtained after class imbalance handling.

Scripts Folder

This folder contains the following sub-folders:

  1. Decision Tree
  2. Logistic Regression
  3. Random Forest
  4. SVM
  5. kNN
  6. Naive Bayes

These folders contain files named after the feature selection technique used.

Q) What is class imbalance handling?

Ans. In simple words, class imbalance handling means increasing or decreasing the number of rows so that the counts of "yes" and "no" values in the output column 'Y' become nearly equal, which helps to get unbiased results.


Q) How does this give unbiased results?

Ans. If the model is trained on data where the output is "no" most of the time, its predictions for "yes" cases may often be incorrect. The same applies when the model is trained on data where the output is mostly "yes". So it is recommended to perform class imbalance handling in such cases to get unbiased results.
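
The effect described in this answer can be seen with a deliberately naive example. The Python sketch below (standard library only) uses a hypothetical "model" that always predicts the majority class; the 90/10 class ratio is an assumption chosen for illustration, not a figure from the project's dataset.

```python
from collections import Counter


def majority_class_predictions(train_labels, n_predictions):
    """Predict the most frequent training label for every row."""
    majority = Counter(train_labels).most_common(1)[0][0]
    return [majority] * n_predictions


# Hypothetical imbalanced target: 90 "no" rows and 10 "yes" rows.
labels = ["no"] * 90 + ["yes"] * 10
predicted = majority_class_predictions(labels, len(labels))

# Accuracy: share of all rows predicted correctly.
accuracy = sum(a == p for a, p in zip(labels, predicted)) / len(labels)

# Sensitivity: share of the actual "yes" rows that were predicted as "yes".
yes_total = labels.count("yes")
yes_found = sum(a == "yes" and p == "yes" for a, p in zip(labels, predicted))
sensitivity = yes_found / yes_total

print("accuracy:", accuracy)
print("sensitivity:", sensitivity)
```

The accuracy looks high even though not a single "yes" case is detected, which is why accuracy alone can hide the bias on imbalanced data.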

employee-retention-detection's People

Contributors

pranav-patel-123
