Challenge Bearing Classification

Description

This was an assignment we received during our training at BeCode.
The main goal was to get familiar with machine learning, specifically classification algorithms.
For this, we used a Kaggle dataset of bearing test measurements.
Our task was to predict whether a bearing was defective, with as high an accuracy as possible.

Installation

Python version

  • Python 3.9

Datasets

GitHub has a 100 MB file-size limit, so the data files are not included in this repository.
Below, however, you can find the links to both files.

Packages used

  • pandas
  • numpy
  • matplotlib.pyplot
  • seaborn
  • sklearn

Usage

  • main.py: Python code for cleaning the data and engineering features.
  • plots.py: Python code for making some explanatory plots for this README.
  • utils/model.py: Python code for the machine-learning model (Random Forest). It fits the model to our data and uses it to make predictions.
  • utils/manipulate_dataset.py: Python code with helper functions, written for ease of use in a team environment.
  • utils/plotting.py: Python code used to get to know the data, with plots for finding correlations between features.
  • csv_output: Folder containing some of the CSV files we used while coding. Not all of our output files are in here, since GitHub has a file-size limit of 100 MB.
  • visuals: Folder containing plots we found interesting and that helped us gain insights into the data.
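utils/model.py fits a Random Forest to the prepared data. The block below is a minimal scikit-learn sketch of that kind of workflow, not the contents of the actual file. The feature list, the `status` label column (1 = defective, 0 = good), the 80/20 split and the hyperparameters are all assumptions made for illustration. The synthetic data only stands in for the real Kaggle files.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Assumed features: the six vibration-axis columns used in feature engineering.
FEATURES = ["a1_x", "a1_y", "a1_z", "a2_x", "a2_y", "a2_z"]


def train_and_score(df: pd.DataFrame, label: str = "status",
                    seed: int = 42) -> tuple[RandomForestClassifier, float]:
    """Fit a Random Forest on `df` and return the model and its test accuracy.

    `label` is a hypothetical column name: 1 for a defective bearing, 0 otherwise.
    """
    X = df[FEATURES]
    y = df[label]
    # Hold out 20% of the rows, keeping the class balance in both splits.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed, stratify=y
    )
    model = RandomForestClassifier(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    accuracy = accuracy_score(y_test, model.predict(X_test))
    return model, accuracy


if __name__ == "__main__":
    # Synthetic stand-in data: "defective" rows vibrate more on every axis.
    rng = np.random.default_rng(0)
    n = 200
    status = rng.integers(0, 2, size=n)
    data = {col: rng.normal(loc=status * 2.0, scale=1.0) for col in FEATURES}
    data["status"] = status
    model, acc = train_and_score(pd.DataFrame(data))
    print(f"test accuracy: {acc:.2f}")
```

Stratifying the split keeps the ratio of good to defective bearings the same in the training and test sets. This matters when accuracy is the metric being maximised.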

Feature engineering

  • timestamp
    Change made: only rows with a timestamp above 0.25 are kept.
    Reason: we found outliers where the "rpm" and "hz" values spiked in the first part of the test. Plotting the data revealed a cut-off point.
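Dropping the start-up rows is a single boolean filter on the timestamp column. The sketch below uses pandas and assumes the readings are in one DataFrame with a `timestamp` column. The function name and the toy `rpm` values are hypothetical, and this is not the project's main.py code.

```python
import pandas as pd

# Lower cut-off from the table above, chosen by inspecting plots.
START_CUTOFF = 0.25


def drop_startup_rows(df: pd.DataFrame, cutoff: float = START_CUTOFF) -> pd.DataFrame:
    """Keep only rows whose timestamp is strictly above `cutoff`.

    The early part of each test contained spikes in "rpm" and "hz",
    so those rows are discarded before further processing.
    """
    return df[df["timestamp"] > cutoff].reset_index(drop=True)


readings = pd.DataFrame({
    "timestamp": [0.0, 0.1, 0.25, 0.3, 0.5],
    "rpm": [900.0, 650.0, 610.0, 600.0, 600.0],  # made-up values with an early spike
})
print(drop_startup_rows(readings))
```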

  • timestamp
    Change made: only rows with a timestamp equal to or below 1.5 are kept.
    Reason: the biggest differences between a good and a bad bearing were found in the first part of the test. Plotting the data revealed a cut-off point.
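The two timestamp rules together keep a window of each test: above 0.25 and at most 1.5. Here is a minimal pandas sketch, assuming a `timestamp` column. The function and constant names are hypothetical.

```python
import pandas as pd

LOWER_CUTOFF = 0.25  # rows at or below this are dropped (start-up spikes)
UPPER_CUTOFF = 1.5   # rows above this are dropped (later part of the test)


def keep_time_window(df: pd.DataFrame,
                     lower: float = LOWER_CUTOFF,
                     upper: float = UPPER_CUTOFF) -> pd.DataFrame:
    """Keep rows with lower < timestamp <= upper."""
    mask = (df["timestamp"] > lower) & (df["timestamp"] <= upper)
    return df.loc[mask].reset_index(drop=True)


example = pd.DataFrame({"timestamp": [0.2, 0.25, 0.26, 1.0, 1.5, 1.6]})
print(keep_time_window(example))
```

Note the asymmetric bounds: 0.25 itself is excluded ("above 0,25") while 1.5 is included ("equal to or below 1,5").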

  • a1_x, a1_y, a1_z, a2_x, a2_y, a2_z
    Change made: for every "experiment_id", we took the mean of each of these columns, then replaced the value in every row with that mean.
    Reason: the model had an easier time fitting and was still able to make accurate predictions with these changes.
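Replacing every value with its per-experiment mean maps directly onto a pandas group-wise `transform`. This minimal sketch assumes one DataFrame holding an `experiment_id` column and the six axis columns named above. The function name and the toy values are hypothetical.

```python
import pandas as pd

# The six vibration columns listed in the table above.
AXIS_COLUMNS = ["a1_x", "a1_y", "a1_z", "a2_x", "a2_y", "a2_z"]


def replace_with_experiment_mean(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `df` where every axis value is replaced by the
    mean of that column over all rows sharing the same experiment_id."""
    out = df.copy()
    # transform("mean") returns one value per original row, aligned on the index,
    # so each row receives the mean of its own experiment.
    out[AXIS_COLUMNS] = df.groupby("experiment_id")[AXIS_COLUMNS].transform("mean")
    return out


readings = pd.DataFrame({
    "experiment_id": [1, 1, 2, 2],
    **{col: [1.0, 3.0, 10.0, 20.0] for col in AXIS_COLUMNS},  # toy values
})
print(replace_with_experiment_mean(readings))
```

Using `transform` rather than `agg` keeps the original number of rows. This matches the description of changing the value in every row rather than collapsing each experiment into a single row.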

Visuals

Machine used to gather the data on bearings

Plot showing the difference between the maximum and minimum value of every axis, for every bearing.
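The max-minus-min values behind this plot can be computed with a grouped aggregation. This is a minimal pandas sketch. It assumes each `experiment_id` is one test run and that the a1_* and a2_* columns hold the readings of that run's two bearings; both are assumptions about the dataset layout. The function name and toy data are hypothetical.

```python
import pandas as pd

AXIS_COLUMNS = ["a1_x", "a1_y", "a1_z", "a2_x", "a2_y", "a2_z"]


def min_max_difference(df: pd.DataFrame) -> pd.DataFrame:
    """Return max - min of every axis column, one row per experiment_id."""
    grouped = df.groupby("experiment_id")[AXIS_COLUMNS]
    # Both results are indexed by experiment_id, so the subtraction aligns row by row.
    return grouped.max() - grouped.min()


readings = pd.DataFrame({
    "experiment_id": [1, 1, 2, 2],
    **{col: [1.0, 4.0, 2.0, 2.0] for col in AXIS_COLUMNS},  # toy values
})
print(min_max_difference(readings))
```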

Plot that gave us the idea to focus on the first seconds of each test.

Plot that showed possible clusters.

Ready for future exploration

Contributors

  • Patrick Brunswyck: https://github.com/brunswyck
  • Jose Roldan: https://github.com/Roldan87
  • Matthew Samyn: https://github.com/matthew-samyn
  • Maarten Van den Bulcke: https://github.com/MaartenVdBulcke

Timeline

29/07/2021 - 03/08/2021
