thealexsamexe / iris-dataset-classification

This project classifies the Iris dataset using the Support Vector Machine (SVM) algorithm.

iris-dataset-classification's Introduction

Install

This project requires at least Python 3.1 and the following Python libraries installed:

Data

The dataset used in this project is included as iris.csv. It is freely available from the UCI Machine Learning Repository and has the following attributes:

Features

Features: sepal-length, sepal-width, petal-length, petal-width

Target Variable

Target: class
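
As a minimal sketch of this layout, the snippet below builds a table with the four feature columns and the class target. It assumes scikit-learn's bundled copy of the Iris data as a stand-in for the project's iris.csv, whose exact header layout is not shown here; the variable names are illustrative.

```python
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: scikit-learn's bundled Iris data stands in for iris.csv.
# With the project's file, something like pd.read_csv("iris.csv") would be
# used instead, with arguments depending on how the file is laid out.
iris = load_iris()
columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"]

df = pd.DataFrame(iris.data, columns=columns)
df["class"] = iris.target_names[iris.target]  # map 0/1/2 to species names

features = df[columns]   # the four numeric features
target = df["class"]     # the target variable

print(df.shape)
print(df.groupby("class").size())
```

Printing the per-class counts is a quick check that the three classes are balanced before any plotting or modelling.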

Data Visualization

For univariate plots, a box-and-whisker plot and a histogram were plotted.

Preliminary results were obtained by plotting the dataset as a box-and-whisker plot.

To show how frequently each range of values occurs for the numerical features, a histogram was plotted.
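
The two univariate plots could be produced roughly as follows. This is a sketch that assumes pandas' built-in plotting and scikit-learn's bundled Iris data; the notebook may use different calls or styling.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: bundled Iris data stands in for the project's iris.csv.
columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
df = pd.DataFrame(load_iris().data, columns=columns)

# Box-and-whisker plot: one panel per feature, each with its own axis scale.
df.plot(kind="box", subplots=True, layout=(2, 2),
        sharex=False, sharey=False, figsize=(8, 6))

# Histogram: frequency of each value range, one panel per feature.
hist_axes = df.hist(layout=(2, 2), figsize=(8, 6))

# In an interactive session, plt.show() would display both figures.
```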

Furthermore, to better relate the histograms to the data, a scatter matrix was plotted.
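
A scatter matrix of this kind can be sketched with pandas' scatter_matrix helper. The data source (scikit-learn's bundled Iris data) and the figure size are assumptions made for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import pandas as pd
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

# Assumption: bundled Iris data stands in for the project's iris.csv.
columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
df = pd.DataFrame(load_iris().data, columns=columns)

# Grid of pairwise scatter plots; the diagonal shows each feature's histogram.
axes = scatter_matrix(df, figsize=(8, 8), diagonal="hist")
```

Because the diagonal repeats the histograms, each off-diagonal panel can be read against the distributions of the two features it pairs.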

Violin plots were used to compare the distribution of the variables across features.
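
One way to draw such violin plots is sketched below using plain matplotlib, with one panel per feature and one violin per class. The grouping by class and the choice of matplotlib are assumptions; the notebook may use a different library or grouping.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Assumption: bundled Iris data stands in for the project's iris.csv.
iris = load_iris()
columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"]

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
violins = {}
for ax, col_idx, name in zip(axes.ravel(), range(4), columns):
    # One array of values per class for the current feature.
    groups = [iris.data[iris.target == k, col_idx] for k in range(3)]
    violins[name] = ax.violinplot(groups, showmedians=True)
    ax.set_xticks([1, 2, 3])
    ax.set_xticklabels(iris.target_names)
    ax.set_title(name)
fig.tight_layout()
```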

For multivariate plots, a pair plot was used to identify which sets of features best explain the relationships between two or more features, which makes choosing a Machine Learning algorithm easier and the data analysis more thorough.

However, the pair plot does not show the numerical correlation values; this was solved by plotting a heatmap.
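
A correlation heatmap with the values written into each cell could look like the sketch below. It assumes Pearson correlation via DataFrame.corr() and plain matplotlib; the colour map and layout are illustrative choices, not necessarily those of the notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.datasets import load_iris

# Assumption: bundled Iris data stands in for the project's iris.csv.
columns = ["sepal-length", "sepal-width", "petal-length", "petal-width"]
df = pd.DataFrame(load_iris().data, columns=columns)

corr = df.corr()  # pairwise Pearson correlation between the features

fig, ax = plt.subplots(figsize=(6, 5))
im = ax.imshow(corr.values, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_xticks(range(len(columns)))
ax.set_xticklabels(columns, rotation=45, ha="right")
ax.set_yticks(range(len(columns)))
ax.set_yticklabels(columns)

# Write each correlation value into its cell.
for i in range(len(columns)):
    for j in range(len(columns)):
        ax.text(j, i, f"{corr.values[i, j]:.2f}", ha="center", va="center")

fig.colorbar(im, ax=ax)
fig.tight_layout()
```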

Algorithmic Evaluation

Before this step, train_test_split() was applied to split the data into 70% for training and 30% for testing, following the common 70-30 rule of thumb.
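
A 70-30 split with train_test_split() can be sketched as follows. The random_state and stratify arguments are assumptions added for reproducibility and balanced classes; the text does not say which options the notebook passes.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Assumption: bundled Iris data stands in for the project's iris.csv.
X, y = load_iris(return_X_y=True)

# test_size=0.30 keeps 70% of the rows for training and 30% for testing.
# random_state and stratify are illustrative choices, not taken from the project.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1, stratify=y
)

print(X_train.shape, X_test.shape)
```

With 150 rows, this leaves 105 samples for training and 45 for testing.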

This step tested six Machine Learning algorithms: Logistic Regression (LR), Linear Discriminant Analysis (LDA), K-Nearest Neighbors (KNN), Classification and Regression Trees (CART), Gaussian Naive Bayes (NB) and Support Vector Machines (SVM).

The evaluation results obtained for the testing data by the six Machine Learning algorithms were as follows:

  • LR: 0.934545 (0.071789)
  • LDA: 0.971818 (0.043112)
  • KNN: 0.952727 (0.062430)
  • CART: 0.953636 (0.046435)
  • NB: 0.935455 (0.058698)
  • SVM: 0.980909 (0.038236)
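
A comparison of the six algorithms could be run along the lines of the sketch below. The text does not state how the scores were computed, so the 10-fold stratified cross-validation on the training split, the hyperparameters and the random seeds are all assumptions; the printed numbers will not necessarily match those reported above.

```python
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score, train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Assumption: bundled Iris data stands in for the project's iris.csv.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, random_state=1, stratify=y
)

# The six candidate algorithms; hyperparameters are illustrative defaults.
models = [
    ("LR", LogisticRegression(max_iter=1000)),
    ("LDA", LinearDiscriminantAnalysis()),
    ("KNN", KNeighborsClassifier()),
    ("CART", DecisionTreeClassifier(random_state=1)),
    ("NB", GaussianNB()),
    ("SVM", SVC(gamma="auto")),
]

# Assumed protocol: 10-fold stratified cross-validation on the training data.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
results = {}
for name, model in models:
    scores = cross_val_score(model, X_train, y_train, cv=cv, scoring="accuracy")
    results[name] = scores
    print(f"{name}: {scores.mean():.6f} ({scores.std():.6f})")
```

Keeping the per-fold scores in results makes it straightforward to pass them to a box-and-whisker plot for a side-by-side comparison.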

To make the comparison easier to understand, a box-and-whisker plot was constructed to give the big picture of the performance of the six algorithms.

Evaluation of Predictions

On the test set, the model achieved an accuracy score of 97.7%, or 98% when rounded. The classification report likewise gave macro- and micro-averaged precision, recall and f1-score of 98%.
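
The evaluation step can be sketched as fitting an SVM on the training split and scoring it on the held-out 30%. The SVC settings and random seed are assumptions, so the exact figures may differ from those quoted above.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Assumption: bundled Iris data stands in for the project's iris.csv.
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.30, random_state=1, stratify=iris.target
)

# Illustrative SVM settings; the notebook's hyperparameters are not stated.
model = SVC(gamma="auto")
model.fit(X_train, y_train)
predictions = model.predict(X_test)

accuracy = accuracy_score(y_test, predictions)
report = classification_report(y_test, predictions, target_names=iris.target_names)

print(f"Accuracy: {accuracy:.3f}")
print(report)
```

The report lists precision, recall and f1-score per class, followed by the averaged rows.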
