mschlei-48 / titanic-data-exploration-preparation-and-modelling-

Titanic Dataset Exploration, Visualization and Modelling with Logistic Regression.

Jupyter Notebook 100.00%
feature feature-selection featureimportances logistic-regression machine-learning visualization

Titanic-Data-Exploration-Preparation-and-Modelling-

Project Overview

  • The well-known Titanic dataset was used for exploration, preparation and modelling with a Logistic Regression model.

  • Four feature selection techniques were used to choose the best features to include in the model, namely:

    1. RFE (Recursive Feature Elimination)
    2. Decision Trees Feature Selection
    3. Correlation Analysis
    4. Coefficient Feature Importance (using Logistic Regression)
  • The model obtained an accuracy of 100% using the pre-processing required by Logistic Regression, which included:

    1. Removing outliers.
    2. Removing multicollinearity - The model assumes that the feature variables are not correlated with each other. Highly correlated features should be removed.
    3. Asserting linear assumption - Feature variables need to have a linear relationship with the log-odds of the target variable. A log transformation is used to establish that relationship if it is not present.
    4. Asserting normal distribution - Feature variables need to have a normal distribution. If they are not normally distributed, a log transform or Box-Cox transform is used to bring them closer to one.
    5. Feature scaling - The features must be scaled because they may not share the same range of values, so features with large values can dominate the model and appear more important than other variables. Scaling brings them to the same range and gives each feature a chance to contribute equally to the model.
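
The pre-processing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the data is synthetic, the column names (`fare`, `age`, `fare_dup`), the IQR rule, the 0.9 correlation threshold and the helper functions are all assumptions, and `np.log1p` stands in for the log or Box-Cox transform.

```python
import numpy as np

def remove_outliers_iqr(X, k=1.5):
    """Keep rows where every feature lies within k * IQR of its quartiles."""
    q1, q3 = np.percentile(X, [25, 75], axis=0)
    iqr = q3 - q1
    mask = np.all((X >= q1 - k * iqr) & (X <= q3 + k * iqr), axis=1)
    return X[mask]

def drop_correlated(X, threshold=0.9):
    """Return column indices to keep, skipping any column whose absolute
    correlation with an already kept column reaches the threshold."""
    corr = np.abs(np.corrcoef(X, rowvar=False))
    keep = []
    for j in range(X.shape[1]):
        if all(corr[j, i] < threshold for i in keep):
            keep.append(j)
    return keep

def standardize(X):
    """Scale each column to zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

# Synthetic stand-in data (hypothetical columns, not the Titanic file).
rng = np.random.default_rng(0)
n = 200
fare = rng.lognormal(mean=3.0, sigma=1.0, size=n)    # right-skewed
age = rng.uniform(1.0, 80.0, size=n)
fare_dup = 2.0 * fare + rng.normal(0.0, 0.01, n)     # near-duplicate of fare
X = np.column_stack([fare, age, fare_dup])

X_clean = remove_outliers_iqr(X)          # step 1: outliers
keep = drop_correlated(X_clean)           # step 2: multicollinearity
X_sel = X_clean[:, keep].copy()
X_sel[:, 0] = np.log1p(X_sel[:, 0])       # steps 3-4: log transform of the skewed column
Z = standardize(X_sel)                    # step 5: feature scaling
```

Here `keep` lists the columns that survive the correlation filter and `Z` is the scaled matrix that would be passed to the model. The order of the steps mirrors the list above; other orderings, such as transforming before removing outliers, are equally plausible.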

Sample Figures from Exploratory Data Analysis and Data Preparation

[Figures: SampleImage, Figure6, Figure2]

Model Classification Report

[Figure: ClassificationReport]
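
A report of this kind could be produced along the following lines with scikit-learn. This is a sketch under stated assumptions rather than the notebook's code: the data comes from `make_classification` instead of the Titanic file, and the pipeline layout, the choice of four selected features and the 25% test split are illustrative only. It combines scaling, RFE (one of the four selection techniques named in the overview) and a Logistic Regression classifier.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data standing in for the prepared features.
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, n_redundant=2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Scale, select features with RFE, then fit the final classifier.
model = Pipeline([
    ("scale", StandardScaler()),
    ("select", RFE(LogisticRegression(max_iter=1000), n_features_to_select=4)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(X_train, y_train)

y_pred = model.predict(X_test)
report = classification_report(y_test, y_pred)
selected = model.named_steps["select"].support_  # boolean mask of kept features
print(report)
```

Fitting the scaler and the selector inside the pipeline means they only ever see the training split, so the scores in the report reflect held-out data.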
