Giter Site home page Giter Site logo

decision-tree-project-tutorial's Introduction

Decision Tree Project Tutorial

  • The goal of this project is to classify patients having diabetes or no, based on their diagnostics.
  • Make some basic exploratory analysis and prepare the data for modeling.
  • Use decision trees and hypertune your model. This folder has a solution guide in case you get stuck.
  • Don't forget that you are creating software, so build pipelines in the app.py

๐ŸŒฑ How to start this project

You will not be forking this time, please take some time to read this instructions:

  1. Create a new repository based on machine learning project by clicking here.
  2. Open the recently created repostiroy on Gitpod by using the Gitpod button extension.
  3. Once Gitpod VSCode has finished opening you start your project following the Instructions below.

๐Ÿš› How to deliver this project

Once you are finished creating your decision tree model, make sure to commit your changes, push to your repository and go to 4Geeks.com to upload the repository link.

๐Ÿ“ Instructions

Predicting Diabetes:

This dataset is originally from the National Institute of Diabetes and Digestive and Kidney Diseases. The objective is to predict based on diagnostic measurements whether a patient has diabetes.

Data Dictionary:

  • Pregnancies: Number of times pregnant

  • Glucose: Plasma glucose concentration a 2 hours in an oral glucose tolerance test

  • BloodPressure: Diastolic blood pressure (mm Hg)

  • SkinThickness: Triceps skin fold thickness (mm)

  • Insulin: 2-Hour serum insulin (mu U/ml)

  • BMI: Body mass index (weight in kg/(height in m)^2)

  • DiabetesPedigreeFunction: Diabetes pedigree function

  • Age: Age (years)

  • Outcome: Class variable (0 or 1), Class Distribution: (class value 1 is interpreted as "tested positive for diabetes").

Source:

(a) Original owners: National Institute of Diabetes and Digestive and Kidney Diseases (b) Donor of database: Vincent Sigillito ([email protected]) Research Center, RMI Group Leader Applied Physics Laboratory The Johns Hopkins University

Step 1:

Go to the following online dataset (https://raw.githubusercontent.com/4GeeksAcademy/decision-tree-project-tutorial/main/diabetes.csv) and download the data.

Save it in your project's 'data/raw' folder. Time to work on it!

Step 2:

Use the explore.ipynb notebook to find patterns and valuable information that will help on your cleaning process.

Don't forget to write your observations.

Use the app.py to create your cleaning pipeline. Save your clean data in the 'data/processed' folder.

Step 3:

Now that you have a better knowledge of the data, in your exploratory notebook create a first decision tree model with your clean data.

Step 4:

Change your decision tree to use 'entropy' as a criterion.

Step 5:

Hypertune your model using GridSearch to find the best hyperparameters.

Train your model with the optimal hyperparameters.

Again use the app.py to create your final machine learning model.

Save your final model in the 'models' folder.

In your README file write a brief summary of your cleaning and modeling process.

Solution guide: https://github.com/4GeeksAcademy/decision-tree-project-tutorial/blob/main/solution_guide.ipynb

decision-tree-project-tutorial's People

Contributors

alesanchezr avatar danielaaz04 avatar lorenagubaira avatar tommygonzaleza avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.