
This project forked from aliciaaperez/death_analytics_using_cdc_data_with_machine_learning


Death Analytics using CDC Data with Machine Learning


Leading Causes of Death in the US


Goal

To analyze the distribution of deaths in the United States and investigate the trends at a micro level for each state.

Heroku App

The deployed app can be viewed at https://death-machine.herokuapp.com/

Background

According to data from the National Center for Health Statistics (NCHS), which is part of the Centers for Disease Control and Prevention (CDC), one of the leading causes of death in the United States is heart disease. The CDC provides data for each recorded death that occurred on US territory, including the cause of death, the age-adjusted death rate, the location, etc. For this project, we selected, among other datasets, the age-adjusted death rates for the 10 leading causes of death in the United States. The objective of this project is to analyze the distribution of causes of death across the states and point out any trend that might exist among a certain demographic. In addition, we want to predict the cause of death of an individual by using machine learning to create a model capable of understanding our data and rendering the information needed. The analysis also covers, at a micro level, which variables significantly affect life expectancy in the United States. Verified reporting of this data starts as early as 1999.

The leading causes of death account for a large portion of mortality in the United States each year. This project aims to provide a visual representation of what the leading causes of death are for Americans, which states have the highest number of deaths, and what the causes of those deaths are. Additionally, the analysis looks at factors such as age and population size for each state. The main purpose of the Heroku app is to provide an informative and straightforward representation of data on the leading causes of death, not only to educate but also to get people interested.

Technologies Used

  • Pandas
  • Flask
  • SQLAlchemy
  • Postgres
  • HTML
  • CSS
  • Bootstrap
  • Heroku
  • Google Colab

By informing the average American about the leading causes of death in the United States, the hope is that more people will realize that there remains a high demand for research and support for preventative measures. For those who want more statistics and general information on American health, these are useful resources:

Data Sources:

The primary source of data for this objective comes from the NCHS. This dataset contains information for the top 10 leading causes of death, and was used to identify heart disease as the leading cause.

ETL Process

  • PySpark was used to clean the data and create data frames
  • The data frames were loaded into Postgres, where a Death Database was created

Data Routes

  • Flask was used to create a connection to the Postgres database
  • Routes were used to query the database and return the results as a dictionary
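
A route of this kind can be sketched as follows. This is a minimal illustration, not the project's code: it uses the standard library's `sqlite3` in place of Postgres so that it runs without a server, and the table name `deaths`, its columns, and the sample rows are hypothetical placeholders. In a Flask app, a helper like `query_as_dicts` would be called inside a route and its result passed to `jsonify`.

```python
import sqlite3


def query_as_dicts(conn, sql, params=()):
    """Run a query and return each row as a {column: value} dictionary."""
    cursor = conn.execute(sql, params)
    columns = [col[0] for col in cursor.description]
    return [dict(zip(columns, row)) for row in cursor.fetchall()]


# In-memory stand-in for the Postgres database (assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE deaths (state TEXT, cause TEXT, year INTEGER, rate REAL)")

# Placeholder rows, not real CDC figures.
conn.executemany(
    "INSERT INTO deaths VALUES (?, ?, ?, ?)",
    [
        ("State A", "Heart Disease", 1999, 1.0),
        ("State A", "Cancer", 1999, 2.0),
        ("State B", "Heart Disease", 1999, 3.0),
    ],
)

# Parameterized query: the cause is passed separately, never formatted into the SQL.
heart_rows = query_as_dicts(
    conn,
    "SELECT state, year, rate FROM deaths WHERE cause = ? ORDER BY state",
    ("Heart Disease",),
)
```

Returning plain dictionaries keeps the route independent of the database driver, which makes it easier to swap the backing store later.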

Website Design

  • Deployment: Heroku was used to host the app
  • Bootstrap was used to create a theme for the page

Analysis

Tableau

Ultimately, the objective is to identify the leading cause of death in the United States since 1999, and then identify the states with the highest number of deaths recorded. We used Tableau to analyze the data and plot different visualizations. The analysis is broken down into the sections below.

Age Adjusted Death Rate

Northern states like Minnesota and Dakota have the lowest age-adjusted death rates. Southern states (Louisiana, Kentucky) have had the highest heart disease death rates over the years, and Wyoming has one of the highest suicide rates in the nation.
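
For readers unfamiliar with the term, an age-adjusted death rate is commonly computed by direct standardization: each age group's death rate is weighted by that group's share of a standard population. The sketch below illustrates the arithmetic only. The three age groups, counts, and weights are made-up placeholders, not CDC figures, and the function name is hypothetical.

```python
def age_adjusted_rate(deaths, population, standard_weights, per=100_000):
    """Direct standardization: weighted sum of age-specific death rates.

    deaths[i] / population[i] is the crude rate for age group i, and
    standard_weights[i] is that group's share of the standard population.
    """
    if not (len(deaths) == len(population) == len(standard_weights)):
        raise ValueError("all inputs must cover the same age groups")
    total_weight = sum(standard_weights)
    rate = 0.0
    for d, p, w in zip(deaths, population, standard_weights):
        rate += (w / total_weight) * (d / p)
    return rate * per


# Placeholder inputs for three age groups (young, middle, old).
deaths = [10, 50, 200]
population = [100_000, 100_000, 50_000]
standard_weights = [0.5, 0.3, 0.2]

rate = age_adjusted_rate(deaths, population, standard_weights)
```

Because every state is weighted by the same standard population, the resulting rates can be compared across states with different age structures.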

Leading causes of Death

The leading cause of death in the United States is heart disease, followed closely by cancer and then by unintentional injuries.

  • Heart Disease
  • Cancer
  • Unintentional Injuries
  • Chronic Lower Respiratory Disease (CLRD)
  • Stroke
  • Alzheimer’s Disease
  • Diabetes
  • Influenza & Pneumonia
  • Kidney Disease
  • Suicide

Top 10 leading causes of Death

Heart disease has remained the leading cause of death in the United States throughout the years. The other causes change position slightly from year to year, but in general these are the top 10, in rank order: Heart Disease, Cancer, Unintentional Injuries, Chronic Lower Respiratory Disease (CLRD), Stroke, Alzheimer’s Disease, Diabetes, Influenza & Pneumonia, Kidney Disease, Suicide.

Machine Learning

Heart Disease Prediction

The goals are to predict whether a patient should be diagnosed with heart disease, to examine trends and correlations within our data, and to determine which features are most important to a heart disease diagnosis. We used a machine learning algorithm, which learns and improves from the data it is trained on.

Cause of Death Prediction

The machine learning model for the cause of death used 2015 cause-of-death data from the CDC. The data initially had over two million rows covering more than 3,000 causes of death. Some causes of death had only one entry and would not be very helpful. As a starting point, only the causes with over 10,000 entries were used, which left 60 causes of death. Those were further grouped together by kind; for example, lung cancer and breast cancer were originally listed as separate entities. Eventually, the data was whittled down to ten causes of death for this project. More causes could be added later, once the model has been refined enough to handle that much information.

The following ten ICD codes were used, listed from most to least frequent.
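
The filtering and grouping described above can be sketched in a few lines. This is a minimal illustration using the standard library rather than the project's actual pipeline: the function names are hypothetical, the records are toy data with a tiny threshold, and the code-to-group mapping is an assumption made for the example.

```python
from collections import Counter


def frequent_causes(codes, min_count):
    """Return {code: count} for codes that appear more than min_count times."""
    counts = Counter(codes)
    return {code: n for code, n in counts.items() if n > min_count}


def group_counts(counts, groups):
    """Sum the counts of individual codes into broader groups.

    Codes missing from `groups` keep their own code as the group name.
    """
    grouped = Counter()
    for code, n in counts.items():
        grouped[groups.get(code, code)] += n
    return dict(grouped)


# Toy records: one cause code per death record. "RARE" is a made-up
# placeholder for a cause with a single entry.
records = ["C349"] * 5 + ["C509"] * 4 + ["I251"] * 6 + ["RARE"]

# The project used a threshold of 10,000; 3 is used here for the toy data.
kept = frequent_causes(records, min_count=3)

# Hypothetical grouping: two cancer codes merged into one "Cancer" class.
grouped = group_counts(kept, {"C349": "Cancer", "C509": "Cancer"})
```

Dropping rare causes before grouping, as in the text, means a rare code never contributes to a group total; grouping first would be a different design choice.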

  1. I251: Heart Failure, Heart Attack, Heart Disease
  2. C349: Cancer
  3. J449: COPD
  4. F03: Dementia
  5. G309: Alzheimer's Disease
  6. J189: Pneumonia or other Lung Disease
  7. A419: Sepsis
  8. E149: Diabetes
  9. G20: Parkinson's Disease
  10. X44: Accidental Poisoning By and Exposure to Drugs and Other Biological Substances

Keras was used to build the machine learning model, which was then converted into a TensorFlow.js file so that it could be loaded in JavaScript on the website. After the data had been cleaned and prepared for the model, there were 962,411 rows of data: 721,808 were used for training and 240,603 for testing. The model used 7 layers, and its performance improved steadily as layers were added.

However, the model pushed to the project was not the Keras model. Due to issues with deployment to the Flask app, a random forest model was used instead so that the app would at least function.

The accuracy of the random forest model is fairly low, so it should not be treated as a reliable model. With more time, the project will be updated to return to the original Keras model, refine it, and get it working in the app.
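
The reported row counts are consistent with a 75/25 train/test split. That ratio is inferred from the numbers and is not stated by the project, and the rounding rule below (test set rounded up to a whole row) is an assumption made for this sketch.

```python
import math


def split_sizes(n_rows, test_fraction):
    """Return (n_train, n_test), rounding the test set up to a whole row."""
    n_test = math.ceil(n_rows * test_fraction)
    return n_rows - n_test, n_test


# 962,411 cleaned rows with an assumed 25% held out for testing.
n_train, n_test = split_sizes(962_411, 0.25)
```

With these inputs the two sizes add back up to the full dataset, so no row is dropped or counted twice.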

The prediction model requires education, gender, age, marital status, race, and whether the person is Hispanic as inputs. The original data also included other medical conditions of the person who died. A future goal is to break down that information and add it to the model to make it more precise.
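
One way such inputs could be turned into a numeric feature vector is sketched below. This is an illustration under stated assumptions, not the project's encoding: the category vocabularies, the `Person` class, and the feature order are all hypothetical, and the real CDC coding of these fields differs.

```python
from dataclasses import dataclass

# Hypothetical category vocabularies, for illustration only.
CATEGORIES = {
    "education": ["high_school", "college", "graduate"],
    "gender": ["F", "M"],
    "marital_status": ["single", "married", "divorced", "widowed"],
    "race": ["group_a", "group_b", "group_c"],
}


@dataclass
class Person:
    education: str
    gender: str
    age: int
    marital_status: str
    race: str
    hispanic: bool


def one_hot(value, vocabulary):
    """Return a list with 1.0 at the position of `value` and 0.0 elsewhere."""
    if value not in vocabulary:
        raise ValueError(f"unknown category: {value!r}")
    return [1.0 if value == item else 0.0 for item in vocabulary]


def encode(person):
    """Encode a Person as [age, hispanic flag, one-hot categorical fields...]."""
    features = [float(person.age), 1.0 if person.hispanic else 0.0]
    for field, vocabulary in CATEGORIES.items():
        features += one_hot(getattr(person, field), vocabulary)
    return features


vector = encode(Person("college", "F", 70, "widowed", "group_b", True))
```

Rejecting unknown categories up front keeps a mistyped form value from silently becoming an all-zero block in the model input.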

Contributors

aliciaaperez, claude-hanfou, kcknguyen
