- Jacob Darmofal
- Vincent Elequin
- Tamica Grant
- Isidore Lozano
- Using CDC survey data from 2000 to 2020 covering 400,000 adults.
- We intend to collect and analyze the data and build machine learning models to understand which characteristics (gender, age, race/ethnicity, etc.) predict heart disease risk.
- Using Pandas and PostgreSQL to prepare the data and Matplotlib to visualize mortality predictions.
- Determine whether there are high-risk populations geographically.
- Which health factors are associated with a higher chance of heart disease?
- Do smokers have a higher risk of having heart disease?
- Which age category is most at risk?
- Which feature(s) in our dataset impact the possibility of heart disease?
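The smoking question can be approached with a simple prevalence comparison before any model is trained. The sketch below assumes each record is a dict with `smoking` and `heart_disease` fields holding the strings `"Yes"`/`"No"`; these field names and the rows are hypothetical mock data, not values from the CDC or Kaggle files.

```python
# Minimal sketch: relative risk of heart disease for smokers vs non-smokers.
# Assumptions (hypothetical): records are dicts with "smoking" and
# "heart_disease" fields holding "Yes"/"No"; the rows below are mock data.

def prevalence(rows, smoking_value):
    """Share of rows in one smoking group that report heart disease."""
    group = [r for r in rows if r["smoking"] == smoking_value]
    if not group:
        raise ValueError(f"no rows with smoking={smoking_value!r}")
    cases = sum(1 for r in group if r["heart_disease"] == "Yes")
    return cases / len(group)

def relative_risk(rows):
    """Prevalence among smokers divided by prevalence among non-smokers.
    Raises ZeroDivisionError if no non-smoker reports heart disease."""
    return prevalence(rows, "Yes") / prevalence(rows, "No")

# Mock data: 1 of 4 smokers and 1 of 8 non-smokers report heart disease.
mock_rows = (
    [{"smoking": "Yes", "heart_disease": "Yes"}] * 1
    + [{"smoking": "Yes", "heart_disease": "No"}] * 3
    + [{"smoking": "No", "heart_disease": "Yes"}] * 1
    + [{"smoking": "No", "heart_disease": "No"}] * 7
)

print(f"relative risk: {relative_risk(mock_rows):.2f}")  # prints "relative risk: 2.00"
```

A ratio above 1 only indicates higher prevalence among smokers in the sample; it is an association, not a causal estimate, and it does not adjust for confounders such as age.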
- heart_disease
- bmi
- smoking
- alcohol consumption
- history of stroke
- physical health
- mental health
- gender
- age
- race
- diabetic
- physical activity
- general health
- hours of sleep
- asthma
- kidney disease
- skin cancer
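Most of the features listed above are Yes/No answers or categories, so they need a numeric encoding before a model can use them. The sketch below shows one common approach (binary flags plus one-hot encoding); the field names, category levels, and the rule that any answer other than `"Yes"` maps to 0 are hypothetical assumptions, not the dataset's actual schema.

```python
# Minimal sketch: turn one survey record into a numeric feature vector.
# Assumptions (hypothetical, not the dataset's real schema):
#   - Yes/No answers are the strings "Yes" and "No"; anything else maps to 0
#   - "gender" and "race" are categorical and get one-hot encoded
#   - "bmi" and "sleep_hours" are already numeric

NUMERIC = ["bmi", "sleep_hours"]
BINARY = ["smoking", "alcohol_consumption", "stroke", "diabetic", "asthma"]
CATEGORIES = {
    "gender": ["Female", "Male"],
    "race": ["Asian", "Black", "Hispanic", "White", "Other"],
}

def encode(record):
    """Return a flat list of floats in a fixed column order."""
    vector = [float(record[name]) for name in NUMERIC]
    vector += [1.0 if record[name] == "Yes" else 0.0 for name in BINARY]
    for name, levels in CATEGORIES.items():
        # One-hot: a single 1.0 for a known level, all zeros otherwise.
        vector += [1.0 if record[name] == level else 0.0 for level in levels]
    return vector

def column_names():
    """Column labels matching the order produced by encode()."""
    names = NUMERIC + BINARY
    for name, levels in CATEGORIES.items():
        names = names + [f"{name}={level}" for level in levels]
    return names

record = {
    "bmi": 27.5, "sleep_hours": 7, "smoking": "Yes",
    "alcohol_consumption": "No", "stroke": "No", "diabetic": "No",
    "asthma": "Yes", "gender": "Female", "race": "White",
}
print(dict(zip(column_names(), encode(record))))
```

Keeping the column order fixed in one place means every record, and the model trained on them, sees the same layout.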
- Personal Key Indicators of Heart Disease (Kaggle)
- Heart Disease Stroke Prevention Vital Statistics from CDC (https://chronicdata.cdc.gov/Heart-Disease-Stroke-Prevention/National-Vital-Statistics-System-NVSS-National-Car/kztq-p2jf)
- The above databases are made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/
- Microsoft Bing Image Creator for images https://www.bing.com/images/create
- ChatGPT (www.openai.com)
- Neural Networks https://scikit-learn.org/
- Imbalanced Learn https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.-
- SMOTE https://imbalanced-learn.org/stable/references/generated/imblearn.over_sampling.SMOTE.html
- Tableau https://www.tableau.com
- Tableau story published to: https://public.tableau.com/app/profile/tamica.grant/viz/HeartDiseaseMap/DiseasesoftheHeartAnalysis?publish=yes
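Heart disease cases are a minority of survey respondents, which is why SMOTE appears in the sources. With imbalanced-learn, the usual call is `SMOTE().fit_resample(X, y)` from `imblearn.over_sampling`. To show what that step does, the sketch below reimplements the core SMOTE idea (interpolating between a minority sample and one of its nearest minority neighbours) in plain Python; it is a simplified illustration under stated assumptions (binary labels, Euclidean distance, mock points), not the library's implementation.

```python
import math
import random

def k_nearest(point, others, k):
    """Indices of the k points in `others` closest to `point` (Euclidean)."""
    dists = sorted((math.dist(point, other), i) for i, other in enumerate(others))
    return [i for _, i in dists[:k]]

def smote_like(minority, n_new, k=3, seed=0):
    """Create n_new synthetic samples, each on the line segment between a
    random minority sample and one of its k nearest minority neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        others = minority[:i] + minority[i + 1:]  # exclude the sample itself
        neighbour = others[rng.choice(k_nearest(base, others, k))]
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic

def balance(X, y, minority_label=1, k=3, seed=0):
    """Oversample the minority class of a binary problem to match the majority."""
    minority = [x for x, label in zip(X, y) if label == minority_label]
    n_new = (len(y) - len(minority)) - len(minority)
    if n_new <= 0:
        return list(X), list(y)
    new = smote_like(minority, n_new, k=k, seed=seed)
    return list(X) + new, list(y) + [minority_label] * n_new

# Mock data: six "no heart disease" points, three "heart disease" points.
X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2],
     [2.0, 2.0], [2.2, 2.1], [2.1, 2.3]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
X_bal, y_bal = balance(X, y)
print(y_bal.count(0), y_bal.count(1))  # prints "6 6"
```

Oversampling is normally applied only to the training split, so that synthetic points never leak into the data used for evaluation.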
- Research data sources
- Create a front-end interface to a JSON or CSV file to “smarten” the algorithm.
- Perform a deep dive with existing data using machine learning.
- Predict mortality rates from the available data (predicting and diagnosing illnesses).
- Create a visualization that continues to learn where clusters lie based on ML (use Leaflet or Plotly to change the visualization).
- Identify population areas with the highest mortality rates to develop stronger prevention strategies.
- Develop a concept using mock data and simulate how machine learning might be used.
- Create an analysis of existing data to make a prediction, classification, or regression.
- Set up the initial GitHub repository and Google Slides.
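Identifying the areas with the highest mortality rates comes down to aggregating deaths and population per area and ranking the resulting rates. The sketch below assumes hypothetical rows with `area`, `deaths`, and `population` fields; the real CDC NVSS columns may be named and structured differently.

```python
from collections import defaultdict

# Minimal sketch: rank areas by crude mortality rate per 100,000 people.
# Assumptions (hypothetical): rows are dicts with "area", "deaths", and
# "population" fields; the rows below are mock data, not CDC figures.

def mortality_rates(rows, per=100_000):
    """Aggregate deaths and population by area; return deaths per `per` people."""
    deaths = defaultdict(int)
    population = defaultdict(int)
    for row in rows:
        deaths[row["area"]] += row["deaths"]
        population[row["area"]] += row["population"]
    return {
        area: deaths[area] * per / population[area]
        for area in deaths
        if population[area] > 0  # skip areas with no population on record
    }

def highest_risk(rows, top_n=3):
    """(area, rate) pairs sorted by rate, highest first."""
    rates = mortality_rates(rows)
    return sorted(rates.items(), key=lambda item: item[1], reverse=True)[:top_n]

mock_rows = [
    {"area": "A", "deaths": 120, "population": 50_000},
    {"area": "A", "deaths": 30, "population": 25_000},
    {"area": "B", "deaths": 90, "population": 100_000},
    {"area": "C", "deaths": 200, "population": 80_000},
]
print(highest_risk(mock_rows))  # prints [('C', 250.0), ('A', 200.0), ('B', 90.0)]
```

These are crude rates; when areas differ in age structure, comparisons usually rely on age-adjusted rates instead.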