dse511-project3-team / dse511-project-3-code-repo
Git repository to manage the machine learning project 3 for DSE511.
Upload the project proposal to the docs folder.
Create a new folder called "EDA" under the docs folder to hold all the images generated for our EDA.
We need to update the raw data generation script to directly download data from a web URL. This will help streamline the process.
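A minimal sketch of what a direct download step could look like, using only the standard library. The function name `download_raw_data`, the `.part` temporary suffix, and the skip-if-present behavior are assumptions for illustration, not the project's actual script.

```python
import shutil
import urllib.request
from pathlib import Path


def download_raw_data(url: str, dest: Path, overwrite: bool = False) -> Path:
    """Download `url` to `dest`; skip the download if `dest` already exists."""
    dest = Path(dest)
    if dest.exists() and not overwrite:
        return dest
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file first so an interrupted download
    # never leaves a truncated file at the final path.
    tmp = dest.with_name(dest.name + ".part")
    with urllib.request.urlopen(url) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)
    return dest
```

Calling it twice with the same destination only hits the network once, which keeps repeated pipeline runs cheap.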
Consolidate the preprocessing into one .py file for easier use.
Tune AdaBoost to find the best hyperparameters on the validation dataset.
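One way this tuning could be set up, sketched with scikit-learn on synthetic data. The parameter grid, the macro F1 metric, and the helper name `tune_on_validation` are assumptions chosen for illustration; the synthetic data only stands in for the accident dataset.

```python
from itertools import product

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the real data (assumption: a multi-class target).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical grid; the real search space is up to the team.
param_grid = {"n_estimators": [10, 25], "learning_rate": [0.1, 1.0]}


def tune_on_validation(param_grid, X_train, y_train, X_val, y_val):
    """Fit one model per grid point on the training split and keep the
    configuration with the best macro F1 on the validation split."""
    best_params, best_score = None, -1.0
    keys = list(param_grid)
    for values in product(*(param_grid[k] for k in keys)):
        params = dict(zip(keys, values))
        model = AdaBoostClassifier(random_state=0, **params)
        model.fit(X_train, y_train)
        score = f1_score(y_val, model.predict(X_val), average="macro")
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```

Scoring on a held-out validation split, rather than on the test set, keeps the test set untouched for the final evaluation.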
Create a new branch of project 3 for the team member 'Eonyeon Jo'.
Find the factors that carry the greatest significance.
Evaluate AdaBoost on the test dataset.
Tune XGBoost to find the best hyperparameters on the validation dataset.
Conduct analysis of variance by city to determine a trend in accident rate by city.
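For reference, a one-way ANOVA F statistic can be computed by hand as below. This is a standard-library sketch on made-up numbers; the group labels and values are purely illustrative. In practice `scipy.stats.f_oneway` computes the same statistic and also returns a p-value.

```python
from statistics import fmean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a mapping of
    group name -> list of observations."""
    all_values = [x for values in groups.values() for x in values]
    grand_mean = fmean(all_values)
    k, n = len(groups), len(all_values)
    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(v) * (fmean(v) - grand_mean) ** 2 for v in groups.values()
    )
    # Variation of the observations around their own group mean.
    ss_within = sum((x - fmean(v)) ** 2 for v in groups.values() for x in v)
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up values for three hypothetical cities.
toy = {"city_a": [1, 2, 3], "city_b": [2, 3, 4], "city_c": [5, 6, 7]}
f_stat, df_b, df_w = one_way_anova_f(toy)
```

A large F relative to the F distribution with `(df_between, df_within)` degrees of freedom suggests that at least one city's mean differs from the others.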
Use KNN and OLS to impute missing weather variables.
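A rough sketch of both imputation styles on a toy matrix, assuming scikit-learn and NumPy. The three columns (temperature, humidity, pressure) and the helper `ols_impute` are hypothetical; the issue does not say which weather variables are involved.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.linear_model import LinearRegression

# Toy weather matrix; columns are temperature, humidity, pressure (hypothetical).
X = np.array([
    [70.0, 40.0, 29.9],
    [72.0, 42.0, 29.8],
    [np.nan, 41.0, 29.9],
    [50.0, 80.0, 30.2],
    [52.0, np.nan, 30.1],
])

# KNN: fill each gap with the mean of that column over the k nearest rows,
# where distance is measured on the coordinates both rows have observed.
X_knn = KNNImputer(n_neighbors=2).fit_transform(X)


def ols_impute(X, target_col):
    """Fill NaNs in one column with OLS predictions from the other columns.
    Rows whose predictors are themselves missing are left untouched."""
    X = X.copy()
    predictors = [c for c in range(X.shape[1]) if c != target_col]
    missing = np.isnan(X[:, target_col])
    usable = ~np.isnan(X[:, predictors]).any(axis=1)
    train, fill = ~missing & usable, missing & usable
    if fill.any():
        model = LinearRegression().fit(X[train][:, predictors], X[train, target_col])
        X[fill, target_col] = model.predict(X[fill][:, predictors])
    return X


X_ols = ols_impute(X, target_col=0)
```

KNN handles several incomplete columns in one pass, while the OLS variant fills one column at a time and needs complete predictor values for the rows it fills.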
Russ is on this!
I will have the final result either by the end of today or tomorrow morning, and we can go over the changes I made and how to work with the compressed version. Our goal is to trim it down a lot, from 1.3 million observations to around 75 thousand. We will only use data from certain US cities.
Infrastructure analysis will be conducted in this issue to preprocess the following variables:
'Traffic_Signal', 'Crossing', 'Station', 'Amenity', 'Bump', 'Give_Way', 'Junction', 'No_Exit', 'Railway', 'Roundabout', 'Stop', 'Traffic_Calming', 'Turning_Loop'.
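As an illustration, these flags could be encoded as 0/1 integers with pandas, as sketched below. The helper name `encode_infrastructure`, the choice to treat missing values as 0, and the decision to drop constant columns are all assumptions, not decisions recorded in the issue.

```python
import pandas as pd

INFRA_COLS = [
    "Traffic_Signal", "Crossing", "Station", "Amenity", "Bump", "Give_Way",
    "Junction", "No_Exit", "Railway", "Roundabout", "Stop",
    "Traffic_Calming", "Turning_Loop",
]


def encode_infrastructure(df):
    """Encode the infrastructure flags as 0/1 and drop flags that never vary.
    Returns the encoded frame and the list of dropped column names."""
    out = df.copy()
    for col in INFRA_COLS:
        # Accept real booleans as well as "True"/"False" strings;
        # anything else (including missing values) becomes 0 (assumption).
        out[col] = out[col].astype(str).str.lower().eq("true").astype("int8")
    # A flag with a single value carries no information for a model.
    constant = [c for c in INFRA_COLS if out[c].nunique() <= 1]
    return out.drop(columns=constant), constant
```

Returning the dropped column names makes it easy to log which flags were removed and to apply the same selection to the validation and test splits.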
Create a distribution of accident severity by city (to accompany ANOVA).
Create correlation matrices to check for multicollinearity.
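A small sketch of how such a check might look with pandas: compute the Pearson correlation matrix of the numeric columns and list the pairs above a cutoff. The 0.8 threshold and the helper name `high_correlation_pairs` are assumptions for illustration.

```python
import pandas as pd


def high_correlation_pairs(df, threshold=0.8):
    """Return the correlation matrix of the numeric columns and the list of
    column pairs whose absolute Pearson correlation is at least `threshold`."""
    corr = df.select_dtypes("number").corr()
    cols = list(corr.columns)
    pairs = []
    # Walk the upper triangle only, so each pair is reported once.
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            r = float(corr.iloc[i, j])
            if abs(r) >= threshold:
                pairs.append((cols[i], cols[j], round(r, 3)))
    return corr, pairs


# Toy frame: "b" is an exact multiple of "a", "c" is only weakly related.
toy = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10], "c": [3, 1, 4, 1, 5]})
corr, pairs = high_correlation_pairs(toy)
```

Pairwise correlation only catches collinearity between two variables at a time; a variance inflation factor check would be a natural follow-up for relationships involving several predictors.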
Create a branch for project 3 unique to group member Russ.
Add a detailed README that provides more information about the project.
In this issue, we plan to preprocess the following variables:
'ID', 'Severity', 'Start_Time', 'End_Time', 'Distance(mi)', 'Description', 'Start_Lat', 'Start_Lng', 'End_Lat', 'End_Lng', 'Number', 'Street', 'Side', 'City', 'County', 'State', 'Zipcode', 'Country', 'Timezone', 'Airport_Code'
The goal is to impute the missing values and to drop the irrelevant columns.
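A possible shape for this step, sketched with pandas. Which columns count as irrelevant is not stated here, so the `drop_cols` argument is left to the caller; the median and mode fill strategies and the helper name `basic_clean` are assumptions.

```python
import pandas as pd


def basic_clean(df, drop_cols):
    """Drop the given columns, then fill remaining gaps:
    numeric columns with the median, other columns with the most common value."""
    out = df.drop(columns=[c for c in drop_cols if c in df.columns])
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode(dropna=True)
            if not mode.empty:  # an all-missing column is left as-is
                out[col] = out[col].fillna(mode.iloc[0])
    return out
```

Fill values should be computed on the training split only and reused for validation and test data, otherwise information leaks across splits.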
Evaluate XGBoost on the test dataset.
Create the project proposal on Overleaf and share it with the team.
Complete the project proposal on Google Docs based on the project requirements.
Find an ML project template or any suitable template that we can use to manage the code in a standard way.