Final project for Practical Data Science (15-388) at Carnegie Mellon University, created by Eric Lee, Derek Young, and Kevin Yang.
We retrieved NBA game data from 1946 to 2016 to build a model that predicts the result (win or loss) of an NBA game. We tried a variety of features, including the altitude of the court, whether the game was a back-to-back, and rolling win percentage, in addition to other standard features. Of the three models we tried (Linear Regression, Logistic Regression, and Support Vector Machine), Linear Regression performed best, scoring 1% better than the naive approach on the holdout set. In the future, we would like to incorporate more player-specific data to improve our model.
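As a rough illustration of two of the features mentioned above, the sketch below computes a rolling win percentage and a back-to-back flag for one team's schedule. It is not the code from our notebook: the function name `add_features`, the input layout (a date-sorted list of `(game_date, won)` tuples), the 10-game default window, and the definition of a back-to-back as a game played exactly one day after the previous one are all assumptions made for this example.

```python
from collections import deque
from datetime import date


def add_features(games, window=10):
    """Hypothetical helper: build per-game features for a single team.

    games  -- list of (game_date, won) tuples, sorted by date (assumed layout)
    window -- number of previous games in the rolling window (assumed value)
    """
    recent = deque(maxlen=window)  # outcomes of the last `window` games
    prev_date = None
    rows = []
    for game_date, won in games:
        # Use only games played BEFORE this one, so the feature does not
        # leak the result we are trying to predict.
        rolling = sum(recent) / len(recent) if recent else None
        # Assumed definition: previous game was played exactly one day earlier.
        back_to_back = (
            prev_date is not None and (game_date - prev_date).days == 1
        )
        rows.append({
            "date": game_date,
            "rolling_win_pct": rolling,
            "back_to_back": back_to_back,
            "won": won,
        })
        recent.append(1 if won else 0)
        prev_date = game_date
    return rows


# Made-up schedule, for illustration only.
sample = [
    (date(2016, 1, 1), True),
    (date(2016, 1, 2), False),
    (date(2016, 1, 5), True),
]
for row in add_features(sample, window=2):
    print(row)
```

The first game of a schedule has no history, so its rolling win percentage is left as `None` rather than guessed; how to fill such gaps (drop the row, use the previous season's record, and so on) is a separate modeling decision.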
Please see NBA_Game_Predictions_Final_Report.ipynb for all of our data processing, exploratory data analysis, and final results.