This course challenged my abilities as a data professional by introducing many different concepts over its 24 weeks. The tasks in each module were thorough and complex, with detailed instructions on how to apply the material in practice. The pace of the program and the volume of information were demanding, which made focus and time management critical to completing it successfully. However, the effort was well worth it when the time came to apply everything I had learned in the final project.
For the final project, I worked with Nicole and Luis. We teamed up out of mutual admiration for the skills each of us had demonstrated during the boot camp. Nicole is an exceptionally talented and hard-working student and data professional, and her engagement in the classroom prompted me to approach her for the final project. Her contributions extended to the deployment of the database and the organizational structure of the data, and the project would not have been as accurate without her. Luis was also an active participant in the boot camp, and his practical approach to bringing tools into the analysis was invaluable. Overall, I would give my team an A+!
The final project addressed box office revenue prediction, using the TMDB dataset sourced from Kaggle. The dataset was intentionally messy and filled with dense columns, which made EDA and data cleaning time-intensive and extremely challenging. However, once the data was cleaned and sorted, the machine learning models (random forest, XGBoost, and LightGBM) were not as difficult to implement. The analysis showed that actors have more impact on revenue than budget does, and that internet presence can generate three times more revenue. Despite all the data available in the data frame, the models had a 10% margin of error, which translated into a mean revenue discrepancy of $41M to $43M.
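
To illustrate how a percentage error and a dollar discrepancy can describe the same predictions, here is a minimal sketch. It assumes the two figures correspond to mean absolute percentage error and mean absolute error, which is a guess and not something stated above. The revenue values are made up for illustration and are not the project's data, and the function names are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_percentage_error(actual, predicted):
    """Average absolute gap as a percentage of each actual value."""
    ratios = [abs(a - p) / a for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(ratios)


# Made-up revenues in millions of dollars (illustration only).
actual = [20.0, 150.0, 600.0]
predicted = [18.0, 165.0, 540.0]

mae = mean_absolute_error(actual, predicted)
mape = mean_absolute_percentage_error(actual, predicted)

print(f"MAE:  ${mae:.1f}M")
print(f"MAPE: {mape:.1f}%")
```

Every prediction here is off by the same relative amount, yet the dollar gap is dominated by the largest film, which is one reason a modest percentage error can still amount to tens of millions of dollars on average.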