Flight analytics and cancellation prediction with sparklyr and pyspark
This project accompanies the end-to-end ML at Scale workshop. It builds an API that predicts the likelihood of a flight being cancelled, based on historical flight data. The original dataset comes from Kaggle. The workshop shows both the pyspark and sparklyr implementations and covers:
- Data Science and Exploration
- ML Model Building
- ML Model Optimisation
- ML Model Training
- ML Model Serving
Related content: http://blog.cloudera.com/blog/2017/02/analyzing-us-flight-data-on-amazon-s3-with-sparklyr-and-apache-spark-2-0/