A course project for Big Data Processing and Applications (2021) at the University of Oulu.
- Used PySpark to process more than 700 terabytes of GitHub Archive data.
- Trained a sentiment classifier with Spark MLlib on a dataset of 10M tweets; the model predicts the sentiment of any given text.
- Analyzed the sentiment of developers across GitHub communities and programming languages. (Read the report!)
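
To give a rough idea of the last step, here is a minimal sketch of averaging sentiment per programming language. It is plain Python standing in for the Spark job, not the project's actual code: the function name `mean_sentiment_by_language`, the `(language, score)` record shape, and the sample values are all assumptions made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def mean_sentiment_by_language(
    records: Iterable[Tuple[str, float]],
) -> Dict[str, float]:
    """Average a per-message sentiment score for each programming language."""
    totals = defaultdict(float)  # running sum of scores per language
    counts = defaultdict(int)    # number of messages seen per language
    for language, score in records:
        totals[language] += score
        counts[language] += 1
    return {language: totals[language] / counts[language] for language in totals}


# Hypothetical (language, score) pairs; a score is an assumed model output in [0, 1].
sample = [
    ("Python", 1.0),
    ("Python", 0.5),
    ("Rust", 0.0),
    ("Rust", 1.0),
    ("Go", 0.25),
]
result = mean_sentiment_by_language(sample)
print(result)
```

On a Spark DataFrame with `language` and `score` columns (again, assumed names), the equivalent aggregation would likely be a `groupBy("language")` followed by an average over `score`.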