Machine Learning with Spark MLlib is a project undertaken as part of the UE19CS322 Big Data course at PES University.
This simulates a real-world scenario in which you must handle an enormous amount of data for predictive modelling. The data source is a stream, and the application can only handle one batch of the stream at any given point in time.
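The batch constraint described above can be illustrated with a minimal sketch in plain Python. This is not Spark code and not necessarily how the project reads its stream. The helper name `iter_batches` and the `batch_size` parameter are hypothetical, chosen only for illustration. The sketch shows the general idea: only one batch is held in memory at a time, and the rest of the stream is never materialised.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def iter_batches(stream: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Yield successive lists of at most `batch_size` items from `stream`.

    Hypothetical helper: the stream is consumed lazily, so only the
    current batch is held in memory.
    """
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    iterator = iter(stream)
    while True:
        # Pull the next batch_size items (fewer at the end of the stream).
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch
```

Any model trained on top of such a loop has to support incremental updates, because earlier batches are no longer available once they have been processed.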
Enron Email Spam Detection:
Each record has three fields: the subject, the email content, and the label.
Each email belongs to one of two classes: spam or ham.
The training set has 30k examples and the test set has 3k.
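To make the record layout and the batch-wise training constraint concrete, here is a small sketch of an incrementally trained multinomial Naive Bayes classifier in plain Python. It is a stand-in for illustration only, not the Spark MLlib model used in the project. The dictionary keys `subject`, `content` and `label`, the class name `IncrementalNaiveBayes`, and the simple tokenizer are all assumptions.

```python
import math
import re
from collections import Counter
from typing import Dict, Iterable, List

CLASSES = ("ham", "spam")


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class IncrementalNaiveBayes:
    """Multinomial Naive Bayes whose counts are updated one batch at a time."""

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace smoothing constant (assumed value)
        self.doc_counts: Counter = Counter()
        self.word_counts: Dict[str, Counter] = {c: Counter() for c in CLASSES}
        self.vocab: set = set()

    def partial_fit(self, batch: Iterable[Dict[str, str]]) -> None:
        """Update the counts from one batch; earlier batches are not needed."""
        for record in batch:
            label = record["label"]
            tokens = tokenize(record["subject"] + " " + record["content"])
            self.doc_counts[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, subject: str, content: str) -> str:
        """Return the class with the highest smoothed log-probability."""
        tokens = tokenize(subject + " " + content)
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = CLASSES[0], -math.inf
        for c in CLASSES:
            # Smoothed class prior.
            score = math.log(
                (self.doc_counts[c] + self.alpha)
                / (total_docs + self.alpha * len(CLASSES))
            )
            # The +1 reserves one slot for unseen words, keeping denom > 0.
            denom = sum(self.word_counts[c].values()) + self.alpha * (
                len(self.vocab) + 1
            )
            for token in tokens:
                score += math.log((self.word_counts[c][token] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = c, score
        return best_label
```

Because the model keeps only running counts, calling `partial_fit` once per incoming batch gives the same result as training on all batches at once, which is what makes this kind of model a natural fit for the streaming constraint.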