- Studied Spark DataFrames and the operations used to analyze data with them. Used Spark MLlib (Spark's machine learning library) with PySpark to build machine learning models (linear regression, logistic regression, decision trees, K-Means clustering, basic NLP) for several projects, and applied Spark Streaming to Twitter data over a socket.
- Set up PySpark on AWS EC2 for this course.
Here are some of my Spark projects.

Tool set: Python 3, findspark, PySpark, Spark MLlib (for machine learning)

Topics covered:
- Introduction to Spark
- Basic Operations on Spark dataframes
- Data Analysis with Spark
- Machine Learning with PySpark using MLlib
The solutions in these projects can be used as a reference while taking the Udemy course "Spark and Python for Big Data with PySpark".
- All of these projects were done on AWS EC2 throughout the course. If you have any questions about the setup on AWS EC2, feel free to contact me.
- My suggestion is to use AWS EC2 during this course to gain hands-on experience with Amazon Web Services (AWS).