Processes NASA web server logs with PySpark on AWS EMR. A Tableau dashboard built on the processed data presents the resulting insights.
Public URL --> https://public.tableau.com/profile/surjit.singh4117#!/vizhome/Nasa_Log_Analytics/LogAnalytics?publish=yes
- Create an S3 bucket, unzip the input data files, and upload them together with nasaLogAnalytics.py to the bucket. Edit the Python file so that it uses your own S3 bucket name. See the picture below for the S3 folder structure:
- Create an EMR cluster and execute
aws s3 cp s3://<your bucket name>/code/nasaLogAnalytics.py .
to copy the code from S3 to the EMR cluster.
- Run
spark-submit nasaLogAnalytics.py
to execute the code, then consume the output files from S3.
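
The bucket name has to be changed inside the Python file before uploading it. The script's actual contents are not shown here, so the following is only a minimal sketch of one way to keep that change in a single place. The `BUCKET` value, the `s3_uri` helper, and the `input`/`output` prefixes are hypothetical; only the `code/nasaLogAnalytics.py` key comes from the copy command above.

```python
# Sketch only: names and prefixes below are assumptions, not taken from
# nasaLogAnalytics.py. Only the code/ key is confirmed by the steps above.
BUCKET = "my-nasa-logs-bucket"  # assumption: replace with your own bucket name


def s3_uri(bucket: str, *parts: str) -> str:
    """Join a bucket name and key parts into an s3:// URI."""
    cleaned = [p.strip("/") for p in parts]
    key = "/".join(p for p in cleaned if p)
    return f"s3://{bucket}/{key}" if key else f"s3://{bucket}/"


# Location of the script, matching the `aws s3 cp` step.
CODE_URI = s3_uri(BUCKET, "code", "nasaLogAnalytics.py")
# Hypothetical prefixes for the unzipped log files and the job results.
INPUT_URI = s3_uri(BUCKET, "input")
OUTPUT_URI = s3_uri(BUCKET, "output")

if __name__ == "__main__":
    print(CODE_URI)
    print(INPUT_URI)
    print(OUTPUT_URI)
```

With a layout like this, switching to another bucket means editing one constant, and the derived URIs can be passed to whatever read and write calls the script makes.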