This exercise illustrates a Python/AWS application that reads hit-level data files as input and shows how much product revenue comes from each search engine and search keyword. The infrastructure is provisioned with AWS CloudFormation. An event-driven, serverless AWS Lambda function triggers an AWS Glue job, which processes the input file and writes an output file listing the revenue by source.
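As a rough illustration of attributing revenue to a search engine and keyword, the sketch below parses a referrer URL and sums revenue per (domain, keyword) pair. It is not the code in src/app.py: the `(referrer, revenue)` input shape, the `KEYWORD_PARAMS` tuple, and both function names are assumptions made for this example, and it presumes each revenue amount has already been paired with the referrer of the originating search.

```python
from collections import defaultdict
from urllib.parse import urlparse, parse_qs

# Assumed query parameters that may carry the search term; real engines vary.
KEYWORD_PARAMS = ("q", "p")


def parse_referrer(referrer):
    """Return (search_engine_domain, keyword), or None if no keyword is found."""
    parsed = urlparse(referrer)
    query = parse_qs(parsed.query)
    for param in KEYWORD_PARAMS:
        if param in query:
            # Lowercase both parts so "Ipod" and "ipod" are counted together.
            return parsed.netloc.lower(), query[param][0].lower()
    return None


def revenue_by_source(hits):
    """Sum revenue per (domain, keyword), highest revenue first.

    `hits` is an iterable of (referrer, revenue) pairs (hypothetical shape).
    """
    totals = defaultdict(float)
    for referrer, revenue in hits:
        source = parse_referrer(referrer)
        if source is not None:
            totals[source] += revenue
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

Hits whose referrer carries no recognised search parameter are skipped rather than counted under an "unknown" source; that choice is also an assumption.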
- AWS CLI - Required to run the commands below from a local machine.
- AWS S3 bucket - Needed to store the files (in ZIP format) that are loaded into the Lambda function and Glue job.
deployment_template.yml - AWS CloudFormation template that provisions the infrastructure as code for this application (creates the IAM roles, the input and output S3 buckets, the Glue ETL job, the event-based Lambda function, and the permissions to access Lambda and Glue).
src/app.py - Python code run by the AWS Glue job.
src/requirements.txt - Dependencies required to run the ETL (AWS Glue) job.
lambda_function.py - Python code (Lambda function) that triggers the ETL (AWS Glue) job.
lambda.zip - Compressed ZIP file containing the Lambda function.
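To make the role of lambda_function.py concrete, here is a minimal sketch of a handler that starts a Glue job when an object lands in the input bucket. It is not the repository's actual code: the `GLUE_JOB_NAME` environment variable, the `etl-job` default, the `--input_path` argument name, and the optional `glue_client` parameter are all assumptions made for illustration. Only `start_job_run` and the S3 event record layout come from AWS itself.

```python
import os
from urllib.parse import unquote_plus


def build_job_arguments(event):
    """Build Glue job arguments from the first S3 record in the event."""
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    # Object keys in S3 event notifications arrive URL-encoded.
    key = unquote_plus(s3_info["object"]["key"])
    # "--input_path" is a hypothetical argument name for this sketch.
    return {"--input_path": f"s3://{bucket}/{key}"}


def lambda_handler(event, context, glue_client=None):
    """Start one Glue job run for the uploaded file and return its run ID."""
    if glue_client is None:
        import boto3  # imported lazily so the helper can be tested offline
        glue_client = boto3.client("glue")
    # Hypothetical configuration: the job name is read from the environment.
    job_name = os.environ.get("GLUE_JOB_NAME", "etl-job")
    response = glue_client.start_job_run(
        JobName=job_name,
        Arguments=build_job_arguments(event),
    )
    return {"JobRunId": response["JobRunId"]}
```

Passing the client in as a parameter is only a convenience for testing; in Lambda the function would be invoked with `event` and `context` alone.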
- Clone the Git repository
git clone https://github.com/SourabhShrivas/Data_Engineering_Coding_Exercise
- Upload the Lambda function package to an S3 bucket
aws s3 cp lambda.zip s3://s3-adobe-repository/
- Upload the ETL script (Glue job app.py) to the application repository S3 bucket (s3-adobe-repository)
aws s3 cp src/app.py s3://s3-adobe-repository/etl/app.py
- Deploy the CloudFormation stack using the deployment_template.yml template file.
aws cloudformation deploy --template-file deployment_template.yml --stack-name infrastructure --capabilities CAPABILITY_NAMED_IAM
- Run the process by uploading the input file to the input S3 bucket (inputs3bucket-adobe).
aws s3 cp /Users/soura/Downloads/data.tsv s3://inputs3bucket-adobe/
- Monitor the jobs:
6.1 - AWS Lambda > CloudWatch > Log Groups
6.2 - AWS Glue Job > AWS Glue Studio > Monitoring
Download the output from the output S3 bucket (outputs3bucket-adobe), then empty the input and output buckets, and then delete the CloudFormation stack.
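Before downloading the output, it can help to confirm that the Glue job run has actually finished. The following is a minimal sketch that polls the run state through boto3's `get_job_run`. The function name, the polling defaults, and the set of states treated as terminal are choices made for this example, and the job name and run ID must be supplied by the caller.

```python
import time

# States treated as final in this sketch.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "STOPPED", "TIMEOUT", "ERROR"}


def wait_for_job_run(glue_client, job_name, run_id, poll_seconds=30, max_polls=60):
    """Poll a Glue job run until it reaches a terminal state, then return it."""
    for _ in range(max_polls):
        run = glue_client.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        state = run["JobRunState"]
        if state in TERMINAL_STATES:
            return state
        time.sleep(poll_seconds)
    raise TimeoutError(f"Job run {run_id} did not finish after {max_polls} polls")
```

With real credentials, `glue_client` would be `boto3.client("glue")`; anything other than `SUCCEEDED` is a signal to check the CloudWatch logs before looking for output.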
- Empty all the S3 buckets created by the CloudFormation infrastructure as code.
aws s3 rm s3://inputs3bucket-adobe --recursive
aws s3 rm s3://outputs3bucket-adobe --recursive
- Delete the stack
aws cloudformation delete-stack --stack-name infrastructure