The main goal of this project was to build a pipeline for ingesting external data, using AWS services to secure the cloud environment and keep its cost under watch: IAM, Lake Formation, S3 and cost alerts.
I covered the following aspects of data ingestion pipelines:
- Data extraction (Dataset: https://data.boston.gov/dataset/311-service-requests)
- Data ingestion into the bronze layer (S3 bucket)
- Lake Formation configuration
- Database creation
- Cost and CloudWatch overview
- Role creation and access permissions
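
The extraction and bronze-layer ingestion steps listed above could look roughly like the sketch below. It is a minimal illustration, not the exact code from the project: the function names, the date-partitioned key layout, the bucket name and the CSV URL are all assumptions made for the example. The file is downloaded with `urllib.request`, kept in memory with `io.BytesIO`, and uploaded unchanged through the boto3 S3 client's `upload_fileobj` method.

```python
import io
import urllib.request
from datetime import date


def build_bronze_key(dataset: str, filename: str, ingestion_date: date) -> str:
    """Build an object key for the bronze layer.

    The "bronze/<dataset>/ingestion_date=<date>/<file>" layout is an
    assumption for this sketch, not a layout required by S3 or Lake Formation.
    """
    return f"bronze/{dataset}/ingestion_date={ingestion_date.isoformat()}/{filename}"


def download_bytes(url: str, timeout: float = 60.0) -> bytes:
    """Download the raw file as bytes, without transforming it."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def upload_to_bronze(s3_client, data: bytes, bucket: str, key: str) -> None:
    """Upload the raw bytes to S3 through an in-memory buffer."""
    s3_client.upload_fileobj(io.BytesIO(data), bucket, key)


def main() -> None:
    # boto3 is imported here so the helpers above can be reused without it.
    import boto3

    # Both values are placeholders: use the CSV link of the chosen year from
    # the dataset page and the name of your own bucket.
    csv_url = "https://example.com/311_service_requests_2023.csv"
    bucket = "my-data-lake-bucket"

    key = build_bronze_key("311_service_requests", "2023.csv", date.today())
    upload_to_bronze(boto3.client("s3"), download_bytes(csv_url), bucket, key)
```

Calling `main()` runs the download and the upload with the credentials configured in the environment. Storing the file as-is matches the idea of a bronze layer holding raw data; any cleaning with pandas would happen in a later step.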
Technologies used: AWS account, S3, Lake Formation, Python, boto3, urllib.request, pandas and io
Project based on the Alura course: AWS Data Lake - creating a pipeline for data ingestion