The "End-to-End Text Summarization Project" builds an NLP-based text summarizer. It implements a pipeline that covers data ingestion, validation, transformation, model training, and evaluation. An app built on this pipeline is then deployed on AWS through a CI/CD workflow. The goal is to automatically generate concise, accurate summaries of lengthy text.
- Model: t5-small
- Dataset: https://huggingface.co/datasets/samsum
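As a rough illustration of how t5-small might be applied to a SAMSum-style dialogue, the sketch below assumes the Hugging Face `transformers` library (with PyTorch) is installed. The helper names `build_model_input` and `summarize` are hypothetical and not part of this project's code; the `summarize: ` prefix is the task prefix T5 checkpoints conventionally expect for summarization.

```python
MODEL_NAME = "t5-small"
TASK_PREFIX = "summarize: "  # conventional T5 task prefix for summarization


def build_model_input(dialogue: str) -> str:
    """Collapse all whitespace runs to single spaces and prepend the task prefix."""
    return TASK_PREFIX + " ".join(dialogue.split())


def summarize(dialogue: str, max_new_tokens: int = 60) -> str:
    """Summarize one dialogue with the pretrained (not fine-tuned) checkpoint."""
    # Imported lazily so build_model_input works without transformers installed.
    from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)
    inputs = tokenizer(
        build_model_input(dialogue),
        return_tensors="pt",
        truncation=True,
        max_length=512,
    )
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `summarize("Amanda: I baked cookies. Do you want some?\nJerry: Sure!")` downloads the checkpoint on first use. Summaries from the pretrained checkpoint are likely to be weaker than those from a model fine-tuned on the dataset.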
- Create all the needed files and folders using template.py
- Create a new virtual environment
- Install packages using requirements.txt
- Set up project using setup.py (automatically configured)
- Update src/constants/__init__.py
- Update src/utils/logger.py
- Update src/utils/exception.py
- Update src/utils/utils.py
- Test project code using notebook
- For each component in components:
  - Test component code using notebook
  - Update config.yaml
  - Update params.yaml
  - Update entity/__init__.py
  - Update src/config/config.py
  - Update src/components/component.py
  - Update src/pipeline/stage_component.py
  - Update main.py
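The per-component loop above (entity, config, component, pipeline stage) can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual code: the class names, the `data_validation` section, and its keys are hypothetical, and a plain dict stands in for the parsed contents of config.yaml.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Tuple


# entity: a typed, immutable view of one component's settings
@dataclass(frozen=True)
class DataValidationConfig:
    root_dir: Path
    required_files: Tuple[str, ...]


# config: turns the raw mapping (e.g. parsed config.yaml) into the entity
class ConfigurationManager:
    def __init__(self, raw_config: dict):
        self.raw_config = raw_config

    def get_data_validation_config(self) -> DataValidationConfig:
        section = self.raw_config["data_validation"]  # hypothetical section name
        return DataValidationConfig(
            root_dir=Path(section["root_dir"]),
            required_files=tuple(section["required_files"]),
        )


# component: the actual work, driven only by its config entity
class DataValidation:
    def __init__(self, config: DataValidationConfig):
        self.config = config

    def validate_all_files_exist(self) -> bool:
        return all(
            (self.config.root_dir / name).exists()
            for name in self.config.required_files
        )


# pipeline stage: wires config and component together; main.py would call run()
class DataValidationPipeline:
    def __init__(self, raw_config: dict):
        self.raw_config = raw_config

    def run(self) -> bool:
        config = ConfigurationManager(self.raw_config).get_data_validation_config()
        return DataValidation(config).validate_all_files_exist()
```

Keeping the settings in a frozen dataclass means each component receives only the values it needs, which also makes it straightforward to exercise in a notebook before wiring it into main.py.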
- Update src/pipeline/prediction.py
- Update app.py
- Update Dockerfile
- Update .github/workflows/main.yaml
- Create App
- Deploy App
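The first workflow step, scaffolding the project with template.py, could look roughly like the sketch below. The file list is an assumption assembled from the paths named in the steps (with the entity package assumed to live under src/), and `create_project_skeleton` is a hypothetical helper name; the real template.py may differ.

```python
from pathlib import Path

# Assumed layout, taken from the paths mentioned in the workflow steps.
PROJECT_FILES = [
    "src/__init__.py",
    "src/constants/__init__.py",
    "src/entity/__init__.py",  # assumption: entity package sits under src/
    "src/utils/logger.py",
    "src/utils/exception.py",
    "src/utils/utils.py",
    "src/config/config.py",
    "src/components/__init__.py",
    "src/pipeline/prediction.py",
    ".github/workflows/main.yaml",
    "config.yaml",
    "params.yaml",
    "requirements.txt",
    "setup.py",
    "main.py",
    "app.py",
    "Dockerfile",
]


def create_project_skeleton(base_dir, files=PROJECT_FILES):
    """Create any missing files (and parent folders) under base_dir.

    Existing files are left untouched, so the function is safe to re-run.
    Returns the list of paths that were newly created.
    """
    created = []
    for relative_path in files:
        path = Path(base_dir) / relative_path
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.touch()
            created.append(path)
    return created
```

Running `create_project_skeleton(".")` from the repository root creates the empty files; a second run returns an empty list because nothing is overwritten.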
- Create a new IAM user with the following policies:
  - AmazonEC2ContainerRegistryFullAccess
  - AmazonEC2FullAccess
- Create and save the security credentials
  - Save the URL of ECR
- Optional:
  - sudo apt-get update -y
  - sudo apt-get upgrade
- Required:
  - curl -fsSL https://get.docker.com -o get-docker.sh
  - sudo sh get-docker.sh
  - sudo usermod -aG docker ubuntu
  - newgrp docker
- GitHub > Settings > Actions > Runners > New self-hosted runner > choose OS > run the commands one by one on EC2
- Check the status of the runner: idle -> connected
- GitHub > Settings > Secrets and variables > Actions > New repository secret > create the following secrets:
  - AWS_ACCESS_KEY_ID
  - AWS_SECRET_ACCESS_KEY
  - AWS_REGION
  - AWS_ECR_LOGIN_URI
  - ECR_REPOSITORY_NAME
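A small pre-deployment check can catch a missing secret early. The sketch below assumes the workflow maps each repository secret listed above to an environment variable of the same name (for example through an `env:` block in the workflow file); `missing_settings` is a hypothetical helper, not part of the project.

```python
import os

# Names taken from the repository secrets listed above.
REQUIRED_SETTINGS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION",
    "AWS_ECR_LOGIN_URI",
    "ECR_REPOSITORY_NAME",
)


def missing_settings(env=None):
    """Return the required names that are absent or empty in env.

    env defaults to the current process environment.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]
```

A deployment script could call `missing_settings()` and stop with a clear message if the returned list is not empty, rather than failing later during the ECR login or image push.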