This project builds an end-to-end, RAG-enabled news generation and summarization pipeline. The infrastructure is hosted on AWS, and I'll soon provide Terraform templates. I use Groq, a very fast API provider for open-source LLMs. Its free tier allows 30 requests per minute, which is enough for the scope of this project.
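To stay under the 30 requests per minute mentioned above, calls to the LLM API can be throttled on the client side. The following is a minimal sketch of a sliding-window limiter using only the standard library; the class name `MinuteRateLimiter` and its parameters are hypothetical and not part of this project's code.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Sliding-window limiter: at most `max_calls` calls per `period` seconds."""

    def __init__(self, max_calls=30, period=60.0,
                 clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self._clock = clock      # injectable so the limiter can be tested
        self._sleep = sleep
        self._calls = deque()    # timestamps of calls inside the window

    def wait(self):
        """Block until a call is allowed, record it, return seconds waited."""
        now = self._clock()
        # Forget calls that have already left the window.
        while self._calls and now - self._calls[0] >= self.period:
            self._calls.popleft()
        waited = 0.0
        if len(self._calls) >= self.max_calls:
            # Sleep until the oldest recorded call leaves the window.
            waited = self.period - (now - self._calls[0])
            self._sleep(waited)
            now = self._clock()
            self._calls.popleft()
        self._calls.append(now)
        return waited
```

Calling `limiter.wait()` immediately before each API request keeps the request rate within the limit, assuming a single process issues all requests.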
- Vector Databases (PGVector, Pinecone, etc.)
- NDTV RSS Feeds (non-commercial use). NDTV provides access to its various RSS feeds. Below is an RSS feed that contains content about Technology:
- An AWS Account (note: your cloud expenses might spike)
- Groq API Key (create one here)
- Langchain
- Discord Webhook
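For the RSS feed item in the list above, here is a minimal sketch of how feed entries could be extracted with Python's standard library. It assumes a standard RSS 2.0 layout (`<item>` elements with `<title>`, `<link>`, and `<description>`); the function name `parse_rss_items` and the sample feed are made up for illustration and do not come from NDTV or from this project.

```python
import xml.etree.ElementTree as ET


def parse_rss_items(xml_text):
    """Return a list of dicts with title, link and description per <item>."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            # findtext returns None when the tag is missing, so fall back to "".
            "title": (item.findtext("title") or "").strip(),
            "link": (item.findtext("link") or "").strip(),
            "description": (item.findtext("description") or "").strip(),
        })
    return items


# Hypothetical sample feed, only for demonstration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel><title>Example feed</title>
<item><title>First headline</title><link>https://example.com/1</link>
<description>Summary one</description></item>
<item><title>Second headline</title><link>https://example.com/2</link></item>
</channel></rss>"""

if __name__ == "__main__":
    for entry in parse_rss_items(SAMPLE_FEED):
        print(entry["title"], entry["link"])
```

In a real run, the XML text would be fetched from the feed URL first, for example with `urllib.request.urlopen`.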
docker build -t groqqer:latest .
Create an AWS RDS Instance by going to RDS Console.
- Select PostgreSQL.
- Select db.t3.micro as the instance class.
Go to the Bedrock Console and, under Base models, request access to Titan Multimodal Embeddings Generation 1.
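Once model access is granted, an embedding request could look roughly like the sketch below. It assumes the model ID `amazon.titan-embed-image-v1` and calls Bedrock directly through `boto3` (a third-party package); whether this project uses `boto3` directly or goes through Langchain is not stated here, and the function names are hypothetical.

```python
import json

# Assumed model ID for Titan Multimodal Embeddings G1.
MODEL_ID = "amazon.titan-embed-image-v1"
ALLOWED_LENGTHS = (256, 384, 1024)


def build_embedding_body(text, output_length=1024):
    """Build the JSON request body for a text-only embedding."""
    if output_length not in ALLOWED_LENGTHS:
        raise ValueError(f"output_length must be one of {ALLOWED_LENGTHS}")
    return json.dumps({
        "inputText": text,
        "embeddingConfig": {"outputEmbeddingLength": output_length},
    })


def embed_text(text, region_name, output_length=1024):
    """Call Bedrock and return the embedding vector as a list of floats."""
    import boto3  # imported lazily; requires `pip install boto3`

    client = boto3.client("bedrock-runtime", region_name=region_name)
    response = client.invoke_model(
        modelId=MODEL_ID,
        body=build_embedding_body(text, output_length),
        contentType="application/json",
        accept="application/json",
    )
    return json.loads(response["body"].read())["embedding"]
```

`embed_text` needs valid AWS credentials and granted model access in the chosen region; `build_embedding_body` can be checked offline.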
- Create an ECR repository, either via the console or via the AWS CLI:
aws ecr create-repository \
--repository-name <repo_name> \
--region <region_name>
- Push the Docker image to your ECR repository. Refer to this guide on how to push images to an ECR repository.
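The push step usually comes down to three commands: log Docker in to the registry, tag the local image, and push it. The sketch below only assembles those command strings so they can be reviewed before running; the helper names are hypothetical, and the account ID, region, and repository name are placeholders you supply.

```python
def ecr_image_uri(account_id, region, repo_name, tag="latest"):
    """Return the full ECR image URI for a repository and tag."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repo_name}:{tag}"


def ecr_push_commands(account_id, region, repo_name,
                      local_image="groqqer:latest", tag="latest"):
    """Return the shell commands that log in, tag and push the image."""
    registry = f"{account_id}.dkr.ecr.{region}.amazonaws.com"
    uri = ecr_image_uri(account_id, region, repo_name, tag)
    return [
        # Authenticate Docker against the private ECR registry.
        f"aws ecr get-login-password --region {region} | "
        f"docker login --username AWS --password-stdin {registry}",
        # Give the local image the name ECR expects.
        f"docker tag {local_image} {uri}",
        # Upload the image.
        f"docker push {uri}",
    ]


if __name__ == "__main__":
    # Placeholder values, replace with your own.
    for command in ecr_push_commands("123456789012", "us-east-1", "groqqer"):
        print(command)
```

The URI returned by `ecr_image_uri` is the same value the task definition later asks for under Image URI.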
Open Discord and create a server. In the server, go to Settings -> Integrations. Create a webhook and select the channel that will serve as the destination for your webhook.
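Once the webhook exists, a message can be delivered by sending a JSON payload with a `content` field to the webhook URL via HTTP POST. The sketch below uses only the standard library; the function names and the `User-Agent` value are hypothetical, and how this project actually posts to Discord may differ.

```python
import json
import urllib.request

# Discord rejects message content longer than 2000 characters.
DISCORD_CONTENT_LIMIT = 2000


def build_webhook_request(webhook_url, message):
    """Build (but do not send) a POST request for a Discord webhook."""
    payload = json.dumps({"content": message[:DISCORD_CONTENT_LIMIT]})
    return urllib.request.Request(
        webhook_url,
        data=payload.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # An explicit User-Agent; the default urllib one may be rejected.
            "User-Agent": "news-pipeline-example/0.1",
        },
        method="POST",
    )


def send_to_discord(webhook_url, message):
    """Send the message and return the HTTP status code."""
    request = build_webhook_request(webhook_url, message)
    with urllib.request.urlopen(request) as response:
        return response.status
```

Summaries longer than the limit are truncated here for simplicity; splitting them across several messages would be an alternative.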
3.1 Create an ECS Cluster.
![Screenshot 2024-05-28 at 2 00 10 AM](https://private-user-images.githubusercontent.com/100070155/334207075-f139ed63-4308-4b7b-830c-0f905fae4236.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjM1MzQyMjksIm5iZiI6MTcyMzUzMzkyOSwicGF0aCI6Ii8xMDAwNzAxNTUvMzM0MjA3MDc1LWYxMzllZDYzLTQzMDgtNGI3Yi04MzBjLTBmOTA1ZmFlNDIzNi5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwODEzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDgxM1QwNzI1MjlaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT0zMjJmMjNkODhlN2QyNDAzNzY5ZDY0M2JkMDQ1MjhiYTI3NDVmYjAzOTRmYjNiZTRhM2MyZGVlZWVkMmViNjMxJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.XPXe27ONP3IyocXAD7RzFoHku8tmbYB1g-Ip2AGwyq4)
3.2 Create a Task Definition Family.
- Select AWS Fargate as the launch type.
- Under Task size, select the following values:
  - CPU: 2 vCPU
  - Memory: 4 GB
- Under Task Execution Role:
  - Go to the IAM Console and create an IAM role for AWS ECS - Elastic Container Service Task.
  - Attach the [AmazonBedrockFullAccess] IAM policy to your IAM role.
  - Now select the newly created role.
- Under Container - 1, enter your preferred container name and enter the ECR image URI under Image URI.
- Leave all other fields as default and click Create.
![Screenshot 2024-05-28 at 1 49 24 AM](https://private-user-images.githubusercontent.com/100070155/334207203-6eecd96d-4fb9-40bc-b2e5-23f128671d40.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjM1MzQyMjksIm5iZiI6MTcyMzUzMzkyOSwicGF0aCI6Ii8xMDAwNzAxNTUvMzM0MjA3MjAzLTZlZWNkOTZkLTRmYjktNDBiYy1iMmU1LTIzZjEyODY3MWQ0MC5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwODEzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDgxM1QwNzI1MjlaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT00NTM2ZjRhNzZjMmZiNzdhZTJkMWNiZDY2MzU2NTRmYTIxNDZjMzUyZDZiMDA2ZjljM2IxYWEwNjQ2YmFkNzgxJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.6Lw-U0B-FIPX18xI4LzvgeQXJnFK_bKk7eJ7kP6alLc)
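The console settings from step 3.2 can also be written down as a task definition document. The sketch below builds the equivalent dictionary; the function name and all argument values are hypothetical placeholders, and the project itself only describes the console workflow.

```python
def build_task_definition(family, container_name, image_uri, execution_role_arn):
    """Return a Fargate task definition matching the console settings above."""
    return {
        "family": family,
        "requiresCompatibilities": ["FARGATE"],
        "networkMode": "awsvpc",   # required for Fargate tasks
        "cpu": "2048",             # 2 vCPU, expressed in CPU units
        "memory": "4096",          # 4 GB, expressed in MiB
        "executionRoleArn": execution_role_arn,
        "containerDefinitions": [
            {
                "name": container_name,
                "image": image_uri,
                "essential": True,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder values, replace with your own.
    definition = build_task_definition(
        family="news-pipeline",
        container_name="groqqer",
        image_uri="123456789012.dkr.ecr.us-east-1.amazonaws.com/groqqer:latest",
        execution_role_arn="arn:aws:iam::123456789012:role/example-ecs-role",
    )
    print(definition["cpu"], definition["memory"])
```

With `boto3` installed, the dictionary could be registered via `boto3.client("ecs").register_task_definition(**definition)`. Note that ECS hands credentials to the application code through a separate `taskRoleArn` field; it is left out here because the steps above only mention the execution role.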
3.3 Create an ECS Service under the ECS cluster you created in step 3.1.
![Screenshot 2024-05-28 at 1 51 29 AM](https://private-user-images.githubusercontent.com/100070155/334207260-ba9dfad2-0270-48e4-ba3b-6b10b250bd87.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjM1MzQyMjksIm5iZiI6MTcyMzUzMzkyOSwicGF0aCI6Ii8xMDAwNzAxNTUvMzM0MjA3MjYwLWJhOWRmYWQyLTAyNzAtNDhlNC1iYTNiLTZiMTBiMjUwYmQ4Ny5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwODEzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDgxM1QwNzI1MjlaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT03ZGNjOWFkZmE1MjExMDRhZjU1NzEzYTE2ODk0ZmUwOThkNWM4Y2QyYjQ1ZTJlM2RjZWU0OWU4NmMwOWZlOTEwJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.G6564i5sqr8fLV8aGNn6Sd7BzMw3FM8yc9QwSk4igLU)
3.4 Still in the ECS cluster, run a task.
![Screenshot 2024-05-28 at 1 52 54 AM](https://private-user-images.githubusercontent.com/100070155/334206620-abd04aab-74c4-4275-9cf4-0c51bc3a1eca.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MjM1MzQyMjksIm5iZiI6MTcyMzUzMzkyOSwicGF0aCI6Ii8xMDAwNzAxNTUvMzM0MjA2NjIwLWFiZDA0YWFiLTc0YzQtNDI3NS05Y2Y0LTBjNTFiYzNhMWVjYS5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjQwODEzJTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI0MDgxM1QwNzI1MjlaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT1jYjUyOWM5MzNlMjFkOGVlM2M0ODgzZWE3NmQ2MjY1YmFmMGM3YTM1NGQzZWE2MzM3MzgwMGRjYWJlNjcxNjZkJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCZhY3Rvcl9pZD0wJmtleV9pZD0wJnJlcG9faWQ9MCJ9.-ALAje4LFxTfpdK-HDfTzhslmlG_TTDRDZktK5LUE0o)
coming soon