AI Tech News Crawler and Newsletter Bot

This project is a serverless AI tech news crawler. It uses AWS Lambda and EventBridge, deployed with the Serverless Framework, to summarize keyword and category data from a trending-keywords API built with natural language processing.

The end result is a newsletter/report built from tech-site search data for keywords that are trending within categories you specify. You can track specific keywords, track trending keywords within certain categories, or both, and then use an LLM to summarize what is happening with those keywords.

See the config.py file for setting specific categories or keywords. See the example newsletter that has been generated here.

The project is free to use and will only cost you the OpenAI tokens, about $0.05 per report.

If you need guidance or want to build the API yourself, see the tutorials here and here.

Prerequisites

  • AWS account
  • OpenAI API key
  • Python 3.8 or newer
  • Serverless Framework
  • Docker installed and running (optional)

Getting Started

1. Set up your local environment

Start by creating a local directory where you'll store the code.

mkdir tech-bot
cd tech-bot

Make sure you have Python installed.

python --version

If not, install it. The serverless.yml file sets the Lambda runtime to Python 3.11, so if you are on 3.11 you don't need to change anything. If you are on an earlier or later version, change the runtime in serverless.yml to match.
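To compare your local interpreter with the Lambda runtime before deploying, a small check like the following can help. It assumes, per the paragraph above, that serverless.yml targets Python 3.11; `LAMBDA_RUNTIME`, `runtime_string` and `runtime_matches` are illustrative names, not part of the repository.

```python
import sys

# Assumption: serverless.yml sets the Lambda runtime to Python 3.11 (python3.11).
LAMBDA_RUNTIME = (3, 11)


def runtime_string(version=None) -> str:
    """Format a (major, minor, ...) version as a Lambda runtime name, e.g. 'python3.11'."""
    major, minor = (version or sys.version_info)[:2]
    return f"python{major}.{minor}"


def runtime_matches(version=None, target=LAMBDA_RUNTIME) -> bool:
    """True if the (major, minor) part of the version equals the configured runtime."""
    return tuple((version or sys.version_info)[:2]) == tuple(target)


if __name__ == "__main__":
    local = runtime_string()
    if runtime_matches():
        print(f"OK: local {local} matches the Lambda runtime")
    else:
        print(f"Local {local} differs from the configured runtime; update serverless.yml")
```

If the versions differ, the printed name (for example `python3.12`) is what you would put in the runtime setting of serverless.yml, provided AWS Lambda supports that version.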

Make sure you have the Serverless Framework installed. If not, install it globally using npm:

npm install -g serverless
  • You can check if Node.js and npm are installed by running node -v and npm -v in your terminal. If these commands return versions, Node.js and npm are already installed. If not, you'll need to install Node.js (npm comes bundled with Node.js).
  • To install Node.js and npm, visit the Node.js website and download the installer for your operating system.

2. Clone the Repository

Clone this repository to your local machine:

git clone https://github.com/ilsilfverskiold/ai-tech-news-bot.git
cd ai-tech-news-bot

3. (Optional) Make sure Docker is running

Check that Docker is running. Packaging your dependencies with Docker isn't strictly necessary; if you don't want to use it, change the serverless.yml file like so:

custom:
  pythonRequirements:
    dockerizePip: false

By default it is set to true.
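If you keep `dockerizePip: true`, packaging needs a reachable Docker daemon. A minimal pre-flight check, assuming the `docker` CLI is on your PATH, might look like this; `docker_is_running` is a hypothetical helper, and it relies only on `docker info` exiting non-zero when the daemon cannot be reached.

```python
import shutil
import subprocess


def docker_is_running() -> bool:
    """Return True if the docker CLI is installed and the daemon answers `docker info`."""
    if shutil.which("docker") is None:
        return False  # CLI not found on PATH
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    # `docker info` exits non-zero when it cannot talk to the daemon
    return result.returncode == 0


if __name__ == "__main__":
    if docker_is_running():
        print("Docker is running; dockerizePip: true should work")
    else:
        print("Docker is not reachable; start it or set dockerizePip: false")
```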

4. Set up the environment

Create a virtual environment and activate it:

python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
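To confirm the virtual environment is actually active before installing anything, you can compare `sys.prefix` with `sys.base_prefix`: with the standard `venv` module they differ inside an environment and are equal outside one. A minimal sketch (`in_virtualenv` is our own name):

```python
import sys


def in_virtualenv() -> bool:
    """True inside a venv: sys.prefix points at the env, sys.base_prefix at the base interpreter."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print("virtual environment is active")
    else:
        print("no virtual environment active; run the activate script first")
```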

Install the required dependencies:

pip install -r requirements.txt

Install the serverless-python-requirements plugin:

serverless plugin install -n serverless-python-requirements

5. AWS Credentials Setup

Configure your AWS credentials for the Serverless Framework:

serverless config credentials --provider aws --key YOUR_AWS_ACCESS_KEY --secret YOUR_AWS_SECRET_KEY

You get these credentials by creating an IAM user in the AWS console with permissions along these lines:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "iam:GetRole",
                "events:DescribeRule",
                "apigateway:*",
                "s3:*",
                "logs:*",
                "events:PutRule",
                "events:RemoveTargets",
                "events:PutTargets",
                "events:DeleteRule",
                "iam:CreateRole",
                "cloudformation:*",
                "iam:AttachRolePolicy",
                "iam:PutRolePolicy",
                "iam:PassRole",
                "lambda:*",
                "iam:TagRole",
                "iam:UntagRole"
            ],
            "Resource": "*"
        }
    ]
}
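A rough way to lint the deploy policy locally is sketched below. It is an approximation under stated assumptions, not IAM's real evaluation logic: it only looks at Allow statements, treats `*` and `?` as wildcards, and ignores Deny, Resource and Condition. The helper names are hypothetical; for an authoritative answer, use the IAM policy simulator in the AWS console.

```python
from fnmatch import fnmatchcase
from typing import Dict, List


def allowed_patterns(policy: Dict) -> List[str]:
    """Collect action patterns from every Allow statement, lower-cased (IAM actions are case-insensitive)."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):
        statements = [statements]  # "Statement" may be a single object
    patterns: List[str] = []
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue  # this sketch ignores Deny statements entirely
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]  # "Action" may be a single string or a list
        patterns.extend(a.lower() for a in actions)
    return patterns


def is_allowed(policy: Dict, action: str) -> bool:
    """True if any Allow pattern matches the action, including wildcards such as 'lambda:*'."""
    return any(fnmatchcase(action.lower(), p) for p in allowed_patterns(policy))


def duplicates(policy: Dict) -> List[str]:
    """Actions listed more than once across Allow statements, in first-seen order."""
    seen, dupes = set(), []
    for p in allowed_patterns(policy):
        if p in seen and p not in dupes:
            dupes.append(p)
        seen.add(p)
    return dupes
```

For example, save the policy above as a JSON file, load it with `json.load`, and call `is_allowed(policy, "events:DescribeRule")` for each action your deploy needs; `duplicates(policy)` reports entries listed more than once.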

If you run into issues, follow the tutorial here.

6. Set Your OpenAI API Key

Export your OpenAI API key as an environment variable:

export OPENAI_API_KEY=your_openai_api_key_here

Get an API key directly from the OpenAI platform, and make sure your account has API credits available.
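The README does not show how the bot reads the key, so the helper below is a hypothetical sketch of a fail-fast check: it reads `OPENAI_API_KEY` from the environment (or from a mapping you pass in, which makes it easy to test) and raises a clear error if the key is missing or blank.

```python
import os
from typing import Mapping, Optional


def get_openai_key(env: Optional[Mapping[str, str]] = None) -> str:
    """Return OPENAI_API_KEY, failing fast with a clear message if it is unset or blank."""
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it first")
    return key
```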

7. Configuration

Edit the config.py file to set up your preferences:

  • Keywords and Categories: Specify your interests to tailor the news content.
  • AWS SES Email Settings: Enter the "From" and "To" email addresses for the newsletter. Make sure AWS SES is permitted to send from and deliver to these addresses.

Example config.py:

CATEGORY_LIMITS = {
    "Subjects": 2,
    "Tools & Services": 6,
    "Websites & Applications": 2,
    "Concepts & Methods": 2,
    "Platforms & Search Engines": 6,
    "Companies & Organizations": 2,
    "Hardware & Systems": 2,
    "Languages & Syntax": 2,
    "Frameworks & Libraries": 2,
    "People": 2,
    "AI Models & Assistants": 6
} # maximum number of trending keywords to include per category

KEYWORDS_OF_INTEREST = ["Docker", "AWS", "AI"] # always included, whether trending or not

CATEGORIES_OF_INTEREST = [{"category": "Platforms & Search Engines", "limit": 6}, {"category": "Tools & Services", "limit": 6}] # track categories rather than individual keywords
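To make the settings concrete, here is a hypothetical sketch of how a per-category limit and a fixed keyword list could combine: take the highest-scoring trending keywords per category up to the limit, then append the keywords of interest. The input shape (`keyword`, `category` and `score` fields) and the name `select_keywords` are assumptions for illustration, not the bot's actual code or the API's schema.

```python
from typing import Dict, List


def select_keywords(trending: List[Dict], limits: Dict[str, int], always: List[str]) -> List[str]:
    """Hypothetical: top-N trending keywords per category, then the fixed keywords of interest.

    Each row in `trending` is assumed to look like
    {"keyword": str, "category": str, "score": float}; these field names are illustrative.
    """
    ranked = sorted(trending, key=lambda row: row["score"], reverse=True)
    counts: Dict[str, int] = {}
    selected: List[str] = []
    for row in ranked:
        category = row["category"]
        taken = counts.get(category, 0)
        if taken < limits.get(category, 0):  # per-category cap, as in CATEGORY_LIMITS
            counts[category] = taken + 1
            selected.append(row["keyword"])
    for keyword in always:  # like KEYWORDS_OF_INTEREST: included whether trending or not
        if keyword not in selected:
            selected.append(keyword)
    return selected
```

In this sketch, categories missing from the limits mapping get a limit of 0, so their keywords are skipped unless they also appear in the always-included list.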

Keep these lists short, or the report may contain more information than you can comfortably digest.

Also set the "from" and "to" addresses to ones that you have verified in AWS SES:

SOURCE_EMAIL = '[email protected]' 
TO_ADRESS = '[email protected]'
AWS_REGION_NAME = 'eu-north-1'
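While an SES account is in the sandbox, both the sender and the recipient must be verified. A sketch for checking this with boto3's SES client is below; `check_ses` and `unverified_identities` are hypothetical helpers, and the parsing assumes the response shape of SES `GetIdentityVerificationAttributes`.

```python
from typing import Dict, List


def unverified_identities(response: Dict, identities: List[str]) -> List[str]:
    """Return identities whose SES VerificationStatus is not 'Success'.

    `response` is assumed to look like
    {"VerificationAttributes": {identity: {"VerificationStatus": "Success" | "Pending" | ...}}}.
    Identities absent from the response were never submitted for verification.
    """
    attrs = response.get("VerificationAttributes", {})
    return [i for i in identities if attrs.get(i, {}).get("VerificationStatus") != "Success"]


def check_ses(source: str, to: str, region: str) -> List[str]:
    """Ask SES which of the two addresses still need verification."""
    import boto3  # imported lazily so the parser above works without boto3 installed

    ses = boto3.client("ses", region_name=region)
    identities = [source, to]
    response = ses.get_identity_verification_attributes(Identities=identities)
    return unverified_identities(response, identities)
```

For example, `check_ses(SOURCE_EMAIL, TO_ADRESS, AWS_REGION_NAME)` with the values from config.py returns an empty list when both addresses are verified. If you verified a whole domain rather than individual addresses, query the domain identity instead.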

For a detailed walkthrough, see the tutorial here.

Optional: Customize Email Templates

Modify the templates.py file if you wish to change the bot's messaging style or how it processes and summarizes data.

Optional: Tweak the scheduled events

Go to the serverless.yml file to tweak when the newsletters go out. The API usually gets new data before 10 AM UTC.

events:
    - eventBridge:
        schedule: 'cron(0 10 ? * 2-6 *)' # Trigger at 10 AM UTC on weekdays
        input:
            time_period: "daily"
    - eventBridge:
        schedule: 'cron(0 10 ? * FRI *)' # Trigger at 10 AM UTC every Friday
        input:
            time_period: "weekly"
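EventBridge cron expressions have six fields (minutes, hours, day-of-month, month, day-of-week, year), and one of day-of-month or day-of-week must be `?`. The sketch below checks only that structure and translates numeric day-of-week values, where 1 is SUN, into names; the helper names are ours, and EventBridge itself remains the final judge of validity.

```python
import re
from typing import List

# In EventBridge cron expressions, day-of-week 1 = SUN ... 7 = SAT
DAY_NAMES = ["SUN", "MON", "TUE", "WED", "THU", "FRI", "SAT"]


def cron_fields(expression: str) -> List[str]:
    """Split 'cron(min hour dom month dow year)' into its six fields."""
    match = re.fullmatch(r"cron\((.*)\)", expression.strip())
    if not match:
        raise ValueError(f"not a cron(...) expression: {expression!r}")
    fields = match.group(1).split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {fields}")
    return fields


def basic_check(expression: str) -> bool:
    """Structural check only: six fields, and exactly one of day-of-month / day-of-week is '?'."""
    try:
        fields = cron_fields(expression)
    except ValueError:
        return False
    day_of_month, day_of_week = fields[2], fields[4]
    return (day_of_month == "?") != (day_of_week == "?")


def dow_names(field: str) -> str:
    """Rewrite plain numeric days like '2-6' or '1,7' as names ('MON-FRI', 'SUN,SAT')."""
    return re.sub(r"[1-7]", lambda m: DAY_NAMES[int(m.group()) - 1], field)
```

`dow_names("2-6")` returns `"MON-FRI"`, so with the schedule above both the daily and the weekly rule fire on Fridays at 10 AM UTC.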

Deployment

You can test it locally before deploying.

serverless invoke local --function newsletterTrigger

When you're ready, deploy:

serverless deploy
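Each `eventBridge` entry's `input` object is delivered to the Lambda function as its event, which is how one function can produce both daily and weekly reports. A hypothetical handler that reads `time_period` is sketched below; the function name, the fallback to "daily" and the return shape are assumptions, not the repository's code.

```python
import json
from typing import Any, Dict, Optional

VALID_PERIODS = {"daily", "weekly"}


def handler(event: Optional[Dict[str, Any]], context: Any = None) -> Dict[str, Any]:
    """Hypothetical entry point: EventBridge passes the rule's `input` object as `event`."""
    period = (event or {}).get("time_period", "daily")  # assumed fallback when no input is given
    if period not in VALID_PERIODS:
        raise ValueError(f"unsupported time_period: {period!r}")
    # Placeholder: fetch trending keywords, gather sources, summarize, send via SES.
    return {"statusCode": 200, "body": json.dumps({"time_period": period})}
```

To exercise the weekly path locally, you can pass the same input with `serverless invoke local --function newsletterTrigger --data '{"time_period": "weekly"}'`.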

Usage

The bot uses data from these API endpoints.

The table endpoint returns the trending keywords for yesterday or the current week:

curl -X GET \
"https://safron.io/api/table?period=daily&sort=trending"

The sources endpoint returns sources for a list of keywords or IDs:

curl -X POST \
-H "Content-Type: application/json" \
-d '{"keywords": ["Alexa", "Amazon"]}' \
"https://safron.io/api/sources?startDate=2024-01-30&endDate=2024-01-30"

The bot summarizes the sources returned by these endpoints according to your preferences to build customized tech news reports.
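The two curl calls above can be reproduced with Python's standard library, as sketched below: the table endpoint is a plain GET, while the sources endpoint takes keywords in a JSON body and the date range in the query string. Treating "weekly" as the table's value for the current week is an assumption based on the `time_period` values in serverless.yml, the responses are assumed to be JSON with undocumented schemas, and the ID-based body for sources is not documented here, so only keywords are covered. All helper names are hypothetical.

```python
import json
import urllib.request
from datetime import date, timedelta
from typing import Any, List, Optional
from urllib.parse import urlencode

BASE_URL = "https://safron.io/api"


def table_url(period: str = "daily", sort: str = "trending") -> str:
    """URL for the table endpoint ('weekly' for the current week is an assumption)."""
    return f"{BASE_URL}/table?{urlencode({'period': period, 'sort': sort})}"


def sources_request(keywords: List[str], start: date, end: Optional[date] = None) -> urllib.request.Request:
    """POST to /sources: keywords in the JSON body, date range in the query string."""
    end = end or start
    query = urlencode({"startDate": start.isoformat(), "endDate": end.isoformat()})
    return urllib.request.Request(
        f"{BASE_URL}/sources?{query}",
        data=json.dumps({"keywords": keywords}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def yesterday() -> date:
    """Date for the daily period, which covers yesterday."""
    return date.today() - timedelta(days=1)


def get_json(url_or_request, timeout: float = 30.0) -> Any:
    """Send the request and decode the response body as JSON."""
    with urllib.request.urlopen(url_or_request, timeout=timeout) as resp:
        return json.load(resp)
```

For example, `get_json(table_url("daily"))` fetches the daily trending table, and `get_json(sources_request(["Alexa", "Amazon"], yesterday()))` fetches yesterday's sources for those keywords.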

ai-tech-news-bot's People

Contributors

ilsilfverskiold

ai-tech-news-bot's Issues

Docker Release

Any plans to dockerize this?
The model part would be nice to have.

Some issues with serverless deploy

Hi Ida!

I left a comment on your Medium article with some questions/issues.

After running into more issues with "serverless deploy", I thought it would be better to create an issue here showing the problems I had deploying to AWS and how I fixed them. There was also an issue I mentioned in your Medium article, not covered here, that I also fixed in my environment.

I now have this setup in AWS serverless. Looking forward to my first newsletter tomorrow morning. :))))

Thanks so much for sharing this with the community. It's pretty cool!


FYI, after peeling back more of the "serverless deploy" issues onion, I found the following changes were also needed:

I'm super new to AWS serverless, etc. So these aren't great solutions. But FYI what I did to unblock myself.

  1. Added following in serverless.yml for function newsletterTrigger:
    role: arn:aws:iam::801843946498:role/tech-bot-dev-us-east-1-lambdaRole

I'm guessing the first "serverless deploy" created the relevant IAM role, and subsequent deploys tried to create it again even though it already existed.

So this isn't a general solution, as this won't work the first time since the role isn't created yet. I guess the "iamRoleStatements" in the serverless.yml causes the IAM role to be created 1st time, but fails subsequent times?

The error this fixed was:

Error:
CREATE_FAILED: IamRoleLambdaExecution (AWS::IAM::Role)
Resource handler returned message: "Resource of type 'AWS::IAM::Role' with identifier 'tech-bot-dev-us-east-1-lambdaRole' already exists." (RequestToken: f0ea4c8b-fab9-c99b-1ef7-c31a9df78d87, HandlerErrorCode: AlreadyExists)

  2. Then started getting another serverless deploy issue related to no permissions for events.

CREATE_FAILED: TechbotdevnewsletterTriggerrule2EventBridgeRule (AWS::Events::Rule)
Resource handler returned message: "User: arn:aws:iam::801843946498:user/serverless-newsbot is not authorized to perform: events:DescribeRule on resource: arn:aws:events:us-east-1:801843946498:rule/tech-bot-dev-newsletterTrigger-rule-2 because no identity-based policy allows the events:DescribeRule action (Service: EventBridge, Status Code: 400, Request ID: 747c5a19-5e08-4ebf-b677-df9310fdea35)" (RequestToken: 1b1b4ec8-e05f-a0b2-657c-8198bf2de27d, HandlerErrorCode: AccessDenied)

I just added "events:*" to the serverless IAM policy you had us create. That is probably too permissive, but... I needed to get things running. :)

  3. New issue with the schedule:

CREATE_FAILED: NewsletterTriggerEventsRuleSchedule1 (AWS::Events::Rule)
Resource handler returned message: "Parameter ScheduleExpression is not valid. (Service: EventBridge, Status Code: 400, Request ID: 5a5572f5-9f48-4eca-8dbc-c16e8c9023e4)" (RequestToken: 3f7735eb-3c1e-4941-f708-d9f10a639c6d, HandlerErrorCode: GeneralServiceException)

I changed the syntax of the schedules in serverless.yml to be like:

events:
  - eventBridge:
      schedule: 'cron(0 13 ? * MON-FRI *)' # Trigger at 1PM UTC on weekdays
      input:
        time_period: "daily"
  - eventBridge:
      schedule: 'cron(0 13 ? * SAT *)' # Trigger at 1PM  UTC every Saturday
      input:
        time_period: "weekly"
