ETL Pipeline for Tweets and News using Python, Kafka, AWS and ELK Stack

This ETL pipeline analyzes user sentiment about Google, using tweets and news articles as the data sources.

Demo Video

Stacks

  • Python
  • Kafka with ZooKeeper
  • AWS (S3, Glue Crawlers, Athena)
  • ELK (Elasticsearch, Kibana)
  • TextBlob (sentiment analysis)

Architecture of the application

(Architecture diagram)

Running the ETL Pipeline

Step 1: Start Kafka, Elasticsearch, and Kibana

Run the following command to start the Kafka broker (with ZooKeeper), Elasticsearch, and Kibana:

docker compose up
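
Before moving on, you can quickly check that all three services are reachable. Below is a minimal sketch in Python, assuming the default ports (Kafka on 9092, Elasticsearch on 9200, Kibana on 5601); the script itself is hypothetical and not part of the repository.

# check_services.py -- hypothetical helper, not part of the repo
import socket

# Assumed default ports from the docker-compose setup
services = {
    "Kafka": ("localhost", 9092),
    "Elasticsearch": ("localhost", 9200),
    "Kibana": ("localhost", 5601),
}

for name, (host, port) in services.items():
    try:
        # Open a plain TCP connection to confirm the service is listening
        with socket.create_connection((host, port), timeout=3):
            print(f"{name} is reachable on {host}:{port}")
    except OSError:
        print(f"{name} is NOT reachable on {host}:{port}")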

Step 2: Extract

To extract data from the sources, run the following commands, each in a separate terminal:

python3 extract/tweet_producer.py
python3 extract/news_producer.py

To check whether the producers are emitting any data, run one of the following commands in another terminal:

kafka-console-consumer.sh --topic Tweets_Topic --bootstrap-server localhost:9092 --from-beginning

or

kafka-console-consumer.sh --topic News_Topic --bootstrap-server localhost:9092 --from-beginning

If messages appear, the extract stage is working.
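
For orientation, a producer along these lines could look roughly like the sketch below. It is a minimal sketch using kafka-python and tweepy with the Tweets_Topic name from this README; the actual extract/tweet_producer.py may differ in credential handling, query, and payload shape.

# Minimal sketch of a tweet producer (illustrative, not the repository code)
import json
import os

import tweepy
from kafka import KafkaProducer

# Serialize records as JSON before sending them to Kafka
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

# Twitter API v2 client; the bearer token is assumed to come from the environment
client = tweepy.Client(bearer_token=os.environ["TWITTER_BEARER_TOKEN"])

# Pull recent tweets mentioning Google and publish them to the topic
response = client.search_recent_tweets(query="Google", max_results=100)
for tweet in response.data or []:
    producer.send("Tweets_Topic", {"id": tweet.id, "text": tweet.text})

producer.flush()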

Step 3: Transform

Once the extract stage is running, run the following commands, each in a separate terminal, to transform the data and attach a sentiment prediction:

python3 transform/tweet_transformer.py
python3 transform/news_transformer.py

To check whether the data is being transformed, run:

kafka-console-consumer.sh --topic Processed --bootstrap-server localhost:9092 --from-beginning

If the messages contain a sentiment field, the transformers are working.
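
Conceptually, each transformer consumes raw messages, attaches a TextBlob sentiment score, and republishes the enriched record to the Processed topic. Below is a minimal sketch assuming kafka-python and the topic names above; the record field names are illustrative.

# Minimal sketch of a tweet transformer (illustrative, not the repository code)
import json

from kafka import KafkaConsumer, KafkaProducer
from textblob import TextBlob

consumer = KafkaConsumer(
    "Tweets_Topic",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

for message in consumer:
    record = message.value
    # TextBlob polarity ranges from -1.0 (negative) to 1.0 (positive)
    polarity = TextBlob(record["text"]).sentiment.polarity
    record["sentiment"] = (
        "positive" if polarity > 0 else "negative" if polarity < 0 else "neutral"
    )
    producer.send("Processed", record)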

Step 4: Load

In this step, create an S3 bucket and replace the default bucket name in load/aws_consumer.py with yours. Then run the following commands, each in a separate terminal:

python3 load/aws_consumer.py
python3 load/elk_consumer.py

This loads the data into the S3 bucket and Elasticsearch.
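
The repository splits loading into load/aws_consumer.py (S3) and load/elk_consumer.py (Elasticsearch); the sketch below combines both sides for brevity. It assumes boto3 and the elasticsearch 8.x client, and the bucket name, index name, and object key layout are placeholders.

# Minimal sketch of a loader (illustrative, not the repository code)
import json
import uuid

import boto3
from elasticsearch import Elasticsearch
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "Processed",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
s3 = boto3.client("s3")
es = Elasticsearch("http://localhost:9200")

BUCKET = "your-s3-bucket"  # replace with the bucket you created
INDEX = "sentiment"        # placeholder Elasticsearch index name

for message in consumer:
    record = message.value
    # Write each processed record to S3 as its own JSON object
    key = f"processed/{uuid.uuid4()}.json"
    s3.put_object(Bucket=BUCKET, Key=key, Body=json.dumps(record))
    # Index the same record in Elasticsearch for Kibana dashboards
    es.index(index=INDEX, document=record)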


Elasticsearch and Kibana

Create a Kibana dashboard from the loaded data to visualize the results of the ETL pipeline.

(Kibana dashboard screenshot)

AWS

In AWS, create a Glue crawler with an appropriate IAM role; you can then view the results of the ETL pipeline with AWS Athena.

(Athena query result screenshots)
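
As a rough illustration, once the crawler has populated the Glue Data Catalog you can also run Athena queries programmatically. The sketch below uses boto3; the database name, table name, and output location are placeholders.

# Minimal sketch of an Athena query via boto3 (illustrative; names are placeholders)
import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT sentiment, COUNT(*) AS n FROM tweets GROUP BY sentiment",
    QueryExecutionContext={"Database": "etl_pipeline_db"},
    ResultConfiguration={"OutputLocation": "s3://your-s3-bucket/athena-results/"},
)
# The query runs asynchronously; results can be fetched later by this execution ID
print(response["QueryExecutionId"])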
