Stream-processing-workshop
Current Status
Data Source: GDELT
The GDELT Project
Supported by Google Jigsaw, the GDELT Project monitors the world's broadcast, print, and web news from nearly every corner of every country in over 100 languages and identifies the people, locations, organizations, themes, sources, emotions, counts, quotes, images and events driving our global society every second of every day, creating a free open platform for computing on the entire world. (Quoted from the official website.)
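GDELT distributes its event tables as tab-separated files. A minimal parsing sketch follows; the field names and positions below are an illustrative subset, not the real 61-column event schema, so check them against the official GDELT codebook before relying on them.

```python
# Sketch: split a tab-separated GDELT-style record into named fields.
# The field names/positions are hypothetical; consult the GDELT codebook
# for the actual event-table layout.
from typing import Dict

FIELDS = ["GlobalEventID", "Day", "Actor1Name", "Actor2Name", "EventCode", "AvgTone"]

def parse_event(line: str) -> Dict[str, str]:
    """Map one tab-delimited record onto the (assumed) field names."""
    values = line.rstrip("\n").split("\t")
    return dict(zip(FIELDS, values))

record = parse_event("498765432\t20240101\tGOVERNMENT\tPROTESTER\t014\t-2.5")
print(record["EventCode"])  # -> 014
```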
Workflow
Requirements
Install and run Pravega
[Option 1] From the installation package
[Option 2] From Docker
docker run -it -e HOST_IP=<ip> -p 9090:9090 -p 12345:12345 pravega/pravega:latest standalone
Install and run Flink
Run on Docker
docker pull flink:scala_2.11
cd references/flink-docker
docker-compose up
Install and run Kafka
Quick start
Create a new topic
- Example
kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 1 --topic my-replicated-topic
(On a single-broker quickstart, use --replication-factor 1; the replication factor cannot exceed the number of brokers.)
Start a producer
- Example
kafka-console-producer.sh --broker-list localhost:9092 --topic test
(Newer Kafka releases replace --broker-list with --bootstrap-server.)
Start a consumer
- Example
kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic test --from-beginning
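Before wiring the producer and consumer into the pipeline, it helps to fix a message format. A minimal sketch, assuming events are published to the topic as compact JSON strings (the field names are illustrative, not the project's actual schema):

```python
import json

def to_kafka_message(event: dict) -> bytes:
    # Kafka message values are byte arrays; serialize as compact UTF-8 JSON.
    return json.dumps(event, separators=(",", ":")).encode("utf-8")

def from_kafka_message(payload: bytes) -> dict:
    # Inverse of to_kafka_message: decode bytes back into a dict.
    return json.loads(payload.decode("utf-8"))

msg = to_kafka_message({"GlobalEventID": "498765432", "EventCode": "014"})
assert from_kafka_message(msg)["EventCode"] == "014"
```

Lines in this format can also be pasted directly into kafka-console-producer.sh for quick manual testing.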
Install and run Druid
Download Druid
Run Druid
bin/supervise -c quickstart/tutorial/conf/tutorial-cluster.conf
Modify quickstart/tutorial/conf/tutorial-cluster.conf as desired before starting the cluster.
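To have Druid pull events straight from the Kafka topic, you submit a supervisor spec to the Druid overlord. A hedged sketch of such a spec is below; the datasource name, topic, and dimension fields are assumptions for this workshop, not values taken from the repo.

```json
{
  "type": "kafka",
  "spec": {
    "dataSchema": {
      "dataSource": "gdelt-events",
      "timestampSpec": { "column": "timestamp", "format": "auto" },
      "dimensionsSpec": { "dimensions": ["GlobalEventID", "EventCode", "Actor1Name"] }
    },
    "ioConfig": {
      "topic": "test",
      "consumerProperties": { "bootstrap.servers": "localhost:9092" },
      "inputFormat": { "type": "json" }
    },
    "tuningConfig": { "type": "kafka" }
  }
}
```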
Run Jobs
Option 4
Submit each job to the Flink cluster with flink run, passing the job jar and the main class shown below.
Export Data Producer
--class com.machinedoll.projectdemo.jobs.option4.ExportDataProducer
Export Data Consumer
--class com.machinedoll.projectdemo.jobs.option4.ExportDataConsumer