vadymurupa / pulsar-flink-stateful-streams

This project forked from polyzos/pulsar-flink-stateful-streams

Integrating Apache Pulsar and Apache Flink for Powerful Data Streams Processing

Shell 3.83% Java 96.17%

pulsar-flink-stateful-streams's Introduction

WIP:

  • Tune backpressure, buffers, checkpoint intervals, and watermark intervals for larger state
  • Use the RocksDB API to demonstrate what gets written and how
  • Use time-based joins for session windows and add time constraints

Use Case 1

Data Enrichment with Topic Lookups

Use Case 2

Data Aggregation with Time Constraints on Time Windows

Set Up a Pulsar Cluster

docker run --rm -it --name pulsar \
-p 6650:6650  -p 8080:8080 \
--mount source=pulsardata,target=/pulsar/data \
--mount source=pulsarconf,target=/pulsar/conf \
apachepulsar/pulsar:2.9.1 \
bin/pulsar standalone
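
The standalone broker needs some time to start, and admin commands can fail until it is ready. A minimal sketch of a retry loop, assuming `curl` is available on the host and that the Pulsar admin REST API answers on the mapped port 8080. The `wait_for` helper and the retry values in the usage line are illustrative and not part of this project:

```shell
# wait_for ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS run out.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # Discard the command's output; only its exit status matters here.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1
}

# Usage (assumes the container started by the command above is running):
# wait_for 30 2 curl -sf http://localhost:8080/admin/v2/clusters && echo "Pulsar is up"
```

The helper takes the probe command as arguments, so the same loop can wait on any other check that signals readiness through its exit status.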

Set Up Pulsar Logical Components

Go into your container

docker exec -it pulsar bash

and run the following commands

  1. Create topics
bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/orders
bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/users
bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/items

bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/view_events
bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/purchase_events
bin/pulsar-admin topics create-partitioned-topic -p 1 persistent://public/default/cart_events

bin/pulsar-admin topics list public/default
  2. Set infinite retention
bin/pulsar-admin topics set-retention -s -1 -t -1 persistent://public/default/users
bin/pulsar-admin topics set-retention -s -1 -t -1 persistent://public/default/items

bin/pulsar-admin topics get-retention persistent://public/default/users
bin/pulsar-admin topics get-retention persistent://public/default/items

bin/pulsar-admin topics set-retention -s -1 -t -1 persistent://public/default/view_events
bin/pulsar-admin topics set-retention -s -1 -t -1 persistent://public/default/purchase_events
bin/pulsar-admin topics set-retention -s -1 -t -1 persistent://public/default/cart_events
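
The per-topic commands above are repetitive, so they can also be generated from a list of topic names. A minimal sketch that only prints the commands and executes nothing. The function names are illustrative, and retention is set on the same five topics as in the commands above (not on `orders`):

```shell
# Prints the pulsar-admin commands for one topic.
# $1 = topic name, $2 = "retain" to also print the infinite-retention command.
print_topic_setup() {
  topic="persistent://public/default/$1"
  echo "bin/pulsar-admin topics create-partitioned-topic -p 1 $topic"
  if [ "${2:-}" = "retain" ]; then
    echo "bin/pulsar-admin topics set-retention -s -1 -t -1 $topic"
  fi
}

# Prints the commands for all six topics used in this project.
print_all_topic_setup() {
  print_topic_setup orders
  for name in users items view_events purchase_events cart_events; do
    print_topic_setup "$name" retain
  done
}

print_all_topic_setup
```

Review the printed commands first. Piping the output to `sh` from the Pulsar home directory inside the container should then run them.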

Start a Flink Cluster

./bin/start-cluster.sh

Deploy the Flink Job

./deploy.sh

Monitor Flink logs

Tail the logs

tail -f log/flink-*-taskexecutor-*
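
When the full log stream is too noisy, filtering for warnings and errors can help. A minimal sketch, assuming the log lines carry the level names `WARN` and `ERROR` or the word `Exception`, as with Flink's default log layout (adjust the pattern if your logging configuration differs). `filter_problems` is a hypothetical helper:

```shell
# Keeps only lines that look like warnings, errors, or stack trace headers.
# Reads the given files, or standard input when no file is given.
filter_problems() {
  grep -E 'WARN|ERROR|Exception' "$@"
}

# Usage (from the Flink home directory):
# tail -f log/flink-*-taskexecutor-* | filter_problems
```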

The original datasets can be found at the following links:
