elenikougiou / flink-cep-automation


Apache Flink CEP: Automatic pattern generation and processing

License: Apache License 2.0

Java 84.79% Python 12.56% Shell 2.65%
flink flink-cep automation big-data java kafka pattern-recognition event-processing


Flink CEP Automation

AUEB | Bachelor Thesis (w / Prof. Kotidis Yannis) | 2020 - 2021

The goal of the project is to create a generalized Complex Event Processing (CEP) operator using the FlinkCEP library.

Code contributors:

  • Kontaxakis Antonios
  • Kotidis Yannis

How to use:

Step 1 - Create data:

StreamGenerator.py creates a text file of events based on the requested pattern and the conditions to be examined. The file contains one event per line in the format stream_id, window_id, event. The last line of the file is "-1, -1, KILL", which signals that there are no more events and terminates the Flink job. The user needs to enter 7 command-line arguments, as follows:

  1. Pattern (String)
  2. Stream length (int)
  3. Number of sub-streams (int)
  4. Window size (int)
  5. Number of matches (int)
  6. Strict contiguity (boolean)
  7. File name for writing data (String)

Command example

./StreamGenerator.py 'ab{1,3}(c|d)' 1000 8 100 150 True 'data.txt'
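
The file format described above can be read back with a few lines of Python. This is a minimal sketch, not code from the repository: the names Event and parse_events are hypothetical, and it assumes fields are separated by a comma with optional surrounding spaces.

```python
from typing import Iterable, Iterator, NamedTuple


class Event(NamedTuple):
    """One line of the generated file (hypothetical helper type)."""
    stream_id: int
    window_id: int
    event: str


def parse_events(lines: Iterable[str]) -> Iterator[Event]:
    """Yield events until the "-1, -1, KILL" sentinel line is reached."""
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines
        # Assumption: exactly three comma-separated fields per line.
        stream_id, window_id, event = (part.strip() for part in line.split(",", 2))
        if event == "KILL":
            return  # sentinel: no further events
        yield Event(int(stream_id), int(window_id), event)


sample = ["0, 0, a", "0, 0, b", "1, 0, c", "-1, -1, KILL", "9, 9, z"]
events = list(parse_events(sample))
```

Anything after the sentinel line is ignored, matching the description of KILL as the end-of-data marker.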

Step 2 - Send data:

CEPdata.java sends the data to a Kafka topic. The user needs to enter 3 command-line arguments as follows:

  1. File name for reading data (String)
  2. Name of the Kafka topic for sending data (String)
  3. Host IP (String)

Command example (with jar)

java -jar data_kafka.jar 'data.txt' 'CEPdata' 'localhost'
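
The sending step can be pictured as a loop that forwards each line to a topic and stops after the KILL line. The sketch below is an illustration in Python, not the logic of CEPdata.java: send_lines is a hypothetical name, the send callable stands in for a real Kafka producer, and forwarding the KILL line itself is an assumption based on its stated role of terminating the Flink job.

```python
from typing import Callable, Iterable


def send_lines(lines: Iterable[str], topic: str,
               send: Callable[[str, bytes], None]) -> int:
    """Forward each non-empty line to `topic`; stop after the KILL line.

    `send` is a placeholder for a producer's send function (assumption).
    Returns the number of messages sent, including the KILL line.
    """
    sent = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        send(topic, line.encode("utf-8"))
        sent += 1
        if line.endswith("KILL"):
            break  # nothing after the sentinel is forwarded
    return sent


# Stand-in for a Kafka producer: collect messages in a list.
outbox = []
count = send_lines(
    ["0, 0, a\n", "0, 1, b\n", "-1, -1, KILL\n", "ignored\n"],
    "CEPdata",
    lambda topic, value: outbox.append((topic, value)),
)
```

Passing the producer in as a callable keeps the loop testable without a running broker.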

Step 3 - Submit Flink job & find results:

CEPCase_Generate.java contains all the important operations:

  • Reads the data from a Kafka topic (or from a text file)
  • Rewrites the given regular expression as a FlinkCEP pattern according to the requested conditions
  • Finds the matching results
  • Writes the results to a Kafka topic (or to a text file)
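
For small inputs, the job's output under strict contiguity can be cross-checked with Python's re module. This is a rough sketch and not a reproduction of FlinkCEP semantics: strict_matches is a hypothetical name, and it assumes every event is a single character, ignores window_id, and reports non-overlapping matches per sub-stream (loosely comparable to skipping past the last event).

```python
import re
from collections import defaultdict


def strict_matches(events, pattern):
    """Return {stream_id: [matched strings]} for strictly contiguous matches.

    `events` is an iterable of (stream_id, window_id, symbol) tuples.
    Assumption: each symbol is one character, so a sub-stream can be
    joined into a string and searched with an ordinary regex.
    """
    per_stream = defaultdict(list)
    for stream_id, _window_id, symbol in events:
        per_stream[stream_id].append(symbol)
    compiled = re.compile(pattern)
    return {
        stream_id: [m.group(0) for m in compiled.finditer("".join(symbols))]
        for stream_id, symbols in per_stream.items()
    }


events = [
    (0, 0, "a"), (1, 0, "a"), (0, 0, "b"), (0, 0, "b"), (1, 0, "x"),
    (0, 0, "c"), (1, 0, "b"), (0, 0, "a"), (0, 0, "b"), (0, 0, "d"),
]
result = strict_matches(events, "ab{1,3}(c|d)")
```

Here sub-stream 0 reads "abbcabd" and sub-stream 1 reads "axb", so only sub-stream 0 contains matches for the example pattern.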

The user needs to enter 12 command-line arguments as follows:

  1. Type (String): "Kafka" for using Kafka topics to read and write, or anything else for using text files.
  2. File name for reading data (String) (useful when type != "Kafka")
  3. File name for writing results (String) (useful when type != "Kafka")
  4. Pattern (String)
  5. Parallelism (int)
  6. Contiguity condition (int): 1 = strict, 2 = relaxed, 3 = non-deterministic relaxed
  7. After match skip strategy (int): 1 = no skip, 2 = skip to next, 3 = skip past last event, 4 = skip to first, 5 = skip to last
  8. Pattern name (int) (useful when strategy = 4 or strategy = 5)
  9. Flink job name (String)
  10. Name of the Kafka topic for reading data (String) (useful when type = "Kafka")
  11. Name of the Kafka topic for writing results (String) (useful when type = "Kafka")
  12. Host IP (String)

Command example (with jar, submitting job to a Flink cluster)

./bin/flink run ./examples/flink_job.jar 'Kafka' '-' '-' 'ab{1,3}(c|d)' 4 1 3 '-' 'Example' 'CEPdata' 'CEPout' 'localhost'
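
To make the positional arguments easier to read, the sketch below decodes the 12 values from the command above into named fields. It is an illustration in Python rather than the parsing done in CEPCase_Generate.java: JobConfig and parse_job_args are hypothetical names, and the pattern name is kept as a string because the example passes '-' for it.

```python
from typing import NamedTuple, Sequence

# Integer codes as listed in the argument description.
CONTIGUITY = {1: "strict", 2: "relaxed", 3: "non-deterministic relaxed"}
SKIP_STRATEGY = {
    1: "no skip",
    2: "skip to next",
    3: "skip past last event",
    4: "skip to first",
    5: "skip to last",
}


class JobConfig(NamedTuple):
    """Named view of the 12 positional arguments (hypothetical type)."""
    use_kafka: bool
    input_file: str
    output_file: str
    pattern: str
    parallelism: int
    contiguity: str
    skip_strategy: str
    pattern_name: str
    job_name: str
    input_topic: str
    output_topic: str
    host: str


def parse_job_args(argv: Sequence[str]) -> JobConfig:
    """Decode the positional arguments in the documented order."""
    if len(argv) != 12:
        raise ValueError(f"expected 12 arguments, got {len(argv)}")
    (kind, in_file, out_file, pattern, parallelism, contiguity, strategy,
     pattern_name, job_name, in_topic, out_topic, host) = argv
    return JobConfig(
        use_kafka=(kind == "Kafka"),  # anything else means text files
        input_file=in_file,
        output_file=out_file,
        pattern=pattern,
        parallelism=int(parallelism),
        contiguity=CONTIGUITY[int(contiguity)],
        skip_strategy=SKIP_STRATEGY[int(strategy)],
        pattern_name=pattern_name,
        job_name=job_name,
        input_topic=in_topic,
        output_topic=out_topic,
        host=host,
    )


config = parse_job_args([
    "Kafka", "-", "-", "ab{1,3}(c|d)", "4", "1", "3", "-",
    "Example", "CEPdata", "CEPout", "localhost",
])
```

For the example command, this gives Kafka input and output, parallelism 4, strict contiguity, and the skip-past-last-event strategy.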
