
franztscharf / impro-spark-docker



impro-spark-docker's Introduction

IMPRO-Spark-Docker: Monitoring runtime engine statistics for Spark and Docker swarm

Over the last few years, stream data processing has been gaining attention in both industry and academia due to its wide range of applications. To meet the need for scalable and efficient stream analytics, numerous open source stream data processing systems have been developed, with high throughput and low latency as their key performance targets. Apache Spark is one of the stream processing systems used in both industry and academia [1]. Docker is an open platform for developers and sysadmins to build, ship, and run distributed applications, whether on laptops, data center VMs, or the cloud [2]. The goal of this project is to integrate Docker with Spark Streaming, expose Docker- and Spark-specific statistics to a common history server, and visualize them. For example, Docker provides network-related statistics and Spark provides JVM-related statistics.

Includes

The build process of this repo automatically deploys a Docker Swarm with Apache Spark. It also creates a virtual machine for visualizing the Docker and Spark metrics. For visualization it uses Grafana as the time series analytics platform and Graphite as the history server that stores the data. The statistics are collected with collectd.
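To make the data flow into the history server more concrete, here is a minimal sketch of how a single data point can be written to Graphite over its plaintext protocol (one line of the form "path value timestamp" per data point). It is not taken from this repo: the function names are hypothetical, and port 2003 is Graphite's default Carbon plaintext port, which is assumed here and may differ in this deployment.

```python
import socket
import time


def format_graphite_line(path, value, timestamp=None):
    """Build one plaintext-protocol line: '<path> <value> <timestamp>' plus a newline."""
    if timestamp is None:
        timestamp = int(time.time())
    return f"{path} {value} {int(timestamp)}\n"


def send_metric(host, path, value, port=2003, timeout=5.0):
    """Send a single data point to a Carbon plaintext listener.

    Assumption: Carbon listens on its default plaintext port 2003.
    """
    line = format_graphite_line(path, value)
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(line.encode("ascii"))
```

Calling send_metric with the IP of node-v and a metric path of your choice should make the data point show up in Graphite, provided the Carbon port is reachable from where the script runs.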

Getting Started

Deploy Environment:

cd IMPRO-Spark-Docker
sh ./sh/build.sh

Follow the instructions of the script. When it finishes, check your IaaS provider: multiple VMs should have been deployed.

Deploy Apache Spark Example Streaming Application:

cd IMPRO-Spark-Docker
sh ./sh/deploy-app-streaming.sh

Wait until the app is ready. Afterwards you can launch the source with the following commands:

cd IMPRO-Spark-Docker
sh ./sh/deploy-app-source.sh

Access Web UI:

- Grafana: IP of node-v, port 80
- Graphite: IP of node-v, port 81
- Apache Spark Web UI: IP of node-1, port 8080
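As a quick way to verify the deployment, the sketch below builds the three web UI addresses from the node IPs and checks whether each one answers over HTTP. The helper names are hypothetical and not part of this repo, and plain HTTP is assumed for all three interfaces; the node IPs must be looked up at your IaaS provider.

```python
import urllib.error
import urllib.request


def ui_urls(node_v_ip, node_1_ip):
    """Map each web UI to its URL, using the ports listed in this README."""
    return {
        "grafana": f"http://{node_v_ip}:80/",
        "graphite": f"http://{node_v_ip}:81/",
        "spark": f"http://{node_1_ip}:8080/",
    }


def is_reachable(url, timeout=3.0):
    """Return True if the server behind the URL sends any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status (e.g. a login page).
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, and similar.
        return False
```

Looping over ui_urls(...) with the real addresses of node-v and node-1 and printing is_reachable(url) for each entry gives a simple smoke test after the build script has finished.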

Dependencies

Runtime metrics

- collectd
- Native Spark metrics sink, configured in ./pkg/metrics.properties
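For readers unfamiliar with Spark's metrics system, the following sketch renders the kind of entries a metrics.properties file typically contains when Spark reports to Graphite. The property keys and class names are the ones from Spark's own metrics.properties template; the host, port, period, and prefix values are placeholders, and the actual contents of ./pkg/metrics.properties in this repo may differ. The function name is hypothetical.

```python
def render_metrics_properties(graphite_host, graphite_port=2003,
                              period_seconds=10, prefix="spark"):
    """Render a Spark metrics configuration that reports to Graphite.

    All values are illustrative assumptions, not this repo's settings.
    """
    lines = [
        # Send metrics from all Spark instances to Graphite.
        "*.sink.graphite.class=org.apache.spark.metrics.sink.GraphiteSink",
        f"*.sink.graphite.host={graphite_host}",
        f"*.sink.graphite.port={graphite_port}",
        f"*.sink.graphite.period={period_seconds}",
        "*.sink.graphite.unit=seconds",
        f"*.sink.graphite.prefix={prefix}",
    ]
    # Enable the JVM source so that JVM-related statistics are reported.
    for instance in ("master", "worker", "driver", "executor"):
        lines.append(
            f"{instance}.source.jvm.class=org.apache.spark.metrics.source.JvmSource"
        )
    return "\n".join(lines) + "\n"
```

Printing render_metrics_properties with the IP of node-v as the host shows a complete example file that can be compared against the one shipped in ./pkg.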

License

Not yet decided; probably Apache 2.0.
