jerry-024 / sparta

This project is forked from wacai/sparta

Real Time Aggregation based on Spark Streaming

Home Page: http://www.stratio.com

License: Apache License 2.0

Scala 39.24% Shell 0.80% Java 3.68% CSS 6.96% HTML 5.19% JavaScript 31.38% Gherkin 11.80% ApacheConf 0.94%

About Stratio Sparta

Since Aryabhatta invented zero, mathematicians such as John von Neumann have pursued efficient counting, and architects have kept building systems that compute counts faster. In this age of social media, where hundreds of thousands of events take place every second, we were inspired by Twitter's Rainbird project to develop a distributed aggregation engine with these high-level features:

  • Pure Spark
  • No coding required, only declarative aggregation workflows
  • Data continuously streamed in and processed in near real-time
  • Ready to use, plug and play
  • Flexible workflows (inputs, outputs, parsers, etc.)
  • High performance
  • Scalable
  • Business Activity Monitoring
  • Visualization

Strata Conference London 2015 slides (SlideShare)

Introduction

Social media and networking sites are part of the fabric of everyday life, changing the way the world shares and accesses information. The overwhelming amount of information gathered not only from messages, updates and images but also readings from sensors, GPS signals and many other sources was the origin of a (big) technological revolution.

This vast amount of data allows us to learn from the users and explore our own world.

We can follow in real-time the evolution of a topic, an event or even an incident just by exploring aggregated data.

But beyond cool visualizations, there are some core services delivered in real-time, using aggregated data to answer common questions in the fastest way.

These services are the heart of the business behind their nice logos.

Site traffic, user engagement monitoring, service health, APIs, internal monitoring platforms, real-time dashboards…

Aggregated data feeds directly to end users, publishers, and advertisers, among others.

With Sparta we want to start delivering real-time services. Real-time monitoring is valuable in itself, but your company also needs to work the way digital companies do: rethinking existing processes to deliver them faster and better, and creating new opportunities for competitive advantage.

Features

  • Highly business-project oriented
  • Multiple applications
  • Cubes
    • Time-based: secondly, minutely, hourly, daily, monthly, yearly...
    • Hierarchical
    • GeoRange: areas of different sizes (rectangles)
    • Flexible definition of aggregation policies (JSON, web app)
  • Operators:
    • Max, min, count, sum, range
    • Average, median
    • Stdev, variance, count distinct
    • Last value
    • Full-text search
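
To make the idea of a time-based cube with operators more concrete, here is a minimal sketch in plain Python. It is not Sparta's API and does not use Spark: the function names (`time_bucket`, `aggregate`), the granularity table and the sample events are all made up for illustration. It only shows the general technique of grouping events by a dimension value plus a truncated timestamp and then applying count, sum, min, max and average to each group.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical granularities mapped to strftime patterns (an assumption,
# not Sparta's policy format).
GRANULARITY_FORMATS = {
    "minute": "%Y-%m-%d %H:%M",
    "hour": "%Y-%m-%d %H",
    "day": "%Y-%m-%d",
}


def time_bucket(ts, granularity):
    """Truncate a datetime to its bucket and return the bucket as a string key."""
    return ts.strftime(GRANULARITY_FORMATS[granularity])


def aggregate(events, dimension, measure, granularity):
    """Group events by (dimension value, time bucket) and apply basic operators."""
    groups = defaultdict(list)
    for event in events:
        key = (event[dimension], time_bucket(event["timestamp"], granularity))
        groups[key].append(event[measure])
    return {
        key: {
            "count": len(values),
            "sum": sum(values),
            "min": min(values),
            "max": max(values),
            "avg": sum(values) / len(values),
        }
        for key, values in groups.items()
    }


# Made-up sample events, purely illustrative.
events = [
    {"timestamp": datetime(2015, 5, 5, 10, 1, 30), "hashtag": "spark", "retweets": 4},
    {"timestamp": datetime(2015, 5, 5, 10, 1, 45), "hashtag": "spark", "retweets": 6},
    {"timestamp": datetime(2015, 5, 5, 10, 2, 10), "hashtag": "spark", "retweets": 1},
    {"timestamp": datetime(2015, 5, 5, 10, 1, 50), "hashtag": "kafka", "retweets": 3},
]

by_minute = aggregate(events, "hashtag", "retweets", "minute")
by_hour = aggregate(events, "hashtag", "retweets", "hour")
```

Running the same events through a coarser granularity (`"hour"` instead of `"minute"`) merges the per-minute groups, which is roughly what a hierarchical time cube offers. A streaming engine would keep these aggregates updated incrementally rather than recomputing them from a list.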

Architecture

[Figures: Sparta overview, Architecture, Key technologies]

Inputs/Outputs

Inputs

  • Twitter
  • Kafka
  • Flume
  • RabbitMQ
  • Socket

Outputs

  • MongoDB
  • Cassandra
  • ElasticSearch
  • Redis
  • Spark's DataFrames Outputs
  • PrintOut
  • CSV
  • Parquet

Build

You can generate rpm and deb packages by running:

mvn clean package -Ppackage

Note: the following programs must be installed in order to build these packages.

On a Debian distribution:

  • fakeroot
  • dpkg-dev
  • rpm

On a CentOS distribution:

  • fakeroot
  • dpkg-dev
  • rpmdevtools
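
Before running the Maven build, it can help to check that the packaging tools are on the PATH. The sketch below is a generic check, not part of Sparta. The helper `missing_tools` is hypothetical, and the command names in `DEBIAN_COMMANDS` are assumptions about which executables the packages listed above provide (`fakeroot` from fakeroot, `dpkg-buildpackage` from dpkg-dev, `rpm` from rpm). Adjust them for your distribution.

```python
import shutil


def missing_tools(commands):
    """Return the commands from the given list that are not found on the PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


# Assumed executable names for the Debian prerequisites; verify on your system.
DEBIAN_COMMANDS = ["fakeroot", "dpkg-buildpackage", "rpm"]

if __name__ == "__main__":
    missing = missing_tools(DEBIAN_COMMANDS)
    if missing:
        print("Missing tools: " + ", ".join(missing))
    else:
        print("All packaging tools found.")
```

If the script reports missing tools, install the corresponding packages before running the `mvn` command.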

Sandbox

Documentation
