Giter Site home page Giter Site logo

bullet-db / bullet-core Goto Github PK

View Code? Open in Web Editor NEW
40.0 9.0 18.0 936 KB

Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink.

Home Page: https://bullet-db.github.io

License: Apache License 2.0

Makefile 0.04% Java 99.93% Shell 0.03%
bullet big-data querying streaming real-time sketches java

bullet-core's Introduction

Bullet Core

Build Status Coverage Status Maven Central

Bullet is a streaming query engine that can be plugged into any singular data stream using a Stream Processing framework like Apache Storm, Spark or Flink. It lets you run queries on this data stream - including hard queries like Count Distincts, Top K etc.

Table of Contents

Background

In Bullet, both the queries and the data flow through the system. There is absolutely no persistence layer! Queries live as long as their duration and operate on data in-memory only. So, the queries in Bullet look forward in time, which is pretty unique for most querying systems.

We created Bullet initially as a simple distributed grep like tool to find events in a click stream (containing high volume - 1 million events per sec -- user interaction data) at Yahoo. In particular, we use it for validating instrumentation that generates these events by interacting with the pages ourselves and finding our own events in this data stream and validate it for the proper key/value pairs. There was nothing as light-weight and cheap as Bullet to do this task. There are many other use-cases for Bullet and indeed, how you use it, depends on your data stream. If you put Bullet on performance metric data, your queries might mostly be finding the 99th percentile of some latency metric etc.

This project is the core library for Bullet that lets us implement Bullet agnostically on any JVM based Stream Processor. See Bullet Storm, which uses this to implement Bullet on Storm and Bullet Spark, on Spark Streaming. This code initially lived inside the Bullet Storm code base up to Bullet Storm Version 0.4.3.

Install

Bullet Core is a library written in Java and published to Bintray and mirrored to JCenter. It is meant to be used to implement Bullet on different Stream Processors or to implement a Bullet PubSub. To see the various versions and set up your project for your package manager (Maven, Gradle etc), see here.

Usage

Once you have added a dependency for Bullet Core, use our abstractions for the PubSub, Parsing, Querying, Windowing, Partitioning, and Sketching as you need to. In particular, see how we abstract running a Bullet Query. You can also look at our reference implementations in Storm and Spark to get a better idea.

Documentation

All documentation is available at Github Pages here.

Links

Quick Links

Contributing

All contributions are welcomed! Feel free to submit PRs for bug fixes, improvements or anything else you like! Submit issues, ask questions using Github issues as normal and we will classify it accordingly. See Contributing for a more in-depth policy. We just ask you to respect our Code of Conduct while you're here.

License

Code licensed under the Apache 2 license. See the LICENSE for terms.

bullet-core's People

Contributors

0aix avatar akshaisarma avatar dependabot[bot] avatar hbxie avatar nathanspeidel avatar nickbethune avatar shriramkumar avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

bullet-core's Issues

Support HOPPING window

Overlapping (or not) windows. We can require the sizes to multiples of each other to essentially buffer N of the smaller in a turnstile like data structure in the Join stage and combine when emitting. Might require supporting 'last' in our emit and include window attributes

Add post aggregations

Might require #37 for some

  1. Project constants per row
  2. Support post aggregate order by to sort
  3. Support computations using constants: E.g. SELECT (100.0 * CAST(A, double) / 32)) ...
  4. Support computations using other result columns: E.g. SELECT A-B...

Support Computations and Expressions in Filters and Projections

This will let us represent pre-aggregation computations and more easily translate statements like:

SELECT a + 5 
FROM ...

rather than add it as a postAggregation. It will also let us generate new fields before performing operations like COUNT DISTINCT, GROUP BY etc.

java.lang.NoSuchMethodError: com.yahoo.bullet.pubsub.PubSubMessage.<init> with (ByteArray)PubSubMessageSerDe

I see the following error when using the latest versions of all tooling with ByteArrayPubSubMessageSerDe:

java.lang.NoSuchMethodError: com.yahoo.bullet.pubsub.PubSubMessage.<init>(Ljava/lang/String;Ljava/io/Serializable;Lcom/yahoo/bullet/pubsub/Metadata;)V
	at com.yahoo.bullet.pubsub.ByteArrayPubSubMessageSerDe$LazyPubSubMessage.<init>(ByteArrayPubSubMessageSerDe.java:34) ~[bullet-core-1.4.1.jar!/:na]
	at com.yahoo.bullet.pubsub.ByteArrayPubSubMessageSerDe$LazyPubSubMessage.<init>(ByteArrayPubSubMessageSerDe.java:30) ~[bullet-core-1.4.1.jar!/:na]
	at com.yahoo.bullet.pubsub.ByteArrayPubSubMessageSerDe.toMessage(ByteArrayPubSubMessageSerDe.java:65) ~[bullet-core-1.4.1.jar!/:na]

Component versions:

BULLET_UI_VERSION=1.1.0
BULLET_WS_VERSION=1.2.2
BULLET_KAFKA_VERSION=1.2.2
BULLET_SPARK_VERSION=1.0.4
BULLET_EXAMPLES_VERSION=1.0.0

kafka_pubsub_config.yaml:

bullet.pubsub.context.name: "QUERY_SUBMISSION"
bullet.pubsub.class.name: "com.yahoo.bullet.kafka.KafkaPubSub"
bullet.pubsub.kafka.bootstrap.servers: "kafka:29092"
bullet.pubsub.kafka.request.topic.name: "bullet.requests"
bullet.pubsub.kafka.response.topic.name: "bullet.responses"
bullet.pubsub.message.serde.class.name: "com.yahoo.bullet.pubsub.ByteArrayPubSubMessageSerDe"

Logs:


  .   ____          _            __ _ _
 /\\ / ___'_ __ _ _(_)_ __  __ _ \ \ \ \
( ( )\___ | '_ | '_| | '_ \/ _` | \ \ \ \
 \\/  ___)| |_)| | | | | || (_| |  ) ) ) )
  '  |____| .__|_| |_|_| |_\__, | / / / /
 =========|_|==============|___/=/_/_/_/
 :: Spring Boot ::        (v1.5.7.RELEASE)

2021-06-29 07:02:02.331  INFO 1 --- [           main] com.yahoo.bullet.rest.Application        : Starting Application v1.2.2 on bullet-web-service-f6884c9bc-nw6zm with PID 1 (/usr/local/var/bullet/service/bullet-service-embedded.jar started by root in /usr/local/var/bullet/service)
2021-06-29 07:02:02.333  INFO 1 --- [           main] com.yahoo.bullet.rest.Application        : No active profile set, falling back to default profiles: default
2021-06-29 07:02:04.333  INFO 1 --- [           main] c.y.bullet.rest.service.SchemaService    : Read 10 fields
2021-06-29 07:02:04.338  INFO 1 --- [           main] c.y.bullet.rest.service.SchemaService    : Schema: {"data":[{"id":"probability","type":"column","attributes":{"description":"Generated from Random#nextDouble","name":"probability","type":"DOUBLE"}},{"id":"gaussian","type":"column","attributes":{"description":"Generated from Random#nextGaussian","name":"gaussian","type":"DOUBLE"}},{"id":"uuid","type":"column","attributes":{"description":"A UUID string generated from UUID#randomUUID","name":"uuid","type":"STRING"}},{"id":"tuple_number","type":"column","attributes":{"description":"A numeric id for the tuple generated in a monotonically increasing fashion in this period","name":"tuple_number","type":"LONG"}},{"id":"duration","type":"column","attributes":{"description":"A random number ranging from 0 to 10050 with a tendency to have a high frequency on lower values","name":"duration","type":"LONG"}},{"id":"type","type":"column","attributes":{"description":"A random string chosen from: foo, bar, baz, qux, quux, norf","name":"type","type":"STRING"}},{"id":"subtypes","type":"column","attributes":{"subFields":[{"name":"field_A","description":"Value randomly chosen from: foo, bar, baz, qux, quux, norf"},{"name":"field_B","description":"Value randomly chosen from: foo, bar, baz, qux, quux, norf"}],"description":"Contains two keys whose values are randomly chosen from: foo, bar, baz, qux, quux, norf","name":"subtypes","type":"STRING_MAP"}},{"id":"tags","type":"column","attributes":{"description":"Contains four keys which are four fragments of the uuid. The values are randomly generated boolean values from Random#nextBoolean","name":"tags","type":"BOOLEAN_MAP"}},{"id":"stats","type":"column","attributes":{"subFields":[{"name":"period_count","description":"The period in which this record was generated"},{"name":"record_number","description":"A monotonically increasing id for the record. There may be gaps in the id but if the data generation has kept up with your maximum tuples per period, this is the nth tuple generated"},{"name":"timestamp","description":"The ms time when this record was generated"},{"name":"nano_time","description":"The ns time when this record was generated"}],"description":"This map contains some numeric information such as the current number of periods etc.","name":"stats","type":"LONG_MAP"}},{"id":"classifiers","type":"column","attributes":{"description":"This contains two maps, each with: field_A and field_B whose values are randomly chosen from: foo, bar, baz, qux, quux, norf","name":"classifiers","type":"STRING_MAP_LIST"}}],"meta":{"version":"1.0"}}
2021-06-29 07:02:04.364  INFO 1 --- [           main] com.yahoo.bullet.common.Config           : Loading configuration file: bullet_defaults.yaml
2021-06-29 07:02:04.369  INFO 1 --- [           main] com.yahoo.bullet.common.Config           : Loading configuration file: storage_defaults.yaml
2021-06-29 07:02:04.391  INFO 1 --- [           main] c.yahoo.bullet.rest.PubSubConfiguration  : Async responder classes are not configured.
2021-06-29 07:02:04.393  INFO 1 --- [           main] com.yahoo.bullet.common.Config           : Loading configuration file: bullet_defaults.yaml
2021-06-29 07:02:04.396  INFO 1 --- [           main] com.yahoo.bullet.common.Config           : Loading configuration file: /bullet/configs/pubsub_config.yaml
2021-06-29 07:02:04.408  INFO 1 --- [           main] com.yahoo.bullet.common.Config           : Loading configuration file: bullet_defaults.yaml
2021-06-29 07:02:04.412  INFO 1 --- [           main] com.yahoo.bullet.common.Config           : Loading configuration file: bullet_kafka_defaults.yaml
2021-06-29 07:02:04.418  INFO 1 --- [           main] com.yahoo.bullet.kafka.KafkaPubSub       : Producer properties:
{retries=5, value.serializer=org.apache.kafka.common.serialization.ByteArraySerializer, request.timeout.ms=3000, acks=all, batch.size=65536, max.block.ms=50000, bootstrap.servers=bullet-pubsub:9092, connections.max.idle.ms=-1, buffer.memory=33554432, key.serializer=org.apache.kafka.common.serialization.StringSerializer, linger.ms=5}
2021-06-29 07:02:04.419  INFO 1 --- [           main] com.yahoo.bullet.kafka.KafkaPubSub       : Consumer properties:
{key.deserializer=org.apache.kafka.common.serialization.StringDeserializer, value.deserializer=org.apache.kafka.common.serialization.ByteArrayDeserializer, max.poll.records=50, request.timeout.ms=35000, group.id=bullet-query-consumer, bootstrap.servers=bullet-pubsub:9092, heartbeat.interval.ms=3000, auto.commit.interval.ms=1000, enable.auto.commit=true, fetch.max.wait.ms=500, connections.max.idle.ms=-1, session.timeout.ms=30000, max.poll.interval.ms=30000}
2021-06-29 07:02:04.860  INFO 1 --- [       Thread-2] com.yahoo.bullet.rest.common.Reader      : Reader thread started, ID: 18
2021-06-29 07:02:05.003  INFO 1 --- [           main] com.yahoo.bullet.common.Config           : Loading configuration file: /bullet/configs/query_config.yaml
2021-06-29 07:02:05.004  INFO 1 --- [           main] com.yahoo.bullet.common.Config           : Loading configuration file: bullet_defaults.yaml
2021-06-29 07:02:05.007  INFO 1 --- [           main] com.yahoo.bullet.common.Config           : Loading configuration file: bullet_bql_defaults.yaml
2021-06-29 07:02:05.008  WARN 1 --- [           main] com.yahoo.bullet.common.Validator        : Key: bullet.bql.max.query.length had an invalid value: null. Using default: 2147483647
2021-06-29 07:02:05.015  INFO 1 --- [           main] com.yahoo.bullet.common.Config           : Loading configuration file: bullet_defaults.yaml
2021-06-29 07:02:05.018  INFO 1 --- [           main] com.yahoo.bullet.common.Config           : Loading configuration file: bullet_bql_defaults.yaml
2021-06-29 07:02:05.938  INFO 1 --- [           main] com.yahoo.bullet.rest.Application        : Started Application in 4.004 seconds (JVM running for 4.333)
2021-06-29 07:05:14.611 ERROR 1 --- [nboundChannel-2] .WebSocketAnnotationMethodMessageHandler : Error while processing message GenericMessage [payload=byte[116], headers={simpMessageType=MESSAGE, stompCommand=SEND, nativeHeaders={destination=[/server/request], content-length=[116]}, simpSessionAttributes={}, simpHeartbeat=[J@62ddb425, lookupDestination=/request, simpSessionId=4uo2ppahji1pfq1g4bbsfqw4fppxy034yrfq0hr5uax0gt0ox3q0va5hunfeaq1h, simpDestination=/server/request}]

java.lang.NoSuchMethodError: com.yahoo.bullet.pubsub.PubSubMessage.<init>(Ljava/lang/String;Ljava/io/Serializable;Lcom/yahoo/bullet/pubsub/Metadata;)V
	at com.yahoo.bullet.pubsub.ByteArrayPubSubMessageSerDe$LazyPubSubMessage.<init>(ByteArrayPubSubMessageSerDe.java:34) ~[bullet-core-1.4.1.jar!/:na]
	at com.yahoo.bullet.pubsub.ByteArrayPubSubMessageSerDe$LazyPubSubMessage.<init>(ByteArrayPubSubMessageSerDe.java:30) ~[bullet-core-1.4.1.jar!/:na]
	at com.yahoo.bullet.pubsub.ByteArrayPubSubMessageSerDe.toMessage(ByteArrayPubSubMessageSerDe.java:65) ~[bullet-core-1.4.1.jar!/:na]
	at com.yahoo.bullet.rest.service.QueryService.submit(QueryService.java:74) ~[classes!/:1.2.2]
	at com.yahoo.bullet.rest.service.WebSocketService.submitQuery(WebSocketService.java:87) ~[classes!/:1.2.2]
	at com.yahoo.bullet.rest.controller.WebSocketController.handleNewQuery(WebSocketController.java:89) ~[classes!/:1.2.2]
	at com.yahoo.bullet.rest.controller.WebSocketController.submitWebsocketQuery(WebSocketController.java:65) ~[classes!/:1.2.2]
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:1.8.0_292]
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:1.8.0_292]
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_292]
	at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_292]
	at org.springframework.messaging.handler.invocation.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:180) ~[spring-messaging-4.3.11.RELEASE.jar!/:4.3.11.RELEASE]
	at org.springframework.messaging.handler.invocation.InvocableHandlerMethod.invoke(InvocableHandlerMethod.java:112) ~[spring-messaging-4.3.11.RELEASE.jar!/:4.3.11.RELEASE]
	at org.springframework.messaging.handler.invocation.AbstractMethodMessageHandler.handleMatch(AbstractMethodMessageHandler.java:502) [spring-messaging-4.3.11.RELEASE.jar!/:4.3.11.RELEASE]
	at org.springframework.messaging.simp.annotation.support.SimpAnnotationMethodMessageHandler.handleMatch(SimpAnnotationMethodMessageHandler.java:497) [spring-messaging-4.3.11.RELEASE.jar!/:4.3.11.RELEASE]
	at org.springframework.messaging.simp.annotation.support.SimpAnnotationMethodMessageHandler.handleMatch(SimpAnnotationMethodMessageHandler.java:87) [spring-messaging-4.3.11.RELEASE.jar!/:4.3.11.RELEASE]
	at org.springframework.messaging.handler.invocation.AbstractMethodMessageHandler.handleMessageInternal(AbstractMethodMessageHandler.java:461) [spring-messaging-4.3.11.RELEASE.jar!/:4.3.11.RELEASE]
	at org.springframework.messaging.handler.invocation.AbstractMethodMessageHandler.handleMessage(AbstractMethodMessageHandler.java:399) [spring-messaging-4.3.11.RELEASE.jar!/:4.3.11.RELEASE]
	at org.springframework.messaging.support.ExecutorSubscribableChannel$SendTask.run(ExecutorSubscribableChannel.java:135) [spring-messaging-4.3.11.RELEASE.jar!/:4.3.11.RELEASE]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_292]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_292]
	at java.lang.Thread.run(Thread.java:748) [na:1.8.0_292]

Support SESSION window

This window breaks if the gap between successful events exceeds a certain time threshold

RestPubSub does not work with ByteArrayPubSubMessageSerDe

When we write out the PubSubMessage (created with a SerDe), the RestPubSub takes the payload (which is a byte[]), serializes it again to another byte[], Base64 encodes it as a String and returns a new PubSubMessage that is then JSONified. When reading it back in RestSubscriber, we are not able to GSON parse it as a PubSubMessage since it expects an array but a String is there instead: Expected BEGIN_ARRAY but was STRING

Support casting

Filters and Projections should support casting. Group by also can (post-aggregation)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.