
polarstreams / polar

199 stars, 5 watchers, 15 forks, 1.19 MB

Lightweight & elastic kubernetes-native event streaming system

License: GNU Affero General Public License v3.0

Go 99.45% Dockerfile 0.19% Smarty 0.35%
elastic event-streaming events golang high-availability k8s kubernetes message-queue

polar's People

Contributors

acomabon, artahmetaj, aureamunoz, dependabot[bot], helmutkemper, jorgebay, mihai22125, sabre1041, sunilkumardash9, testwill, vevi


polar's Issues

Use buffer pool for reading individual chunks

We read ahead from the segment file reusing the same buffer, which is fine, but when returning a single chunk (readSingleChunk()) we allocate new buffers instead of taking them from a pool.
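A minimal sketch of the pooling, assuming an upper bound on chunk size; the names below are placeholders rather than the actual readSingleChunk() internals:

package data

import "sync"

// maxChunkSize is an assumed upper bound; buffers for larger chunks are not pooled.
const maxChunkSize = 1 << 20

var chunkBufferPool = sync.Pool{
	New: func() any { return make([]byte, maxChunkSize) },
}

// singleChunkBuffer returns a buffer sized for the chunk and a release function to
// put it back in the pool once the chunk has been written to the consumer.
func singleChunkBuffer(chunkSize int) (buf []byte, release func()) {
	if chunkSize > maxChunkSize {
		return make([]byte, chunkSize), func() {} // oversized: allocate, don't pool
	}
	b := chunkBufferPool.Get().([]byte)
	return b[:chunkSize], func() { chunkBufferPool.Put(b) }
}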

Manual commit on all brokers

Similar to register, stateless consumers should be able to manually commit on all brokers with a single call.

Support a single consumer range

There's currently a minimum of 2 consumer ranges per token; the limit exists because it simplified calculations during token splits/joins, but from the user's perspective it can seem arbitrary.

We should support setting the value to 1 consumer range per token.

Barco pods failing in OpenShift 4

I am testing Barco in OpenShift version 4.11 (Kubernetes version v1.24.0+4f0dd4d) but the Barco pods are not starting successfully. Following the installation (where I found some warnings reported here #17), the pods are in the following state:

$ oc get pod
NAME      READY   STATUS             RESTARTS        AGE
barco-0   0/1     CrashLoopBackOff   7 (21s ago)     11m
barco-1   0/1     Error              7 (5m21s ago)   11m

Checking the pod logs:

$ oc logs -f barco-0
{"level":"info","time":"2022-09-19T06:31:55Z","message":"Starting Barco"}
{"level":"info","time":"2022-09-19T06:31:55Z","message":"Using home dir as /var/lib/barco"}
{"level":"info","time":"2022-09-19T06:31:55Z","message":"Initializing local db from /var/lib/barco/data/local.db"}
{"level":"fatal","error":"unable to open database file: no such file or directory","time":"2022-09-19T06:31:55Z","message":"Exiting"}

Steps to reproduce

  1. Use OpenShift Local to start a local instance of OpenShift (Instructions here)
  2. Start it
  3. Execute commands described in Installing Barco Streams on Kubernetes page
  4. Check pod's status

Support Kafka API

It would be nice to support the Kafka client API so we can leverage the existing Kafka ecosystem.

The goal is to support protocol translation for interoperability, not as a storage format.

Protocol documentation: https://kafka.apache.org/protocol.html

Tasks:

  • A: Implement basic Kafka API operations like metadata (3), api versions (18), find coordinator (10) and create topics (19).
  • B: Implement Produce (0) operation: decode a record batch, use the key to route and call produce by message.
  • C: Implement operations related to fetching/consuming: fetch (1), list offsets (2) and offset commit (8). These require creating an abstraction for the request writer (currently HTTP only).
  • D: Support compression
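As a starting point for task A, a minimal sketch of decoding the fixed request header (header v1, per the protocol documentation linked above) from a client connection; the package and function names are placeholders, not existing code:

package kafka

import (
	"encoding/binary"
	"fmt"
	"io"
)

// RequestHeader is the fixed part of a Kafka request (header v1).
type RequestHeader struct {
	ApiKey        int16 // e.g. 3 = Metadata, 18 = ApiVersions, 10 = FindCoordinator
	ApiVersion    int16
	CorrelationID int32
	ClientID      string
}

// readHeader decodes the length prefix and the request header from a client connection.
func readHeader(r io.Reader) (h RequestHeader, size int32, err error) {
	if err = binary.Read(r, binary.BigEndian, &size); err != nil {
		return h, 0, fmt.Errorf("reading message size: %w", err)
	}
	if err = binary.Read(r, binary.BigEndian, &h.ApiKey); err != nil {
		return h, 0, err
	}
	if err = binary.Read(r, binary.BigEndian, &h.ApiVersion); err != nil {
		return h, 0, err
	}
	if err = binary.Read(r, binary.BigEndian, &h.CorrelationID); err != nil {
		return h, 0, err
	}
	// client_id is a NULLABLE_STRING: int16 length (-1 means null) followed by the bytes
	var clientIDLen int16
	if err = binary.Read(r, binary.BigEndian, &clientIDLen); err != nil {
		return h, 0, err
	}
	if clientIDLen >= 0 {
		buf := make([]byte, clientIDLen)
		if _, err = io.ReadFull(r, buf); err != nil {
			return h, 0, err
		}
		h.ClientID = string(buf)
	}
	return h, size, nil
}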

Add sample HTTP/2 commands to consume messages

The documentation describes how to produce messages with HTTP/2 commands (via the curl CLI), but for consuming it only describes using a client.

Since the documentation states that HTTP/2 is supported for both producing and consuming:

Barco supports producing and consuming events using [HTTP/2](https://en.wikipedia.org/wiki/HTTP/2) APIs. HTTP/2 provides solid low level features that are required in a client-server protocol like framing, request pipelining, ping, compression, etc. without the performance pitfalls of HTTP 1.

and already includes an example for producing, it would be great to also add an example that consumes the same message (following the steps to subscribe, poll, and commit). Otherwise, it seems that a client library is always required to consume messages.

Mixed casing conventions for the REST API

We recently introduced the consumer_id query string parameter, which uses snake_case, while the rest of the API uses camelCase.

We should change it to consumerId in v0.5.0, while still supporting the snake_case version until v0.7.0.

Tiered storage

We should add support for uploading and downloading files from S3 and other object storage providers.
Ideally, the configuration should make it easy to budget for a short retention period on the broker volume and a longer period on object storage.

For example, if a user wants to have 7 day retention, they could set 1 day local volume and 6 days on S3, that way the local volume should only be large enough to hold 1 day of events.

Create horizontal pod autoscaler based on CPU utilization

We should create a basic HPA resource file for users to have as a reference. It should be conservative about scaling down, with a long stabilizationWindowSeconds.

Scaling up should be set at 100% (2x) and scaling down should be set at 50% (0.5x).
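A rough sketch of what that reference file could look like (the namespace, target name, and replica bounds are placeholders, and the thresholds should be validated by benchmarking):

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: polar
  namespace: streams
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: polar
  minReplicas: 3
  maxReplicas: 12
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
  behavior:
    scaleUp:
      policies:
        - type: Percent
          value: 100        # at most double the replica count per step
          periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 1800   # long window: be conservative before scaling down
      policies:
        - type: Percent
          value: 50         # remove at most half the replicas per step
          periodSeconds: 60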

Producer max offset can't be retrieved sometimes

There's a flaky test "should scale down". After looking into the root cause of the failure, it seems the max offset can't be retrieved from the peers.

This only happens occasionally, and only in CI, but it probably hints at a larger issue.

Provide a Java Client

A Java client could be a great feature to allow producing and consuming messages from this language. This client should be published to the Maven Central Repository so it can be integrated as a dependency in the most common Java frameworks such as Quarkus, Spring Boot, and Micronaut.

This client should follow the same design as the current Go client.

Use dedicated type for representing the Group

We should use a dedicated type for representing the group name; that way we make sure it doesn't get mixed up with other string parameters.

For example:

type Group string

and then, use the Group type for method signatures:

type OffsetState interface {
	Initializer
	fmt.Stringer

	// Here: use Group instead of string
	Get(group Group /* ... */) (offset *Offset)

	// Use Group instead of string
	GetAllWithDefaults(group Group /* , t... */) []Offset

	// Use Group instead of string
	Set(group Group /* ... */) bool
	// ...
}
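For completeness, a small self-contained example of what the dedicated type buys us (logCommit is a hypothetical helper): untyped string constants still compile as before, but a string variable now requires an explicit conversion, so accidental parameter mix-ups are caught at compile time.

package main

import "fmt"

// Group is a dedicated type for consumer group names.
type Group string

func logCommit(group Group) { fmt.Println("committing offsets for group", group) }

func main() {
	var name string = "my-group"
	logCommit("my-group")  // untyped constant: still compiles naturally
	// logCommit(name)     // compile error: cannot use name (type string) as Group
	logCommit(Group(name)) // a plain string now needs a deliberate conversion
}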

Cross build for arm64

We should provide arm64 images.

There are a couple of tasks for providing both amd64 and arm64 images under a single tag:

  • We should conditionally build the docker image (see ./build/container/Dockerfile) with the following env variables for arm64: CGO_ENABLED=1 CC=aarch64-linux-gnu-gcc.
  • Create a manifest file to include both image types under the same name.

We could create a shell script or something with the steps.

Use dedicated buffers for producing

We stream the request body directly into the compressed writer, which is good for saving memory, but because coalescing multiple requests into a single chunk is single-threaded, one slow client can add latency to other, unrelated clients.

We should use intermediate buffers to read the bodies in parallel and avoid introducing latency to other requests when a client is slow.

This can have an impact on performance so we should benchmark the brokers as part of this task.
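A minimal sketch of the idea, assuming the producing handler receives an *http.Request; the pool and helper below are placeholders, not the current implementation. Each body is drained in its own request goroutine, so a slow client only stalls itself and the single-threaded coalescer only ever sees complete bodies:

package producing

import (
	"bytes"
	"io"
	"net/http"
	"sync"
)

// bodyPool reuses intermediate buffers between requests.
var bodyPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// readBody copies the request body into a pooled buffer in the request goroutine,
// before anything is handed to the coalescer.
func readBody(r *http.Request) (*bytes.Buffer, error) {
	buf := bodyPool.Get().(*bytes.Buffer)
	buf.Reset()
	if _, err := io.Copy(buf, r.Body); err != nil {
		bodyPool.Put(buf)
		return nil, err
	}
	return buf, nil
}

// The caller would return the buffer to bodyPool once the chunk containing it has
// been compressed and written, so the compressed writer never waits on a client socket.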

Create static binary for both amd64 and arm64

We should provide static binaries of PolarStreams and document how to install them manually on a Linux instance.

To support both amd64 and arm64 we could build them inside containers and then copy them using:

docker create --name dummy IMAGE_NAME
docker cp dummy:/path/to/file /dest/to/file
docker rm -f dummy

On amd64, we can just disable CGO:

CGO_ENABLED=0 go build -ldflags '-s -linkmode external -extldflags=-static' .

But arm64 requires musl.

Support moving forward for corrupted files

PolarStreams performs CRC validation of chunks when reading. Currently, it will panic when finding a corrupted chunk. We should retrieve the data from a replica when this occurs.
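A sketch of the non-panicking path, assuming chunks store a CRC32 of the body (the actual chunk layout and checksum polynomial may differ): validation returns a sentinel error so the read path can fall back to a replica instead of aborting the broker.

package data

import (
	"errors"
	"hash/crc32"
)

// ErrChunkCorrupted signals that a chunk failed CRC validation and should be
// fetched from a replica.
var ErrChunkCorrupted = errors.New("chunk corrupted: crc mismatch")

func validateChunk(body []byte, storedCrc uint32) error {
	if crc32.Checksum(body, crc32.MakeTable(crc32.Castagnoli)) != storedCrc {
		return ErrChunkCorrupted
	}
	return nil
}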

More producer metrics

We should add more metrics related to the broker producing interface:

  • Raw requests and sizes
  • Use the amount of messages in the coalescer; ...
  • Current size of the allocation
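As a starting point for the first two bullets, and assuming the brokers expose Prometheus metrics via client_golang (an assumption about the metrics stack; the metric names are placeholders):

package metrics

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

var (
	// Raw requests received on the producing interface.
	producerRawRequests = promauto.NewCounter(prometheus.CounterOpts{
		Name: "producer_raw_requests_total",
		Help: "Raw requests received on the producing interface.",
	})

	// Size distribution of raw producer requests.
	producerRequestBytes = promauto.NewHistogram(prometheus.HistogramOpts{
		Name:    "producer_raw_request_bytes",
		Help:    "Size in bytes of raw producer requests.",
		Buckets: prometheus.ExponentialBuckets(256, 4, 8),
	})
)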

Harden generation creation

Creating new generations due to ownership changes (scaling / failover) should be tested more thoroughly, and the implementation should be revisited to make sure it resists sudden process kills (power failure) and that the system self-heals.

Revisit consumer assignment

Consumer assignment was made with 1 consumer range per token in mind.

We have to account for different consumer ranges when assigning consumers to token ranges.

Topic creation, removal, validation and metadata

Support creating and removing topics, and allow automatic topic creation to be disabled when producing.
This includes topic validation when producing and other metadata.

We should track the ability to have different consumer ranges by topic on a separate ticket.

Support consuming without long lived connections

Currently we support producing in a stateless manner; for example, you can send a message:

curl -X POST -d '{"hello":"world"}' \
    -H "Content-Type: application/json" \
    "http://barco.streams:9251/v1/topic/my-topic/messages"

It would be awesome if we could also support the same level of client statelessness for consumers, to enable stuff like curl for consuming, for example:

curl -X PUT "http://barco.streams:9252/v1/consumer/register?consumer_id=1"
curl -X POST -H "Accept: application/vnd.barco.consumermessage+json" \ 
    "http://barco.streams:9252/v1/consumer/poll?consumer_id=1"

This would represent a registration of a consumer with id "1", belonging to the "default" consumer group, to get data from all topics, followed by a request to poll the data. The poll response would contain multiple messages:

[
  {
    "topic": "my-topic",
    "token": "-9223372036854775808",
    "rangeIndex": "1",
    "version": "1",
    "startOffset": 123
    "values": [
      {"hello": 1},
      {"hello": 2},
      {"hello": 3}
    ]
  }
]

Provide a Node.JS Client

A Node.JS client could be a great feature to allow producing and consuming messages from this language. This client should be published to the npm registry so it can be used by the Node.JS developer community.

This client should follow the same design as the current Go client.

Warnings deploying in OpenShift 4

I am testing Barco in OpenShift version 4.11 (Kubernetes version v1.24.0+4f0dd4d) but I am getting some warnings of PodSecurity violations.

Following the instructions, I found the following warning when the kustomization is applied:

❯ kubectl apply -k .
namespace/streams created
serviceaccount/barco created
role.rbac.authorization.k8s.io/barco created
clusterrole.rbac.authorization.k8s.io/barco created
rolebinding.rbac.authorization.k8s.io/barco created
clusterrolebinding.rbac.authorization.k8s.io/barco created
service/barco created
Warning: would violate PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (container "barco" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "barco" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "barco" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "barco" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
statefulset.apps/barco created

Steps to reproduce

  1. Use OpenShift Local to start a local instance of OpenShift (Instructions here)
  2. Start it
  3. Execute commands described in Installing Barco Streams on Kubernetes page
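For reference, the fields the warning asks for map to a container-level securityContext like the following (a sketch of just the relevant fragment, taken from the warning text; the surrounding StatefulSet spec is omitted):

securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault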

Expose `barco` service to be consumed externally

The current k8s definition of the barco service does not include nodePorts, so it is impossible to execute the sample commands to test and verify producing messages.

Steps to reproduce

  1. Start minikube
  2. Deploy Barco following the instructions
  3. Execute the command to publish a message:
❯ TOPIC="my-topic"
curl -X POST -i -d '{"hello":"world"}' \
    -H "Content-Type: application/json" --http2-prior-knowledge \
    "http://barco.streams:9251/v1/topic/${TOPIC}/messages"
curl: (6) Could not resolve host: barco.streams
  4. Expose the service to be consumed externally:
❯ minikube service barco --url -n streams
😿  service streams/barco has no node port

Suggestions

I would like to suggest improving the Getting Started by adding some references on how to set up a local Kubernetes cluster (or use the local development environment) before trying to execute the commands to produce or consume. Otherwise, it is a bit complicated to follow the instructions.

While the barco service is not exposed, there is an alternative way to execute the commands, using the exec option of the kubectl CLI:

❯ k exec barco-0 -- curl -X POST -i -d '{"hello":"world"}' -H "Content-Type: application/json" --http2-prior-knowledge "http://barco:9251/v1/topic/my-topic/messages"
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100    19  100     2  100    17      1     16  0:00:02  0:00:01  0:00:01    18
HTTP/2 200 
content-type: text/html; charset=utf-8
content-length: 2
date: Mon, 19 Sep 2022 07:12:15 GMT

OK

Support polling using different response formats

We currently send the compressed chunks to the consumers as a way to have less traffic between brokers and clients.

We should also support having the broker uncompress and adapt the response to be used directly by simple clients like curl.

Document consumer ranges

We should include a page in the documentation detailing how fanout works in a Barco Streams cluster.

Consider publishing custom k8s metrics

We should consider publishing metrics via the custom.metrics.k8s.io API, including disk usage and other metrics that could usefully be consumed by an HPA.

Support ndjson

We support HTTP/2, which features framing and request pipelining. To pack more messages into the same request, we should support ndjson, which is a streaming-friendly format.
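A minimal sketch of how the broker side could consume such a body, assuming the producing endpoint hands the request body to a function as an io.Reader (forEachMessage is illustrative, not an existing function): each newline-delimited JSON value is decoded and handled as one message.

package producing

import (
	"encoding/json"
	"errors"
	"io"
)

// forEachMessage invokes fn once per newline-delimited JSON value in body.
func forEachMessage(body io.Reader, fn func(msg json.RawMessage) error) error {
	dec := json.NewDecoder(body)
	for {
		var msg json.RawMessage
		if err := dec.Decode(&msg); errors.Is(err, io.EOF) {
			return nil // clean end of the stream
		} else if err != nil {
			return err
		}
		if err := fn(msg); err != nil {
			return err
		}
	}
}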
