polarstreams / polar
Lightweight & elastic kubernetes-native event streaming system
License: GNU Affero General Public License v3.0
Currently, the response does not include the offset of the payload: https://github.com/barcostreams/barco/blob/main/docs/developer/NETWORK_FORMATS.md#consumer-poll-response
The poll response should contain the offset, after the topic name, for bookkeeping.
We read ahead from the segment file reusing the same buffer, which is fine, but when returning a single chunk (readSingleChunk()) we allocate new buffers instead of using a pool.
Similar to register, stateless consumers should be able to manually commit on all brokers with a single call.
There's currently a minimum of 2 consumer ranges per token. This minimum exists because it simplified calculations during token splits/joins, but from the user's perspective it can seem arbitrary.
We should support setting the value to 1 consumer range per token.
I am testing Barco in OpenShift version 4.11 (Kubernetes version v1.24.0+4f0dd4d) but the Barco pods are not starting successfully. Following the installation (where I found some warnings reported here #17), the pods are in the following state:
$ oc get pod
NAME READY STATUS RESTARTS AGE
barco-0 0/1 CrashLoopBackOff 7 (21s ago) 11m
barco-1 0/1 Error 7 (5m21s ago) 11m
Checking the pod logs:
$ oc logs -f barco-0
{"level":"info","time":"2022-09-19T06:31:55Z","message":"Starting Barco"}
{"level":"info","time":"2022-09-19T06:31:55Z","message":"Using home dir as /var/lib/barco"}
{"level":"info","time":"2022-09-19T06:31:55Z","message":"Initializing local db from /var/lib/barco/data/local.db"}
{"level":"fatal","error":"unable to open database file: no such file or directory","time":"2022-09-19T06:31:55Z","message":"Exiting"}
Steps to reproduce
We should pass deadlines on all the replication communications to avoid unbounded request times.
We should set the prometheus.io/scrape and prometheus.io/port annotations in the k8s resource files to automatically scrape Brokers.
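These are conventional annotations that many Prometheus scrape configs key on. A sketch of the pod template metadata; the metrics port shown is an assumption:

```yaml
# Pod template metadata in the StatefulSet definition
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "9253"   # assumed metrics port, adjust to the broker's
```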
The Discovery API's GET /v1/brokers returns the service name without it being fully qualified with the namespace. It should return "my-svc.my-namespace" to work in all environments.
It would be nice to support existing Kafka client API to leverage the existing Kafka ecosystem.
The goal is to support protocol translation for interoperability, not as a storage format.
Protocol documentation: https://kafka.apache.org/protocol.html
Tasks:
We should proactively close consumer connections that remain idle after the timeout has elapsed.
The documentation describes how to use HTTP/2 commands (via the curl CLI) to produce messages; however, it only describes using a client to consume them.
Given that the documentation states Barco supports HTTP/2 to produce and consume:
Barco supports producing and consuming events using [HTTP/2](https://en.wikipedia.org/wiki/HTTP/2) APIs. HTTP/2 provides solid low level features that are required in a client-server protocol like framing, request pipelining, ping, compression, etc. without the performance pitfalls of HTTP 1.
and includes an example to produce, it would be great to add an example to consume the same message (following the steps to subscribe, poll, and commit). Otherwise, it seems that a client is always needed to consume messages.
We recently introduced the consumer_id querystring parameter, which uses snake case, but the API uses camelCase. We should change it to consumerId in v0.5.0, while still supporting the snake case version until v0.7.0.
Support receiving a commit message with the last consumed offset and commit it.
When the container restarts, the ALTER TABLE ...
command generates the error: duplicate column name: cluster_size
We should run tests on GitHub actions for arm64.
We could use something like https://github.com/uraimo/run-on-arch-action, which uses QEMU, and run go test ./..., or any similar approach.
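A sketch of what the workflow step could look like; the action inputs follow the run-on-arch-action README, and the distro and versions are assumptions:

```yaml
jobs:
  test-arm64:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: uraimo/run-on-arch-action@v2
        with:
          arch: aarch64
          distro: ubuntu20.04
          # Installing Go inside the container is omitted here for brevity
          run: go test ./...
```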
We should add support for uploading and downloading files from S3 and other object storage providers.
Ideally, the configuration should be done in a way that facilitates budgeting for short period of time on a broker volume and longer periods on object storage.
For example, if a user wants to have 7 day retention, they could set 1 day local volume and 6 days on S3, that way the local volume should only be large enough to hold 1 day of events.
We should create a basic HPA resource file for users to have as a reference; it should be conservative in scaling down, with a long stabilizationWindowSeconds.
Scaling up should be set at 100% (2x) and scaling down should be set at 50% (1/2x).
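A conservative starting point, assuming a CPU metric and illustrative replica counts and stabilization window:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: barco
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: barco
  minReplicas: 3
  maxReplicas: 12
  behavior:
    scaleUp:
      policies:
      - type: Percent
        value: 100          # at most 2x at a time
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 1800  # illustrative; err on the long side
      policies:
      - type: Percent
        value: 50           # at most 1/2x at a time
        periodSeconds: 60
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```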
There's a flaky test "should scale down"
. After looking into the root cause of the failure, it seems the max offset can't be retrieved from the peers.
This only happens in CI and only occasionally, but it probably hints at a larger issue.
A Java client could be a great feature to allow producing and consuming messages from this language. The client should be published to the Maven Central Repository so it can be integrated as a dependency in the most common Java frameworks such as Quarkus, Spring Boot, and Micronaut.
This client should be aligned with the current Go client.
We should use a dedicated type for representing the group name, so that it doesn't get mixed up with other string parameters.
For example:
type Group string
and then, use the Group
type for method signatures:
type OffsetState interface {
Initializer
fmt.Stringer
// Here: Use Group instead of string
Get(group Group, ...) (offset *Offset)
// Use Group instead of string
GetAllWithDefaults(group Group, t...) []Offset
// Use Group instead of string
Set(group Group, ...) bool
// ...
}
We should provide arm64 images.
There are a couple of tasks for providing both amd64 and arm64 images under a single tag:
Building the image (./build/container/Dockerfile) with the following env variables for arm64: CGO_ENABLED=1 CC=aarch64-linux-gnu-gcc. We could create a shell script or something with the steps.
After scaling up/down, the offset of a consumer group that hasn't been tracked yet can't be properly initialized.
To have a simpler API for stateless consumers, we should support relaying the register message to all the peers.
Relates to #65
We stream the request body directly into the compressed writer, which is good for saving memory, but since coalescing multiple requests into a single chunk is single-threaded, one slow client can add latency to other unrelated clients.
We should use intermediate buffers to read the bodies in parallel and avoid introducing latency to other requests when a client is slow.
This can have an impact on performance so we should benchmark the brokers as part of this task.
We should provide static binaries of PolarStreams and document how to install it manually on a linux instance.
To support both amd64
and arm64
we could build them inside containers and then copy them using:
docker create --name dummy IMAGE_NAME
docker cp dummy:/path/to/file /dest/to/file
docker rm -f dummy
On amd64, we can just disable CGO_ENABLED
:
CGO_ENABLED=0 go build -ldflags '-s -linkmode external -extldflags=-static' .
But arm64 requires musl.
We should document how to setup a cluster on plain Linux instances.
This task depends on #79.
HTTP/2 breaks requests and responses into frames on multiple stream ids.
Unlike other protocols like the Cassandra protocol, it can interleave partial requests/responses.
On the producing server side this is a problem, as we can't independently move a request body forward without moving the whole stream forward.
It uses incorrect ranges to refer to a dev mode offset value.
When writing at high throughput rates, it's possible that the file name of the replicated data does not match.
We should support both HTTP/1 and HTTP/2 on the consuming interface.
Relates to #65.
There's no ServeConn() on net/http yet, though: golang/go#36673
PolarStreams performs CRC validation of chunks when reading. Currently, it will panic when finding a corrupted chunk. We should retrieve the data from a replica when this occurs.
We should add more metrics related to the broker producing interface:
Creating new generations due to ownership changes (scaling / failover) should be tested more thoroughly, and the implementation should be revisited to make sure it resists sudden process kills (power failure) and that the system will self-heal.
Consumer assignment was made with 1 consumer range per token in mind.
We have to account for different consumer ranges when assigning consumers to token ranges.
Support creating and removing topics, allowing disabling automatic creation when producing.
This includes topic validation when producing and other metadata.
We should track the ability to have different consumer ranges by topic on a separate ticket.
Similar to Kafka’s auto.offset.reset
, we should support setting the strategy when there’s no offset for a given consumer group.
We could expose the following setting:
“On new consumer group: <start from earliest | start from latest>“
Currently we support producing in a stateless manner; for example, you can send a message:
curl -X POST -d '{"hello":"world"}' \
-H "Content-Type: application/json" \
"http://barco.streams:9251/v1/topic/my-topic/messages"
It would be awesome if we could also support the same level of client statelessness for consumers, to enable things like curl for consuming, for example:
curl -X PUT "http://barco.streams:9252/v1/consumer/register?consumer_id=1"
curl -X POST -H "Accept: application/vnd.barco.consumermessage+json" \
"http://barco.streams:9252/v1/consumer/poll?consumer_id=1"
This would represent a registration of a consumer with id "1", belonging to the "default" consumer group, to get data from all topics, followed by a request to poll the data. The poll response would contain multiple messages:
[
{
"topic": "my-topic",
"token": "-9223372036854775808",
"rangeIndex": "1",
"version": "1",
"startOffset": 123
"values": [
{"hello": 1},
{"hello": 2},
{"hello": 3}
]
}
]
A Node.js client could be a great feature to allow producing and consuming messages from this language. The client should be published to the npm registry to be used by the Node.js developer community.
This client should be aligned with the current Go client.
I am testing Barco in OpenShift version 4.11 (Kubernetes version v1.24.0+4f0dd4d) but I am getting some PodSecurity violation warnings.
Following the instructions, I found the following warning when the kustomization is applied:
❯ kubectl apply -k .
namespace/streams created
serviceaccount/barco created
role.rbac.authorization.k8s.io/barco created
clusterrole.rbac.authorization.k8s.io/barco created
rolebinding.rbac.authorization.k8s.io/barco created
clusterrolebinding.rbac.authorization.k8s.io/barco created
service/barco created
Warning: would violate PodSecurity "restricted:v1.24": allowPrivilegeEscalation != false (container "barco" must set securityContext.allowPrivilegeEscalation=false), unrestricted capabilities (container "barco" must set securityContext.capabilities.drop=["ALL"]), runAsNonRoot != true (pod or container "barco" must set securityContext.runAsNonRoot=true), seccompProfile (pod or container "barco" must set securityContext.seccompProfile.type to "RuntimeDefault" or "Localhost")
statefulset.apps/barco created
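The warning itself lists the missing fields. A securityContext sketch for the container spec that should satisfy the restricted profile (runAsNonRoot and seccompProfile may also be set at pod level):

```yaml
securityContext:
  runAsNonRoot: true
  allowPrivilegeEscalation: false
  capabilities:
    drop: ["ALL"]
  seccompProfile:
    type: RuntimeDefault
```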
Steps to reproduce
The current k8s definition of the barco service does not include nodePorts, so it is impossible to execute the sample commands to test and verify producing messages.
Steps to reproduce
❯ TOPIC="my-topic"
curl -X POST -i -d '{"hello":"world"}' \
-H "Content-Type: application/json" --http2-prior-knowledge \
"http://barco.streams:9251/v1/topic/${TOPIC}/messages"
curl: (6) Could not resolve host: barco.streams
❯ minikube service barco --url -n streams
😿 service streams/barco has no node port
Suggestions
I would like to suggest improving the Getting Started guide by adding some references on setting up a local Kubernetes cluster (or using the local development environment) before trying to execute the commands to produce or consume. Otherwise, it is a bit complicated to follow the instructions.
While the barco service is not exposed, there is an alternative way to execute the commands using the exec option of the kubectl CLI:
❯ k exec barco-0 -- curl -X POST -i -d '{"hello":"world"}' -H "Content-Type: application/json" --http2-prior-knowledge "http://barco:9251/v1/topic/my-topic/messages"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 19 100 2 100 17 1 16 0:00:02 0:00:01 0:00:01 18
HTTP/2 200
content-type: text/html; charset=utf-8
content-length: 2
date: Mon, 19 Sep 2022 07:12:15 GMT
OK
After restarting with BARCO_DEV_MODE=true while leaving the data dir intact, the broker can error when trying to propose ownership of the token range.
We currently send the compressed chunks to the consumers as a way to reduce traffic between brokers and clients.
We should also support having the broker uncompress and adapt the response so it can be used directly by simple clients like curl.
Stateless consumers should be able to avoid manual committing on "goodbye".
We should include a page in the documentation detailing how fanout works in a Barco Streams cluster.
We should consider publishing metrics via the custom.metrics.k8s.io API, including disk usage and other metrics that can be useful for consumption by an HPA.
We support HTTP/2, which features framing and request pipelining. In order to support packing more messages into the same request, we should support ndjson, which is a streaming-friendly format.
GroupReadQueue streams the consumer response and handles manual commits.
We should also commit on the "goodbye" consumer message but without affecting the response.
We should refactor GroupReadQueue to only write the response on some occasions.
We should add integrity checks for the interbroker message header.