redhatinsights / patchman-engine

The purpose of this repository is to store source code for the System Patch Manager application.
License: GNU General Public License v3.0
In order to understand the problem, you must understand how Kafka works. In essence, Kafka is a set of append-only logs with a distributed key-value store on top. Each Kafka topic-partition pair corresponds to a single log, and each entry in that log is a message. On top of this, Kafka uses ZooKeeper to store offsets into these logs, which denote the position of the last processed message.
A Kafka consumer is very simple. On start it does some setup, picks which partitions it will read from, and reads offsets from ZooKeeper. Then it contacts the Kafka server and waits for new messages to be appended to the chosen partitions. After reading these messages, the consumer must at some point commit the fact that it read them. This is done by writing the message offset into ZooKeeper.
In the event of a consumer crash, it can read where it left off and continue processing messages without skipping or double-processing any of them.
These commits can be automatic or manual. With automatic commits, the consumer commits messages immediately after receiving them. With manual commits, the programmer can choose when to commit.
Note: each partition can be consumed by at most one consumer from a single CMSfR application, but each consumer can read from multiple partitions.
The current implementation of our Kafka readers used automatic, immediate committing: upon receiving a message, that message was immediately committed as successfully processed. This posed a problem when the process crashed after receiving messages. The received messages would already be committed, even though they would only be processed asynchronously later.
| receive m1 | commit m1 | receive m2 | commit m2 | ---- spawn m1 and m2 ----> | handle m1 | handle m2 |
and for single message:
| receive m1 | commit m1 | ----- spawn m1-----> | handle m1 |
This meant that if the process crashed:
| receive m1 | commit m1 | ~~~ crash ~~~~
we would not have processed m1 correctly, but it would still be committed.
This pattern was implemented because it allowed us to spawn a goroutine for each message, and therefore allowed us almost infinite concurrency. It was an easy way to get up and running.
The solution is to use manual commits and commit messages only after they have been processed correctly. However, this requires us to process messages synchronously, reducing overall throughput. The reduction is especially painful because we are bound by waiting on inventory & vmaas, not by the actual work done in this component.
This change results in the following flow for messages m1 and m2:
| receive m1 | handle m1 | commit m1 | receive m2 | handle m2 | commit m2 |
Since the handle phase takes much longer than receive and commit, we incur a lot of lag when handling many messages. This is somewhat alleviated by the fact that we can run multiple independent listener component instances. Each listener has its own consumer, which can process messages independently.
If we want to introduce concurrency within one listener, we need to ensure that messages are committed in the same order they were received. If we don't, we can create situations where messages are skipped or double-processed. This is a consequence of segmentio/kafka-go: the library just sends raw message offsets without doing any internal reordering, so if we have the following flow:
| receive m1 | receive m2 | handle m2 | commit m2 | ~~~ crash ~~~
We would have committed m2 before even processing m1, which means we have missed m1.
| Partitions | Listener pods | Consumers per pod | Goroutines per consumer | + | - | Concurrency |
|---|---|---|---|---|---|---|
| P | N | 1 | 1 | Simple | Slow, pod overhead | min(N, P) |
| P | N | M | 1 | Low pod overhead | Error handling, complexity | min(N*M, P) |
| P | N | M | S | Max throughput | Complexity: work-queue in golang | N*S |
Throughput. The total throughput of our application can be roughly approximated by how many systems we can process concurrently and how long one system takes: throughput = concurrency / latency. For example, 10 systems in parallel at 1 second per system gives 10 systems per second. If we implement our system with a concurrency of 20, and the slowest stage of the processing pipeline takes 500 ms, then we will be able to process at most 20 / 0.5 = 40 systems per second.