Comments (9)
As I understand it, ZeeQS reads the workflow events from Hazelcast and pushes them to a PostgreSQL database for persistence. I am curious why it was designed this way. Could Zeebe not export the events directly to a PostgreSQL database, with ZeeQS reading from that database?
Zeebe exports many records. ZeeQS consumes these records and stores them in an aggregated way.
The aggregation and storing could be done in the Zeebe exporter itself, but that would put additional load on Zeebe.
Also, this way multiple applications can consume the Hazelcast stream and build their own state.
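The fan-out point above can be sketched in a few lines. This is a toy Python sketch, not ZeeQS's actual JVM code; the record shapes and consumer names are invented for illustration. Several independent consumers read the same exported record stream, and each builds its own state:

```python
# Hypothetical exported records (shapes invented, not the real Zeebe protocol).
records = [
    {"valueType": "PROCESS_INSTANCE", "intent": "ELEMENT_ACTIVATED", "processInstanceKey": 1},
    {"valueType": "JOB", "intent": "CREATED", "processInstanceKey": 1},
    {"valueType": "PROCESS_INSTANCE", "intent": "ELEMENT_COMPLETED", "processInstanceKey": 1},
]

class InstanceCounter:
    """One consumer: counts activated process instances."""
    def __init__(self):
        self.activated = 0
    def on_record(self, record):
        if record["valueType"] == "PROCESS_INSTANCE" and record["intent"] == "ELEMENT_ACTIVATED":
            self.activated += 1

class JobIndex:
    """Another consumer: remembers which instances created jobs."""
    def __init__(self):
        self.instances_with_jobs = set()
    def on_record(self, record):
        if record["valueType"] == "JOB":
            self.instances_with_jobs.add(record["processInstanceKey"])

consumers = [InstanceCounter(), JobIndex()]
for record in records:          # one pass over the shared stream
    for consumer in consumers:  # every consumer sees every record
        consumer.on_record(record)
```

The point is that neither consumer needs to know about the other, and neither adds work to the broker itself.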
What is the recommended production configuration for Hazelcast? Should it run in embedded mode or standalone mode? We observed that when Hazelcast runs in standalone mode and is restarted, Zeebe also needs to be restarted to reconnect to Hazelcast.
I would not recommend running Hazelcast in embedded mode in production.
The best option is to run a standalone Hazelcast instance or cluster. Otherwise, you may run into issues if the broker has multiple partitions or is restarted.
from zeeqs.
> The aggregation and storing could be done in the Zeebe exporter itself. But it would put an additional load on Zeebe.
I am not sure if my first post was misread. The flow I meant was: Zeebe -> PostgreSQL <-> ZeeQS. Why would it put more load on Zeebe to export the event stream to a PostgreSQL exporter than to export it to Hazelcast?
> And this way, multiple applications can consume the Hazelcast stream and build their own state.
One curiosity question: why didn't ZeeQS use the Elasticsearch exporter that is maintained by the Zeebe team?
> I would not recommend running it in production.
Oh. Since Zeebe doesn't provide a programmatic way to find the status of a running/completed workflow (at least in my findings), we are planning to use the ZeeQS APIs to let our UI know what's happening with the workflow. Would you have any other recommendation for this problem?
> However, the best option would be running a standalone Hazelcast instance or cluster.
Ok. Would you know why we need a Zeebe restart if Hazelcast restarts for some reason? Shouldn't Zeebe be able to reconnect to Hazelcast once it's back up?
Also, one more question: does Zeebe store the event data locally if Hazelcast crashes, and send those events once it comes back up?
Thanks
> I am not sure if my first post was misread. The flow I meant was: Zeebe -> PostgreSQL <-> ZeeQS. Why would it put more load on Zeebe to export the event stream to a PostgreSQL exporter than to export it to Hazelcast?
Well, we could have an exporter that simply stores the records in PostgreSQL. But ZeeQS does more than this: it stores the data in an aggregated view. The view is optimized for how the data is accessed.
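The difference between storing raw records and an aggregated view can be illustrated like this. This is a Python sketch with made-up field names, not ZeeQS's real schema: many records about one process instance collapse into a single row shaped for queries.

```python
def aggregate(records):
    """Collapse a raw record stream into one row per process instance."""
    view = {}
    for r in records:
        row = view.setdefault(r["processInstanceKey"], {"state": None, "jobs": 0})
        if r["valueType"] == "PROCESS_INSTANCE":
            row["state"] = r["intent"]          # latest intent wins
        elif r["valueType"] == "JOB" and r["intent"] == "CREATED":
            row["jobs"] += 1
    return view

# Three raw records about the same instance ...
records = [
    {"processInstanceKey": 7, "valueType": "PROCESS_INSTANCE", "intent": "ELEMENT_ACTIVATED"},
    {"processInstanceKey": 7, "valueType": "JOB", "intent": "CREATED"},
    {"processInstanceKey": 7, "valueType": "PROCESS_INSTANCE", "intent": "ELEMENT_COMPLETED"},
]
# ... become a single queryable row.
view = aggregate(records)
```

A query like "what is the state of instance 7" is then a single lookup instead of a scan over all raw records.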
> One curiosity question: why didn't ZeeQS use the Elasticsearch exporter that is maintained by the Zeebe team?
It could, and that would be nice. Hazelcast has the advantage that it is easier to set up: it can run in memory and is fast.
However, I'm open to contributions to support the Elasticsearch exporter.
> Since Zeebe doesn't provide a programmatic way to find the status of a running/completed workflow (at least in my findings), we are planning to use the ZeeQS APIs to let our UI know what's happening with the workflow. Would you have any other recommendation for this problem?
If you can, use the Operate API for production.
If not, ZeeQS is a good option. But you may need to configure Hazelcast to your needs to avoid data loss.
> Would you know why we need a Zeebe restart if Hazelcast restarts for some reason? Shouldn't Zeebe be able to reconnect to Hazelcast once it's back up?
I didn't test this case, but it would be good if the Hazelcast exporter could handle it.
Please share your experience if possible.
> does Zeebe store the event data locally if Hazelcast crashes, and send those events once it comes back up?
Zeebe stores the records until all exporters have acknowledged them. Currently, the Hazelcast exporter acknowledges a record as soon as it is added to the ring buffer.
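A toy model of that acknowledgement behavior, assuming a fixed-capacity ring buffer (Python sketch with an invented capacity and record names, not the real Hazelcast implementation): because the exporter acks as soon as a record lands in the buffer, a reader that lags behind the buffer capacity silently loses the oldest records.

```python
from collections import deque

class RingBuffer:
    """Fixed-capacity ring buffer: adding to a full buffer overwrites the oldest item."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.buf = deque()
        self.head = 0  # sequence number of the oldest retained record

    def add(self, item):
        if len(self.buf) == self.capacity:
            self.buf.popleft()   # overwrite oldest: data loss for slow readers
            self.head += 1
        self.buf.append(item)

    def read_from(self, seq):
        start = max(seq, self.head)   # records before head are gone
        return list(self.buf)[start - self.head:], start

rb = RingBuffer(capacity=3)
for i in range(5):
    rb.add(f"record-{i}")  # the exporter "acks" here, regardless of readers

# A reader that started at sequence 0 can only see from sequence 2 onward:
items, start = rb.read_from(0)
```

This is why sizing the ring buffer (and keeping importers fast) matters if you want to avoid gaps.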
Alright. That answers most of my questions in some way.
Before we close this issue, one last question: shouldn't the direction of the arrow from Hazelcast (and Elasticsearch) to ZeeQS be the opposite? ZeeQS reads the data from Hazelcast rather than Hazelcast pushing it to ZeeQS, right?
> Shouldn't the direction of the arrow from Hazelcast (and Elasticsearch) to ZeeQS be the opposite? ZeeQS reads the data from Hazelcast rather than Hazelcast pushing it to ZeeQS, right?
It depends on how you look at it. ZeeQS reads data from Hazelcast: the reading is implemented as a pull from Hazelcast but exposed as listeners in ZeeQS.
You could also see the arrows as the data flow: the data is exported from the broker to Hazelcast, then imported from Hazelcast into ZeeQS, and finally stored in the database.
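The "pull internally, listeners externally" pattern can be sketched as follows (a minimal Python sketch; the class and method names are hypothetical, not ZeeQS's actual importer API). The importer polls a source by sequence number but invokes registered callbacks, so application code only ever sees a push-style API:

```python
class ListSource:
    """Minimal stand-in for a record source: keeps everything, reads by sequence."""
    def __init__(self, items):
        self.items = items
    def read_from(self, seq):
        return self.items[seq:], seq

class Importer:
    def __init__(self, source):
        self.source = source
        self.sequence = 0         # next sequence to read
        self.listeners = []

    def add_listener(self, fn):   # push-style API exposed to applications
        self.listeners.append(fn)

    def poll_once(self):          # pull-style internals
        items, start = self.source.read_from(self.sequence)
        for item in items:
            for fn in self.listeners:
                fn(item)          # deliver to every registered listener
        self.sequence = start + len(items)

seen = []
importer = Importer(ListSource(["a", "b", "c"]))
importer.add_listener(seen.append)
importer.poll_once()
```

From the listener's point of view the data was "pushed", even though the importer pulled it.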
When running a performance test, we noticed that the PostgreSQL database was under heavy load. While the other components can scale, it is hard to scale the PostgreSQL database. Hence, I have one more question.
If we deploy Hazelcast in clustered mode, can ZeeQS read directly from Hazelcast and answer the GraphQL queries from there? I understand that this would need code changes, but the question is mainly from a design standpoint: is it mandatory to store the data in PostgreSQL?
Thanks
> If we deploy Hazelcast in clustered mode, can ZeeQS read directly from Hazelcast and answer the GraphQL queries from there? I understand that this would need code changes, but the question is mainly from a design standpoint: is it mandatory to store the data in PostgreSQL?
Well, ZeeQS needs to aggregate the data, for example, to collect all data of a process instance.
Instead of storing the data in a database, ZeeQS could keep it in memory, but memory is limited.
ZeeQS could also read the data ad hoc from Hazelcast, but the Hazelcast ring buffer is bounded, and it is probably not efficient to read from Hazelcast every time.
Note that ZeeQS is not optimized for performance. However, you could try to tune it by removing data from the database that you don't need anymore.
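The retention idea can be sketched as a periodic delete, shown here against an in-memory SQLite database; the table and column names are hypothetical, not ZeeQS's real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical aggregated table: one row per process instance.
conn.execute("CREATE TABLE process_instance (key INTEGER, state TEXT, end_time INTEGER)")
conn.executemany(
    "INSERT INTO process_instance VALUES (?, ?, ?)",
    [(1, "COMPLETED", 100),   # finished long ago -> eligible for cleanup
     (2, "ACTIVATED", None),  # still running -> must be kept
     (3, "COMPLETED", 900)],  # finished recently -> kept
)

cutoff = 500  # retention boundary (e.g. "older than N days" translated to a timestamp)
conn.execute(
    "DELETE FROM process_instance WHERE state = 'COMPLETED' AND end_time < ?",
    (cutoff,),
)
remaining = [row[0] for row in conn.execute("SELECT key FROM process_instance ORDER BY key")]
```

Run against PostgreSQL, the same kind of delete (on the real tables, on a schedule) keeps the working set, and therefore the query load, bounded.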
> Instead of storing the data in a database, ZeeQS could store the data in memory. But memory is limited.
> ZeeQS could read the data ad-hoc from Hazelcast but the Hazelcast ringbuffer is limited. And it is probably not efficient to read from Hazelcast every time.
With my limited knowledge of Hazelcast, I am thinking memory won't be limited: it can be scaled horizontally, and every entry could have a time to live (TTL), so we could always evict entries in this in-memory data grid (IMDG) that are older than a few days or months.
Could you share more input on why it wouldn't be efficient to read from Hazelcast every time, given that it is an IMDG?
> However, you could try to tune it by removing the data from the database that you don't need anymore.
We could do this, but I don't think it will help much right now while we are running the benchmark tool. Let me know if you have more ideas to share here.
One more observation: the Operate application was also very slow in showing the processes spawned by the benchmark tool on the UI. That leads me to the thought: could the Zeebe export code itself be slow?
Thanks.
> Could you share more input on why it wouldn't be efficient to read from Hazelcast every time, given that it is an IMDG?
First: to collect this data, ZeeQS must read and aggregate all records in Hazelcast. For example, the job state is aggregated over multiple job records.
Second: the GraphQL API fetches the connections (e.g. process instance -> jobs) from the database.
Reading from Hazelcast for each sub-query would result in high latency.
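A back-of-envelope comparison of the two access patterns (the numbers below are illustrative only, not measurements): resolving one sub-query per process instance by re-scanning and re-aggregating the record stream touches far more data than indexed database lookups.

```python
def adhoc_scan_reads(num_instances, records_in_buffer):
    """Ad-hoc pattern: every sub-query re-reads the whole record stream
    to rebuild the aggregate it needs."""
    return num_instances * records_in_buffer

def indexed_db_reads(num_instances):
    """Database pattern: one indexed lookup per instance
    (batching would make this cheaper still)."""
    return num_instances

# Hypothetical GraphQL query resolving jobs for 100 instances,
# with 10,000 records currently sitting in the stream:
scans = adhoc_scan_reads(100, 10_000)   # records touched without a database
lookups = indexed_db_reads(100)         # queries with an indexed database
```

The gap grows with both the number of instances in a query and the number of records in the stream, which is why the aggregated database view pays off for the GraphQL API.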
> That leads me to the thought: could the Zeebe export code itself be slow?
Maybe. You could check the metrics in Zeebe.