
Comments (9)

saig0 commented on May 30, 2024

As I understand the way zeeqs functions, it reads the workflow events from hazelcast and pushes them to postgres db for persistence. I am curious on why is it designed this way? Can zeebe not export the events directly to a postgres db and zeeqs read from postgres db?

Zeebe exports many records. ZeeQS consumes these records and stores them in an aggregated way.

The aggregation and storing could be done in the Zeebe exporter itself. But it would put an additional load on Zeebe.
And this way, multiple applications can consume the Hazelcast stream and build their own state.
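As a rough illustration of that aggregation step, here is a minimal Python sketch that folds a stream of raw records into one state entry per process instance. The record shapes and intent names are simplified for illustration and do not match the real Zeebe record schema.

```python
# Sketch: many raw exporter records are folded into one row per process
# instance. Record shapes and intents are invented for illustration and
# do not match the real Zeebe record schema.

def aggregate(records):
    """Fold a stream of records into a per-instance state view."""
    instances = {}
    for record in records:
        key = record["processInstanceKey"]
        intent = record["intent"]
        if intent == "ELEMENT_ACTIVATED" and key not in instances:
            instances[key] = "ACTIVE"
        elif intent == "ELEMENT_COMPLETED":
            instances[key] = "COMPLETED"
        elif intent == "ELEMENT_TERMINATED":
            instances[key] = "TERMINATED"
    return instances

stream = [
    {"processInstanceKey": 1, "intent": "ELEMENT_ACTIVATED"},
    {"processInstanceKey": 2, "intent": "ELEMENT_ACTIVATED"},
    {"processInstanceKey": 1, "intent": "ELEMENT_COMPLETED"},
]
print(aggregate(stream))  # {1: 'COMPLETED', 2: 'ACTIVE'}
```

Doing this fold inside the exporter would tie that CPU and I/O cost to the broker itself, which is why it lives in a separate consumer.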

What is the recommended production configuration for hazelcast? Should it be running in embedded mode or standalone mode? We observed that when hazelcast runs in standalone mode if it restarts, zeebe also needs to be restarted to connect with hazelcast.

I would not recommend running the embedded mode in production.

However, the best option would be running a standalone Hazelcast instance or cluster. Otherwise, you may run into issues if the broker has multiple partitions or is restarted.

from zeeqs.

shahamit commented on May 30, 2024

The aggregation and storing could be done in the Zeebe exporter itself. But it would put an additional load on Zeebe.

I am not sure if my first post was misread. The flow I meant was: zeebe -> postgres <-> zeeqs. Why would exporting the event stream to postgres put more load on zeebe than exporting it to hazelcast?

And this way, multiple applications can consume the Hazelcast stream and build their own state.

One curiosity question - why didn't zeeqs use elasticsearch exporter that is maintained by the zeebe team?

I would not recommend running it in production.

Oh. Since zeebe doesn't provide a programmatic way to find the status of a running/completed workflow (at least in my findings), we are planning to use the zeeqs APIs to let our UI know what's happening with the workflow. Would you have any other recommendation for this problem statement?

However, the best option would be running a standalone Hazelcast instance or cluster.

Ok. Would you know why we need a zeebe restart if hazelcast restarts for some reason? Shouldn't zeebe be able to reconnect to hazelcast once it's back up?

Also one more question - does zeebe store the event data locally if hazelcast crashes and send those events once it comes back up?

Thanks


saig0 commented on May 30, 2024

I am not sure if my first post was misread. The flow I meant was: zeebe -> postgres <-> zeeqs. Why would exporting the event stream to postgres put more load on zeebe than exporting it to hazelcast?

Well, we could have an exporter that stores the records simply in PostgreSQL. But ZeeQS does more than this. It stores the data in an aggregated view. The view is "optimized" for how the data is accessed.

One curiosity question - why didn't zeeqs use elasticsearch exporter that is maintained by the zeebe team?

It could. And it would be nice. Hazelcast has the advantage that it is easier to set up. It can run in memory and is fast.

However, I'm open to contributions to support the ES exporter.

Since zeebe doesn't provide a programmatic way to find the status of a running/completed workflow (at least in my findings), we are planning to use the zeeqs APIs to let our UI know what's happening with the workflow. Would you have any other recommendation for this problem statement?

If you can, use the Operate API for production.

If not, ZeeQS is a good option. But you may need to configure Hazelcast to your needs to avoid data loss.

Would you know why we need a zeebe restart if hazelcast restarts for some reason? Shouldn't zeebe be able to reconnect to hazelcast once it's back up?

I didn't test this case. But it would be good if the Hazelcast exporter could handle this case.

Please share your experience if possible.

does zeebe store the event data locally if hazelcast crashes and send those events once it comes back up?

Zeebe stores the records until all exporters acknowledge them. Currently, the Hazelcast exporter acknowledges a record as soon as it is added to the ring buffer.
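A small, self-contained Python sketch (not Hazelcast's actual API) of why acknowledging on ring-buffer insert can risk data loss: a fixed-capacity buffer overwrites its oldest entries, so records already acknowledged to the broker can disappear before a slow consumer reads them.

```python
from collections import deque

class RingBuffer:
    """Illustrative fixed-capacity buffer: when full, the oldest entry is
    overwritten. If the exporter acknowledges each record as soon as it
    lands here, a slow consumer can lose records. Not the Hazelcast API."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # maxlen silently drops the oldest

    def add(self, item):
        self.buf.append(item)

    def read_all(self):
        return list(self.buf)

rb = RingBuffer(capacity=3)
for record in ["r1", "r2", "r3", "r4", "r5"]:
    rb.add(record)  # broker considers each record acknowledged at this point

print(rb.read_all())  # ['r3', 'r4', 'r5'] -- r1 and r2 are gone
```

This is why sizing the ring buffer (and its backup/persistence settings) to your throughput matters if you cannot tolerate gaps.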


shahamit commented on May 30, 2024

Alright. That answers most of my questions in some way.
Before we close this issue, one last question - the direction of the arrow from hazelcast (and elasticsearch) to zeeqs should be the opposite: zeeqs should be reading the data from hazelcast instead of hazelcast pushing it to zeeqs, right?
[image: ZeeQS architecture diagram with arrows from Hazelcast and Elasticsearch to ZeeQS]


saig0 commented on May 30, 2024

The direction of the arrow from hazelcast (and elasticsearch) to zeeqs should be the opposite: zeeqs should be reading the data from hazelcast instead of hazelcast pushing it to zeeqs, right?

It depends on how you look at it. ZeeQS reads data from Hazelcast. The reading is implemented as a pull from Hazelcast but exposed as listeners in ZeeQS.

You could also see the arrows as the data flow. The data is exported from the broker to Hazelcast, then imported from Hazelcast into ZeeQS, and finally stored in the database.
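A hypothetical sketch of that pattern in Python: a pull loop over a sequenced buffer, surfaced to the application as push-style listeners. The class and method names are invented and are not the actual ZeeQS or Hazelcast client API.

```python
# Sketch: reading is a pull over a sequenced buffer, but applications see
# it as push-style listener callbacks. All names here are invented.

class PullAsListeners:
    def __init__(self, buffer):
        self.buffer = buffer      # a plain list stands in for the ringbuffer
        self.sequence = 0         # next sequence number to read
        self.listeners = []

    def add_listener(self, callback):
        self.listeners.append(callback)

    def poll(self):
        """Pull new entries and push each one to the registered listeners."""
        while self.sequence < len(self.buffer):
            record = self.buffer[self.sequence]
            self.sequence += 1
            for listener in self.listeners:
                listener(record)

seen = []
importer = PullAsListeners(buffer=["a", "b", "c"])
importer.add_listener(seen.append)
importer.poll()
print(seen)  # ['a', 'b', 'c']
```

Tracking the sequence number on the consumer side is what lets the importer resume where it left off after a restart.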


shahamit commented on May 30, 2024

When doing a performance test, we noticed that the postgres db was under load. While the other components can scale, the postgres db is hard to scale. Hence I have one more question.

If we deploy hazelcast in clustered mode, can zeeqs read directly from hazelcast and respond to graphql queries? I understand that this needs code modifications but the question is mainly from a design standpoint - would it be mandatory to store data in postgres?

Thanks


saig0 commented on May 30, 2024

If we deploy hazelcast in clustered mode, can zeeqs read directly from hazelcast and respond to graphql queries? I understand that this needs code modifications but the question is mainly from a design standpoint - would it be mandatory to store data in postgres?

Well. ZeeQS needs to aggregate the data. For example, to collect all data of a process instance.

Instead of storing the data in a database, ZeeQS could store the data in memory. But memory is limited.

ZeeQS could read the data ad-hoc from Hazelcast but the Hazelcast ringbuffer is limited. And it is probably not efficient to read from Hazelcast every time.

First of all, ZeeQS is not optimized for performance. However, you could try to tune it by removing the data from the database that you don't need anymore.


shahamit commented on May 30, 2024

Instead of storing the data in a database, ZeeQS could store the data in memory. But memory is limited.
ZeeQS could read the data ad-hoc from Hazelcast but the Hazelcast ringbuffer is limited. And it is probably not efficient to read from Hazelcast every time.

With my limited knowledge about hazelcast, I am thinking memory won't be limited. Given that it can be scaled horizontally and every entry can have a time to live (TTL), we could always evict the entries in this in-memory data grid (IMDG) that are older than a few days/months.
Could you share more input on why it won't be efficient to read from Hazelcast every time, given that it is an IMDG?
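The TTL idea above can be sketched as follows. Hazelcast maps do support per-entry TTL natively; this stand-alone Python version (with an injected clock so the example is deterministic) only illustrates the eviction rule, not the Hazelcast API.

```python
import time

class TTLMap:
    """Illustration of TTL eviction: entries older than a time-to-live
    are dropped. Lazy eviction on read; not the Hazelcast API."""
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (value, inserted_at)

    def put(self, key, value):
        self.entries[key] = (value, self.clock())

    def get(self, key):
        item = self.entries.get(key)
        if item is None:
            return None
        value, inserted_at = item
        if self.clock() - inserted_at > self.ttl:
            del self.entries[key]  # lazily evict the expired entry
            return None
        return value

# usage with a fake clock so the behavior is deterministic
now = [0.0]
m = TTLMap(ttl_seconds=10, clock=lambda: now[0])
m.put("instance-1", "ACTIVE")
now[0] = 5.0
assert m.get("instance-1") == "ACTIVE"   # still fresh
now[0] = 11.0
assert m.get("instance-1") is None       # evicted after TTL
```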

However, you could try to tune it by removing the data from the database that you don't need anymore.

We could do this, but I don't think it would help right now while we are running the benchmark tool. Let me know if you have more ideas to share here.

One more observation - the operate application was also very slow in showing the processes spawned by the benchmark tool on the UI. That leads me to the thought: could the zeebe exporter code itself be slow?

Thanks.


saig0 commented on May 30, 2024

Could you share more input on why it won't be efficient to read from Hazelcast every time, given that it is an IMDG?

First: to collect this data, ZeeQS must read and aggregate all records in Hazelcast. For example, the job state is aggregated over multiple job records.

Second: The GraphQL API fetches the connections (e.g. process instance -> jobs) from the database.

Reading from Hazelcast for each sub-query would result in high latency.
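To make that cost difference concrete, here is a simplified Python sketch comparing an ad-hoc full scan per sub-query against a pre-aggregated index built once at import time. Record shapes are invented for illustration.

```python
# Sketch: answering each GraphQL sub-query by scanning the whole record
# stream (ad-hoc read) versus an index built once while importing.
# Record shapes are invented and do not match the real Zeebe schema.

records = [
    {"type": "JOB", "processInstanceKey": i % 100, "jobKey": i}
    for i in range(10_000)
]

def jobs_adhoc(instance_key):
    # full scan per sub-query: O(total records) on every call
    return [r["jobKey"] for r in records
            if r["processInstanceKey"] == instance_key]

# pre-aggregated view built once during import: O(1) lookups afterwards
index = {}
for r in records:
    index.setdefault(r["processInstanceKey"], []).append(r["jobKey"])

def jobs_indexed(instance_key):
    return index.get(instance_key, [])

assert jobs_adhoc(7) == jobs_indexed(7)  # same answer, very different cost
```

With nested GraphQL queries (e.g. many process instances, each resolving its jobs), the ad-hoc approach multiplies that full-scan cost per sub-query, which is where the latency comes from.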

That leads me to the thought: could the zeebe exporter code itself be slow?

Maybe. You could check the metrics in Zeebe.

