
Comments (9)

saig0 commented on May 30, 2024

As I understand the way zeeqs functions, it reads the workflow events from hazelcast and pushes them to postgres db for persistence. I am curious on why is it designed this way? Can zeebe not export the events directly to a postgres db and zeeqs read from postgres db?

Zeebe exports many records. ZeeQS consumes these records and stores them in an aggregated way.

The aggregation and storing could be done in the Zeebe exporter itself. But it would put an additional load on Zeebe.
And this way, multiple applications can consume the Hazelcast stream and build their own state.
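As a rough illustration of that aggregation step, here is a minimal Python sketch that folds a stream of raw records into one state entry per process instance. The record shapes and intent names are simplified for illustration and do not match the real Zeebe record schema.

```python
# Sketch: many raw exporter records are folded into one row per process
# instance. Record shapes and intents are invented for illustration and
# do not match the real Zeebe record schema.

def aggregate(records):
    """Fold a stream of records into a per-instance state view."""
    instances = {}
    for record in records:
        key = record["processInstanceKey"]
        intent = record["intent"]
        if intent == "ELEMENT_ACTIVATED" and key not in instances:
            instances[key] = "ACTIVE"
        elif intent == "ELEMENT_COMPLETED":
            instances[key] = "COMPLETED"
        elif intent == "ELEMENT_TERMINATED":
            instances[key] = "TERMINATED"
    return instances

stream = [
    {"processInstanceKey": 1, "intent": "ELEMENT_ACTIVATED"},
    {"processInstanceKey": 2, "intent": "ELEMENT_ACTIVATED"},
    {"processInstanceKey": 1, "intent": "ELEMENT_COMPLETED"},
]
print(aggregate(stream))  # {1: 'COMPLETED', 2: 'ACTIVE'}
```

Doing this fold inside the exporter would tie that CPU and I/O cost to the broker itself, which is why it lives in a separate consumer.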

What is the recommended production configuration for hazelcast? Should it be running in embedded mode or standalone mode? We observed that when hazelcast runs in standalone mode if it restarts, zeebe also needs to be restarted to connect with hazelcast.

I would not recommend running the embedded mode in production.

However, the best option would be running a standalone Hazelcast instance or cluster. Otherwise, you may run into issues if the broker has multiple partitions or is restarted.

from zeeqs.

shahamit commented on May 30, 2024

The aggregation and storing could be done in the Zeebe exporter itself. But it would put an additional load on Zeebe.

I am not sure if my first post was misread. The flow I meant was: zeebe -> postgres <-> zeeqs. Why would exporting the event stream to postgres put more load on zeebe than exporting it to hazelcast?

And this way, multiple applications can consume the Hazelcast stream and build their own state.

One curiosity question - why didn't zeeqs use elasticsearch exporter that is maintained by the zeebe team?

I would not recommend running it in production.

Oh. Since zeebe doesn't provide a programmatic way to find the status of a running/completed workflow (at least in my findings), we are planning to use the zeeqs APIs to let our UI know what's happening with the workflow. Would you have any other recommendation for this problem statement?

However, the best option would be running a standalone Hazelcast instance or cluster.

Ok. Would you know why we need a zeebe restart if hazelcast restarts for some reason? Shouldn't zeebe be able to reconnect to hazelcast once it's back up?

Also one more question - does zeebe store the event data locally if hazelcast crashes and send those events once it comes back up?

Thanks


saig0 commented on May 30, 2024

I am not sure if my first post was misread. The flow I meant was: zeebe -> postgres <-> zeeqs. Why would exporting the event stream to postgres put more load on zeebe than exporting it to hazelcast?

Well, we could have an exporter that stores the records simply in PostgreSQL. But ZeeQS does more than this. It stores the data in an aggregated view. The view is "optimized" for how the data is accessed.

One curiosity question - why didn't zeeqs use elasticsearch exporter that is maintained by the zeebe team?

It could. And it would be nice. Hazelcast has the advantage that it is easier to set up. It can run in memory and is fast.

However, I'm open to contributions to support the ES exporter.

Since zeebe doesn't provide a programmatic way to find the status of a running/completed workflow (at least in my findings), we are planning to use the zeeqs APIs to let our UI know what's happening with the workflow. Would you have any other recommendation for this problem statement?

If you can, use the Operate API for production.

If not, ZeeQS is a good option. But you may need to configure Hazelcast to your needs to avoid data loss.

Would you know why we need a zeebe restart if hazelcast restarts for some reason? Shouldn't zeebe be able to reconnect to hazelcast once it's back up?

I didn't test this case. But it would be good if the Hazelcast exporter could handle this case.

Please share your experience if possible.

does zeebe store the event data locally if hazelcast crashes and send those events once it comes back up?

Zeebe stores the records until all exporters acknowledge them. Currently, the Hazelcast exporter acknowledges a record as soon as it is added to the ring buffer.
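A small, self-contained Python sketch (not Hazelcast's actual API) of why acknowledging on ring-buffer insert can risk data loss: a fixed-capacity buffer overwrites its oldest entries, so records already acknowledged to the broker can disappear before a slow consumer reads them.

```python
from collections import deque

class RingBuffer:
    """Illustrative fixed-capacity buffer: when full, the oldest entry is
    overwritten. If the exporter acknowledges each record as soon as it
    lands here, a slow consumer can lose records. Not the Hazelcast API."""
    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)  # maxlen silently drops the oldest

    def add(self, item):
        self.buf.append(item)

    def read_all(self):
        return list(self.buf)

rb = RingBuffer(capacity=3)
for record in ["r1", "r2", "r3", "r4", "r5"]:
    rb.add(record)  # broker considers each record acknowledged at this point

print(rb.read_all())  # ['r3', 'r4', 'r5'] -- r1 and r2 are gone
```

This is why sizing the ring buffer (and its backup/persistence settings) to your throughput matters if you cannot tolerate gaps.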


shahamit commented on May 30, 2024

Alright. That answers most of my questions in some way.
Before we close this issue, one last question - the direction of the arrow from hazelcast (and elasticsearch) to zeeqs should be the opposite: zeeqs should be reading the data from hazelcast instead of hazelcast pushing it to zeeqs, right?
[image: ZeeQS architecture diagram with arrows from Hazelcast and Elasticsearch to ZeeQS]


saig0 commented on May 30, 2024

The direction of the arrow from hazelcast (and elasticsearch) to zeeqs should be the opposite: zeeqs should be reading the data from hazelcast instead of hazelcast pushing it to zeeqs, right?

It depends on how you look at it. ZeeQS reads data from Hazelcast. The reading is implemented as a pull from Hazelcast but exposed as listeners in ZeeQS.

You could also see the arrows as the data flow. The data is exported from the broker to Hazelcast, then imported from Hazelcast into ZeeQS, and finally stored in the database.
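A hypothetical sketch of that pattern in Python: a pull loop over a sequenced buffer, surfaced to the application as push-style listeners. The class and method names are invented and are not the actual ZeeQS or Hazelcast client API.

```python
# Sketch: reading is a pull over a sequenced buffer, but applications see
# it as push-style listener callbacks. All names here are invented.

class PullAsListeners:
    def __init__(self, buffer):
        self.buffer = buffer      # a plain list stands in for the ringbuffer
        self.sequence = 0         # next sequence number to read
        self.listeners = []

    def add_listener(self, callback):
        self.listeners.append(callback)

    def poll(self):
        """Pull new entries and push each one to the registered listeners."""
        while self.sequence < len(self.buffer):
            record = self.buffer[self.sequence]
            self.sequence += 1
            for listener in self.listeners:
                listener(record)

seen = []
importer = PullAsListeners(buffer=["a", "b", "c"])
importer.add_listener(seen.append)
importer.poll()
print(seen)  # ['a', 'b', 'c']
```

Tracking the sequence number on the consumer side is what lets the importer resume where it left off after a restart.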


shahamit commented on May 30, 2024

When doing a performance test, we noticed that the postgres db was under load. While the other components can scale, the postgres db is hard to scale. Hence I have one more question.

If we deploy hazelcast in clustered mode, can zeeqs read directly from hazelcast and respond to graphql queries? I understand that this needs code modifications but the question is mainly from a design standpoint - would it be mandatory to store data in postgres?

Thanks


saig0 commented on May 30, 2024

If we deploy hazelcast in clustered mode, can zeeqs read directly from hazelcast and respond to graphql queries? I understand that this needs code modifications but the question is mainly from a design standpoint - would it be mandatory to store data in postgres?

Well. ZeeQS needs to aggregate the data. For example, to collect all data of a process instance.

Instead of storing the data in a database, ZeeQS could store the data in memory. But memory is limited.

ZeeQS could read the data ad-hoc from Hazelcast but the Hazelcast ringbuffer is limited. And it is probably not efficient to read from Hazelcast every time.

First of all, ZeeQS is not optimized for performance. However, you could try to tune it by removing the data from the database that you don't need anymore.


shahamit commented on May 30, 2024

Instead of storing the data in a database, ZeeQS could store the data in memory. But memory is limited.
ZeeQS could read the data ad-hoc from Hazelcast but the Hazelcast ringbuffer is limited. And it is probably not efficient to read from Hazelcast every time.

With my limited knowledge about hazelcast, I am thinking memory won't be limited. Given that it can be scaled horizontally and every entry can have a time to live (TTL), we could always evict the entries in this in-memory data grid (IMDG) that are older than a few days/months.
Could you share more input on why it won't be efficient to read from Hazelcast every time, given that it is an IMDG?
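The TTL idea above can be sketched as follows. Hazelcast maps do support per-entry TTL natively; this stand-alone Python version (with an injected clock so the example is deterministic) only illustrates the eviction rule, not the Hazelcast API.

```python
import time

class TTLMap:
    """Illustration of TTL eviction: entries older than a time-to-live
    are dropped. Lazy eviction on read; not the Hazelcast API."""
    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (value, inserted_at)

    def put(self, key, value):
        self.entries[key] = (value, self.clock())

    def get(self, key):
        item = self.entries.get(key)
        if item is None:
            return None
        value, inserted_at = item
        if self.clock() - inserted_at > self.ttl:
            del self.entries[key]  # lazily evict the expired entry
            return None
        return value

# usage with a fake clock so the behavior is deterministic
now = [0.0]
m = TTLMap(ttl_seconds=10, clock=lambda: now[0])
m.put("instance-1", "ACTIVE")
now[0] = 5.0
assert m.get("instance-1") == "ACTIVE"   # still fresh
now[0] = 11.0
assert m.get("instance-1") is None       # evicted after TTL
```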

However, you could try to tune it by removing the data from the database that you don't need anymore.

We could do this, but I don't think it would help right now while we are running the benchmark tool. Let me know if you have more ideas to share here.

One more observation - the operate application was also very slow in showing the processes spawned by the benchmark tool on the UI. That leads me to the thought: could the zeebe exporter code itself be slow?

Thanks.


saig0 commented on May 30, 2024

Could you share more input on why it won't be efficient to read from Hazelcast every time, given that it is an IMDG?

First: to collect this data, ZeeQS must read and aggregate all records in Hazelcast. For example, the job state is aggregated over multiple job records.

Second: The GraphQL API fetches the connections (e.g. process instance -> jobs) from the database.

Reading from Hazelcast for each sub-query would result in high latency.
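To make that cost difference concrete, here is a simplified Python sketch comparing an ad-hoc full scan per sub-query against a pre-aggregated index built once at import time. Record shapes are invented for illustration.

```python
# Sketch: answering each GraphQL sub-query by scanning the whole record
# stream (ad-hoc read) versus an index built once while importing.
# Record shapes are invented and do not match the real Zeebe schema.

records = [
    {"type": "JOB", "processInstanceKey": i % 100, "jobKey": i}
    for i in range(10_000)
]

def jobs_adhoc(instance_key):
    # full scan per sub-query: O(total records) on every call
    return [r["jobKey"] for r in records
            if r["processInstanceKey"] == instance_key]

# pre-aggregated view built once during import: O(1) lookups afterwards
index = {}
for r in records:
    index.setdefault(r["processInstanceKey"], []).append(r["jobKey"])

def jobs_indexed(instance_key):
    return index.get(instance_key, [])

assert jobs_adhoc(7) == jobs_indexed(7)  # same answer, very different cost
```

With nested GraphQL queries (e.g. many process instances, each resolving its jobs), the ad-hoc approach multiplies that full-scan cost per sub-query, which is where the latency comes from.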

That leads me to the thought: could the zeebe exporter code itself be slow?

Maybe. You could check the metrics in Zeebe.

