Giter Site home page Giter Site logo

Comments (2)

afausti avatar afausti commented on June 3, 2024

That would be an interesting addition to enable this package to compute moving averages, for example. Did you see any documentation on Sliding widow in Faust? the only options I found are Hopping and Tumbling windows.

As for the performance, Faust leverages Kafka's parallelism model. In principle, if you need a large sliding window or process large messages you can increase the number of topic partitions and run more Faust workers.

from kafka-aggregator.

fonty422 avatar fonty422 commented on June 3, 2024

Yeah, there's a sliding window class, and I have an open issue on the faust-streaming repo which provides links to several classes and modules that might provide access. I just can't seem to get any of them to work as I would expect them to.
I think in general people would like some basic aggregates - counts, mean, min, max, sum, in a continuously moving window.
In my mind the basic way would be to perform a calculation/function when something enters the window, then perform the opposite function to remove that when one exits the window. Normal windows know when something leaves the window.

With my particular use case the issue is that some keyed messages might sporadically produce thousands of messages more than usual or than others, so a particular partition might become overloaded without warning and randomly. Considering the window is 3 months long I wondered whether there's a topic size limit to consider - say 1 message per second for a misbehaving/troubled input, that's ~8M records over the window period.
My understanding (and correct me if I'm wrong) is that a stream processor (aka agent) can be run by multiple workers, but those workers can only read from specific partitions, so if you need to process by key then you need to ensure that the keys always end up in the same partition. that makes it painful when performing aggregations if a single partition by chance ends up with a a few million more records than all others. If those aggregations are held in a table, then each worker can only see it's own table and not those of others, meaning you can't split the workload across multiple workers.

If there's another way to do it, please let me know.

from kafka-aggregator.

Related Issues (3)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.