Comments (2)
That would be an interesting addition to enable this package to compute moving averages, for example. Did you see any documentation on Sliding widow in Faust? the only options I found are Hopping and Tumbling windows.
As for the performance, Faust leverages Kafka's parallelism model. In principle, if you need a large sliding window or process large messages you can increase the number of topic partitions and run more Faust workers.
from kafka-aggregator.
Yeah, there's a sliding window class, and I have an open issue on the faust-streaming repo which provides links to several classes and modules that might provide access. I just can't seem to get any of them to work as I would expect them to.
I think in general people would like some basic aggregates - counts, mean, min, max, sum, in a continuously moving window.
In my mind the basic way would be to perform a calculation/function when something enters the window, then perform the opposite function to remove that when one exits the window. Normal windows know when something leaves the window.
With my particular use case the issue is that some keyed messages might sporadically produce thousands of messages more than usual or than others, so a particular partition might become overloaded without warning and randomly. Considering the window is 3 months long I wondered whether there's a topic size limit to consider - say 1 message per second for a misbehaving/troubled input, that's ~8M records over the window period.
My understanding (and correct me if I'm wrong) is that a stream processor (aka agent) can be run by multiple workers, but those workers can only read from specific partitions, so if you need to process by key then you need to ensure that the keys always end up in the same partition. that makes it painful when performing aggregations if a single partition by chance ends up with a a few million more records than all others. If those aggregations are held in a table, then each worker can only see it's own table and not those of others, meaning you can't split the workload across multiple workers.
If there's another way to do it, please let me know.
from kafka-aggregator.
Related Issues (3)
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from kafka-aggregator.