
Question on chunks about redis-timeseries (11 comments, closed)

danni-m commented on May 27, 2024
Question on chunks

from redis-timeseries.

Comments (11)

danni-m commented on May 27, 2024

The compression branch exists to accommodate a compression backend, for space-saving reasons.
I started working on a Gorilla-style implementation (www.vldb.org/pvldb/vol8/p1816-teller.pdf).
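
The core idea of the Gorilla timestamp encoding is easy to sketch: store the first timestamp, the first delta, and then only the change between consecutive deltas, so regularly spaced samples compress to long runs of zeros (which the paper encodes in one bit each). A minimal Python illustration of the idea, not the module's actual C implementation:

```python
def delta_of_deltas(timestamps):
    # Gorilla-style timestamp encoding: after the header (first timestamp
    # and first delta), only delta-of-deltas need to be stored.
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]

# Samples arriving exactly every 5 seconds produce all-zero delta-of-deltas.
print(delta_of_deltas([1000, 1005, 1010, 1015, 1020]))  # → [0, 0, 0]
```

Irregular arrival times produce non-zero entries, which cost more bits; this is why the scheme pays off most on near-regular feeds.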

Regarding the question: time-based chunks are something I would consider in the future, since they are easier to distribute evenly.
Technically you can write another backend that is time-based; the problem is that you need to think about what happens if you get too few samples, or too many.

I'm not sure what you're trying to achieve; maybe we can find a simpler solution.
What does your input look like? (reporting every X secs? real-time?)
Why do you need to partition the data by seconds?


someburner commented on May 27, 2024

@danni-m

Thanks for the feedback (I'll make the PR edits soon).

The input is a real-time feed from a remote websocket. The feed contains timestamped data, but due to the nature of the data the sample rate is unknown: it could be anywhere from one sample every 5 seconds to around 300 per second. So not actually a crazy amount; the issue is more the sporadic nature of the data.

The goal is to automatically partition the data into precise ranges of time, where the intervals matter. I want to store data for future analysis, with the data sets split by EPOCH % 300. If the chunks were split automatically at a time-based cut-off, I believe the front-end work in my application would be much easier.
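
For reference, the EPOCH % 300 split described here amounts to aligning every sample timestamp to the start of its 5-minute bucket. A small sketch (the helper name is just illustrative):

```python
BUCKET = 300  # 5-minute buckets, matching the EPOCH % 300 split

def bucket_start(ts_seconds):
    # Align a sample timestamp (in seconds) to the start of its bucket.
    return ts_seconds - ts_seconds % BUCKET

print(bucket_start(1234))  # → 1200
```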


someburner commented on May 27, 2024

Technically you can write another backend that is time-based; the problem is that you need to think about what happens if you get too few samples, or too many.

What are the pitfalls of too few or too many? Or is this just a question of exhausting memory?


danni-m commented on May 27, 2024

If your data is sporadic, a samples-per-chunk limit is probably better; here's why.
If you open a chunk every 5 minutes and the chunk has an elastic size (say, something like a vector, with a base size that grows when it runs out of space), then you end up with a lot of memory overhead: whenever you don't receive 100% of the expected samples, the chunk is left with an empty tail.

If I understand you correctly, your scenario is read-heavy, so a more efficient range query would help, right?
Currently a range query is O((N/samples-per-chunk)+M), where N is the number of samples and M is the number of matching samples. This can be improved by doing a binary search on the chunk index, which would make the chunk lookup O(log(N)).
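
The binary search on the chunk index can be sketched with Python's bisect over the chunks' base timestamps (an illustration of the idea, not the module's code): since chunks are sorted and non-overlapping, the only chunk that can contain a timestamp is the last one whose base timestamp is less than or equal to it.

```python
import bisect

def find_chunk(chunk_base_timestamps, ts):
    # chunk_base_timestamps: sorted list of each chunk's first timestamp.
    # Return the index of the last chunk whose base timestamp is <= ts;
    # that is the only chunk that can contain ts.
    i = bisect.bisect_right(chunk_base_timestamps, ts) - 1
    return max(i, 0)

print(find_chunk([0, 100, 200, 300], 150))  # → 1
```

Because bisect halves the search space each step, the lookup cost grows with the logarithm of the number of chunks rather than scanning them linearly.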

Another way to get better read performance is data denormalization.
What type of queries are you running? Are they always in 5-minute buckets?


someburner commented on May 27, 2024

then you end up with a lot of memory overhead: whenever you don't receive 100% of the expected samples, the chunk is left with an empty tail.

Hmm, I may have to look over the chunk code first to understand it better. But the 5-minute buckets are absolute: if, after 250 seconds, no new samples have come in, then the end of that data set is at 250 seconds. And in my particular case it would be useful to know that there were no samples for a given 300-second window.

A little more background, though: I want to use compactions for the 5-minute chunks. After a week or so I don't actually care about the individual samples; I just want to be able to transform them into a single point per 5 minutes containing the min, max, and average.

For now it's not necessarily read-heavy, but it could be in the future. The main thing is simplicity of accessing the data. It would be really convenient to do a TS.RANGE over 60 minutes and get the data back as an array of 12 300-second values.
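
The min/max/average compaction described here can be sketched as a plain-Python downsampler over (timestamp, value) pairs; this is just an illustration of the desired output shape, not the module's implementation:

```python
from collections import defaultdict

def downsample(samples, bucket=300):
    # Group (timestamp, value) samples into fixed 300-second buckets
    # and keep only (min, max, avg) per bucket.
    groups = defaultdict(list)
    for ts, v in samples:
        groups[ts - ts % bucket].append(v)
    return {b: (min(vs), max(vs), sum(vs) / len(vs))
            for b, vs in sorted(groups.items())}

print(downsample([(10, 1.0), (20, 3.0), (310, 5.0)]))
# → {0: (1.0, 3.0, 2.0), 300: (5.0, 5.0, 5.0)}
```

A 60-minute query over the compacted data then returns at most 12 buckets, one tuple per 300-second window.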


danni-m commented on May 27, 2024

If I understand correctly, you need TS.RANGE to return 0 for buckets without data?
Also, compactions are already implemented in the module, so you can start using them right away.


someburner commented on May 27, 2024

Well, in my case I would just have it return the last point of the previous 5 minutes, and so on. The data is for a trading platform, so if no data comes in, we assume the price is unchanged.

For now it may be sufficient to just use compactions and trigger them myself before adding the next data point? Also, what triggers a compaction? I was looking at #3 but wasn't sure of the actual mechanism from the readme.


danni-m commented on May 27, 2024
  1. Sounds like the aggregation you need is the last aggregation method.
  2. Compaction is triggered by inserting samples into the origin key; if you don't report anything, no new samples will be added.

Today there is an option in Redis to have another thread perform periodic actions; using this mechanism you can trigger "empty" compactions.
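
Point 2 can be illustrated with a toy model (a hypothetical class, not the module's API): because the compacted bucket is updated on every insert into the origin key, a bucket that receives no samples simply never materializes.

```python
class LastCompaction:
    # Toy model of insert-triggered compaction with a "last" aggregation:
    # every add() to the origin series also updates the matching
    # downsampled bucket, so empty windows produce no buckets at all.
    def __init__(self, bucket=300):
        self.bucket = bucket
        self.compacted = {}  # bucket start -> last value seen

    def add(self, ts, value):
        self.compacted[ts - ts % self.bucket] = value

c = LastCompaction()
c.add(10, 1.0)
c.add(20, 2.0)   # overwrites bucket 0's "last" value
c.add(700, 9.0)  # bucket 300 is skipped entirely: no samples, no bucket
print(c.compacted)  # → {0: 2.0, 600: 9.0}
```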


danni-m commented on May 27, 2024

Another easy solution is to put some kind of web server in front of Redis and add the missing data points there.
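
Such a gap-filling layer could forward-fill empty buckets with the last known value, matching the "price unchanged" assumption earlier in the thread. A minimal sketch (the helper name is illustrative):

```python
def forward_fill(buckets, start, end, step=300):
    # buckets: mapping of bucket start -> value for buckets that have data.
    # Walk the requested range and substitute the last known value for
    # empty buckets (the "price unchanged" assumption).
    out, last = [], None
    for t in range(start, end, step):
        if t in buckets:
            last = buckets[t]
        out.append((t, last))
    return out

print(forward_fill({0: 1.0, 600: 2.0}, 0, 900))
# → [(0, 1.0), (300, 1.0), (600, 2.0)]
```

Buckets before the first known value come back as None, so the caller can decide how to handle a leading gap.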


someburner commented on May 27, 2024

Ah okay, I think I know what to do now: I'll just handle the empty buckets separately. Thanks! This can be closed now.


danni-m commented on May 27, 2024

@someburner great, if you have any other questions, just open a ticket.

