
Question on chunks about redis-timeseries (11 comments, closed)

danni-m commented on May 27, 2024
Question on chunks

from redis-timeseries.

Comments (11)

danni-m commented on May 27, 2024

The compression branch exists to accommodate a compression backend, for space-saving reasons.
I started working on a Gorilla-style implementation (www.vldb.org/pvldb/vol8/p1816-teller.pdf).
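
The core idea of the Gorilla timestamp encoding is easy to sketch: store the first timestamp, the first delta, and then only the change between consecutive deltas, so regularly spaced samples compress to long runs of zeros (which the paper encodes in one bit each). A minimal Python illustration of the idea, not the module's actual C implementation:

```python
def delta_of_deltas(timestamps):
    # Gorilla-style timestamp encoding: after the header (first timestamp
    # and first delta), only delta-of-deltas need to be stored.
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    return [b - a for a, b in zip(deltas, deltas[1:])]

# Samples arriving exactly every 5 seconds produce all-zero delta-of-deltas.
print(delta_of_deltas([1000, 1005, 1010, 1015, 1020]))  # → [0, 0, 0]
```

Irregular arrival times produce non-zero entries, which cost more bits; this is why the scheme pays off most on near-regular feeds.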

Regarding the question: time-based chunks are something I would consider in the future, since they are easier to distribute evenly.
Technically you can write another backend that is time-based; the problem is that you need to think about what happens if you get too few samples, or too many.

I'm not sure what you're trying to achieve; maybe we can find a simpler solution.
What does your input look like? (reporting every X secs? real-time?)
Why do you need to partition the data by seconds?


someburner commented on May 27, 2024

@danni-m

Thanks for the feedback (I'll make the PR edits soon).

The input is a real-time feed from a remote websocket. The feed contains timestamped data, but due to the nature of the data the sample rate is unknown: it could be anywhere from one sample every 5 seconds to around 300 per second. So not actually a crazy amount; the issue is more the sporadic nature of the data.

The goal is to automatically partition the data into precise ranges of time, where the intervals matter. I want to store data for future analysis, with the data sets split by EPOCH % 300. If the chunks were split automatically at a time-based cut-off, I believe the front-end work in my application would be much easier.
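
For reference, the EPOCH % 300 split described here amounts to aligning every sample timestamp to the start of its 5-minute bucket. A small sketch (the helper name is just illustrative):

```python
BUCKET = 300  # 5-minute buckets, matching the EPOCH % 300 split

def bucket_start(ts_seconds):
    # Align a sample timestamp (in seconds) to the start of its bucket.
    return ts_seconds - ts_seconds % BUCKET

print(bucket_start(1234))  # → 1200
```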


someburner commented on May 27, 2024

Technically you can write another backend that is time-based; the problem is that you need to think about what happens if you get too few samples, or too many.

What are the pitfalls of too few or too many? Or is this just a question of exhausting memory?


danni-m commented on May 27, 2024

If your data is sporadic, a samples-per-chunk limit is probably better; here's why.
If you open a chunk every 5 minutes and the chunk has an elastic size (say, something like a vector, with a base size that grows when it runs out of space), then you end up with a lot of memory overhead: whenever you don't receive 100% of the expected samples, the chunk is left with an empty tail.

If I understand you correctly, your scenario is read-heavy, so a more efficient range query would help, right?
Currently a range query is O((N/samples-per-chunk)+M), where N is the number of samples and M is the number of matching samples. This can be improved by doing a binary search on the chunk index, which would make the chunk lookup O(log(N)).
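
The binary search on the chunk index can be sketched with Python's bisect over the chunks' base timestamps (an illustration of the idea, not the module's code): since chunks are sorted and non-overlapping, the only chunk that can contain a timestamp is the last one whose base timestamp is less than or equal to it.

```python
import bisect

def find_chunk(chunk_base_timestamps, ts):
    # chunk_base_timestamps: sorted list of each chunk's first timestamp.
    # Return the index of the last chunk whose base timestamp is <= ts;
    # that is the only chunk that can contain ts.
    i = bisect.bisect_right(chunk_base_timestamps, ts) - 1
    return max(i, 0)

print(find_chunk([0, 100, 200, 300], 150))  # → 1
```

Because bisect halves the search space each step, the lookup cost grows with the logarithm of the number of chunks rather than scanning them linearly.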

Another way to get better read performance is data denormalization.
What type of queries are you running? Are they always in 5-minute buckets?


someburner commented on May 27, 2024

then you end up with a lot of memory overhead: whenever you don't receive 100% of the expected samples, the chunk is left with an empty tail.

Hmm, I may have to look over the chunk code first to understand it better. But the 5-minute buckets are absolute: if, after 250 seconds, no new samples have come in, then the end of that data set is at 250 seconds. And in my particular case it would be useful to know that there were no samples for a given 300-second window.

A little more background, though: I want to use compactions for the 5-minute chunks. After a week or so I don't actually care about the individual samples; I just want to be able to transform them into a single point per 5 minutes containing the min, max, and average.

For now it's not necessarily read-heavy, but it could be in the future. The main thing is simplicity of accessing the data. It would be really convenient to do a TS.RANGE over 60 minutes and get the data back as an array of 12 300-second values.
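
The min/max/average compaction described here can be sketched as a plain-Python downsampler over (timestamp, value) pairs; this is just an illustration of the desired output shape, not the module's implementation:

```python
from collections import defaultdict

def downsample(samples, bucket=300):
    # Group (timestamp, value) samples into fixed 300-second buckets
    # and keep only (min, max, avg) per bucket.
    groups = defaultdict(list)
    for ts, v in samples:
        groups[ts - ts % bucket].append(v)
    return {b: (min(vs), max(vs), sum(vs) / len(vs))
            for b, vs in sorted(groups.items())}

print(downsample([(10, 1.0), (20, 3.0), (310, 5.0)]))
# → {0: (1.0, 3.0, 2.0), 300: (5.0, 5.0, 5.0)}
```

A 60-minute query over the compacted data then returns at most 12 buckets, one tuple per 300-second window.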


danni-m commented on May 27, 2024

If I understand correctly, you need TS.RANGE to return 0 for buckets without data?
Also, compactions are already implemented in the module, so you can start using them right away.


someburner commented on May 27, 2024

Well, in my case I would just have it return the last point of the previous 5 minutes, and so on. The data is for a trading platform, so if no data comes in, we assume the price is unchanged.

For now it may be sufficient to just use compactions and trigger them myself before adding the next data point? Also, what triggers a compaction? I was looking at #3 but wasn't sure of the actual mechanism from the readme.


danni-m commented on May 27, 2024
  1. Sounds like the aggregation you need is the last aggregation method.
  2. Compaction is triggered by inserting samples into the origin key; if you don't report anything, no new samples will be added.

Today there is an option in Redis to have another thread perform periodic actions; using this mechanism you can trigger "empty" compactions.
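
Point 2 can be illustrated with a toy model (a hypothetical class, not the module's API): because the compacted bucket is updated on every insert into the origin key, a bucket that receives no samples simply never materializes.

```python
class LastCompaction:
    # Toy model of insert-triggered compaction with a "last" aggregation:
    # every add() to the origin series also updates the matching
    # downsampled bucket, so empty windows produce no buckets at all.
    def __init__(self, bucket=300):
        self.bucket = bucket
        self.compacted = {}  # bucket start -> last value seen

    def add(self, ts, value):
        self.compacted[ts - ts % self.bucket] = value

c = LastCompaction()
c.add(10, 1.0)
c.add(20, 2.0)   # overwrites bucket 0's "last" value
c.add(700, 9.0)  # bucket 300 is skipped entirely: no samples, no bucket
print(c.compacted)  # → {0: 2.0, 600: 9.0}
```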


danni-m commented on May 27, 2024

Another easy solution is to put some kind of web server in front of Redis and add the missing data points there.
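
Such a gap-filling layer could forward-fill empty buckets with the last known value, matching the "price unchanged" assumption earlier in the thread. A minimal sketch (the helper name is illustrative):

```python
def forward_fill(buckets, start, end, step=300):
    # buckets: mapping of bucket start -> value for buckets that have data.
    # Walk the requested range and substitute the last known value for
    # empty buckets (the "price unchanged" assumption).
    out, last = [], None
    for t in range(start, end, step):
        if t in buckets:
            last = buckets[t]
        out.append((t, last))
    return out

print(forward_fill({0: 1.0, 600: 2.0}, 0, 900))
# → [(0, 1.0), (300, 1.0), (600, 2.0)]
```

Buckets before the first known value come back as None, so the caller can decide how to handle a leading gap.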


someburner commented on May 27, 2024

Ah okay, I think I know what to do now: I'll just handle the empty buckets separately. Thanks! This can be closed now.


danni-m commented on May 27, 2024

@someburner great, if you have any other questions, just open a ticket.

