Comments (11)
The compression branch exists to accommodate a compression backend, for space-saving reasons.
I started working on a Gorilla-style implementation (www.vldb.org/pvldb/vol8/p1816-teller.pdf).
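The timestamp half of Gorilla's encoding (delta-of-delta) can be sketched in a few lines. This is an illustrative Python sketch of the technique from the paper, not the module's actual C implementation:

```python
# Sketch of Gorilla-style delta-of-delta timestamp encoding.
# Illustrative only; the real format packs the deltas into variable-width bits.

def encode_timestamps(timestamps):
    """Encode a sorted timestamp list as (first, first_delta, delta_of_deltas)."""
    if len(timestamps) < 2:
        return (timestamps[0] if timestamps else None), None, []
    first = timestamps[0]
    deltas = [b - a for a, b in zip(timestamps, timestamps[1:])]
    dod = [b - a for a, b in zip(deltas, deltas[1:])]
    return first, deltas[0], dod

def decode_timestamps(first, first_delta, dod):
    if first is None:
        return []
    if first_delta is None:
        return [first]
    timestamps = [first, first + first_delta]
    delta = first_delta
    for d in dod:
        delta += d
        timestamps.append(timestamps[-1] + delta)
    return timestamps

# For regularly spaced samples almost every delta-of-delta is 0,
# which Gorilla stores in a single bit each -- hence the space savings.
ts = [1000, 1010, 1020, 1030, 1041]
enc = encode_timestamps(ts)
assert decode_timestamps(*enc) == ts
assert enc[2] == [0, 0, 1]  # mostly zeros -> highly compressible
```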
Regarding the question: time-based chunks are something I would consider in the future, since they are easier to distribute evenly.
Technically you can write another backend that is time based; the problem is that you need to think about what happens if you get only a few samples, or too many.
I'm not sure what you're trying to achieve; maybe we can find a simpler solution.
What does your input look like? (Reporting every X seconds? Real time?)
Why do you need to partition the data by seconds?
from redis-timeseries.
Thanks for the feedback (i'll make the PR edits soon).
The input is a real-time feed from a remote websocket. The feed contains timestamped data, but due to the nature of the data it is unknown how many samples per second will arrive. It could be anywhere from 1 every 5 seconds to around 300 per second, so not actually a crazy amount; the issue is more the sporadic nature of the data.
The goal is to be able to automatically sort the data by a certain range of time, and precise intervals are important. I want to store data for future analysis, where the data sets are split by EPOCH % 300. If the chunks were split automatically at a time-based cut-off, I believe my work on the front end of my application would be much easier.
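The EPOCH % 300 split described above can be sketched client-side. This is an illustrative Python snippet (the `(epoch_seconds, value)` shape is an assumption for the example, not part of the module):

```python
# Group timestamped samples into absolute 5-minute (300 s) buckets,
# i.e. the EPOCH % 300 partitioning described in the comment.
from collections import defaultdict

def bucket_key(epoch_seconds, width=300):
    """Start of the fixed bucket containing this timestamp: epoch - epoch % width."""
    return epoch_seconds - (epoch_seconds % width)

def partition(samples, width=300):
    """samples: iterable of (epoch_seconds, value) -> {bucket_start: [(ts, v), ...]}."""
    buckets = defaultdict(list)
    for ts, value in samples:
        buckets[bucket_key(ts, width)].append((ts, value))
    return dict(buckets)

data = [(1500000001, 1.0), (1500000250, 2.0), (1500000301, 3.0)]
parts = partition(data)
assert sorted(parts) == [1500000000, 1500000300]
```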
Technically you can write another backend that is time based; the problem is that you need to think about what happens if you get only a few samples, or too many.
What are the pitfalls of too few or too many? Or is this just a question of exhausting memory?
If your data is sporadic, having a samples-per-chunk limit is probably better; here's why.
If you open a chunk every 5 minutes and the chunk has an elastic size (say, something like a vector, with a base size that grows when it runs out of space), then you would have a lot of memory overhead, because if you didn't receive 100% of the expected samples your chunk will have an empty tail.
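The empty-tail argument can be made concrete with a rough back-of-the-envelope sketch. The sample size and rates below are assumptions for illustration, not the module's actual numbers:

```python
# Rough sketch of the overhead of time-based chunks: each chunk must be
# sized for the expected maximum rate, so a quiet period leaves an empty
# tail in every chunk, while a samples-per-chunk layout only wastes the
# tail of the single chunk currently being filled.

SAMPLE_SIZE = 16      # assumed bytes per (timestamp, value) pair
CHUNK_SECONDS = 300   # one chunk per 5 minutes
EXPECTED_RATE = 300   # samples/sec the chunk must be sized for (worst case)

def wasted_bytes_time_based(actual_samples_per_chunk, chunks):
    capacity = CHUNK_SECONDS * EXPECTED_RATE  # slots preallocated per chunk
    return chunks * (capacity - actual_samples_per_chunk) * SAMPLE_SIZE

# At 1 sample every 5 seconds (60 samples/chunk) over 12 chunks (1 hour),
# nearly every preallocated slot sits empty:
waste = wasted_bytes_time_based(60, 12)
assert waste == 12 * (90000 - 60) * 16  # ~17 MB of empty tails
```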
If I understand you correctly, your scenario is read-heavy, so a more efficient range query would help, right?
Currently the range query is O((N / samples-per-chunk) + M), where N is the number of samples and M is the total matching samples, but this can be improved by doing a binary search on the chunk index, which would make the chunk search O(log(N)).
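The suggested binary search over the chunk index can be sketched as follows. The chunk layout here is illustrative, not the module's actual data structure:

```python
# Keep chunks sorted by their first timestamp and binary-search the chunk
# index instead of scanning it linearly.
import bisect

def range_query(chunks, start, end):
    """chunks: sorted list of (first_ts, [(ts, value), ...]) with sorted samples."""
    starts = [first_ts for first_ts, _ in chunks]
    # O(log N) jump to the first chunk that could contain `start`
    i = max(bisect.bisect_right(starts, start) - 1, 0)
    out = []
    for _, samples in chunks[i:]:
        for ts, v in samples:
            if ts > end:
                return out       # samples are sorted, nothing further matches
            if ts >= start:
                out.append((ts, v))
    return out

chunks = [(0, [(0, 1.0), (50, 2.0)]), (100, [(100, 3.0)]), (200, [(200, 4.0)])]
assert range_query(chunks, 40, 150) == [(50, 2.0), (100, 3.0)]
```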
Another way to get a better read performance is data denormalization.
What type of queries are you doing? Is it always in 5-minute buckets?
then you would have a lot of memory overhead, because if you didn't receive 100% of the expected samples your chunk will have an empty tail.
Hmm, I may have to look over the chunk code to understand it better first. But the 5-minute buckets are absolute; so if, after 250 seconds, no new samples came in, then the end of that data set would be at 250 seconds. And in my particular case it would be useful to know that there were no samples for a given 300 seconds.
A little more background though: I want to use compactions for the 5 minute chunks. So after a week or so, I don't actually care about the individual samples. I'll just want to be able to transform them into a single point for every 5 minutes that contains the min, max, and average.
For now it's not necessarily read-heavy, but it could be in the future. The main thing is simplicity of accessing the data. It would be really convenient to be able to do a ts.range over 60 minutes and have the data returned as an array of 12 300-second values.
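The desired result shape can be sketched client-side. This is an illustrative Python aggregation over raw samples, not the module's TS.RANGE implementation:

```python
# Aggregate one hour of raw samples into 12 fixed 300-second buckets of
# (min, max, avg) -- roughly the shape a range query with aggregation
# would return. Empty buckets are kept explicit as None.

def aggregate(samples, start, end, width=300):
    """samples: [(epoch_seconds, value), ...] -> one (bucket, min, max, avg) per bucket."""
    out = []
    for bucket in range(start, end, width):
        vals = [v for ts, v in samples if bucket <= ts < bucket + width]
        if vals:
            out.append((bucket, min(vals), max(vals), sum(vals) / len(vals)))
        else:
            out.append((bucket, None, None, None))  # no samples in this window
    return out

samples = [(0, 1.0), (10, 3.0), (310, 5.0)]
result = aggregate(samples, 0, 3600)
assert len(result) == 12                      # 12 x 300 s buckets in an hour
assert result[0] == (0, 1.0, 3.0, 2.0)
assert result[1] == (300, 5.0, 5.0, 5.0)
assert result[2] == (600, None, None, None)
```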
If I understand correctly, you need TS.RANGE to return 0 on buckets without data?
Also, compactions are already implemented in the module, so you can start using them now.
Well, in my case I would just have it return the last point of the previous 5 minutes. And so on. The data is for a trading platform, so if no data comes in, we assume the price is the same.
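The "assume the price is the same" fill described here could look roughly like this on the client side (illustrative sketch):

```python
# Carry the last known value forward into buckets that received no samples,
# i.e. if no trades arrived, the price is assumed unchanged.

def forward_fill(bucketed, last_known=None):
    """bucketed: [(bucket_start, value_or_None), ...] in time order."""
    out = []
    for bucket, value in bucketed:
        if value is None:
            value = last_known      # empty bucket: repeat the previous value
        else:
            last_known = value
        out.append((bucket, value))
    return out

assert forward_fill([(0, 10.0), (300, None), (600, 12.0), (900, None)]) == \
    [(0, 10.0), (300, 10.0), (600, 12.0), (900, 12.0)]
```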
For now it may be sufficient to just use compactions and trigger them myself before adding the next data point? Or, what triggers the compaction? I was looking at #3 but wasn't sure of the actual mechanism from the readme.
- Sounds like the aggregation you need is the last aggregation method.
- The compaction is triggered by inserting samples into the origin key; if you didn't report anything, no new samples will be added.
Today there's an option in Redis to have another thread perform periodic actions; using this mechanism you can trigger an "empty" compaction.
Another easy solution is to put some kind of webserver in front of Redis and add the missing data points.
Ah okay, I think I know what to do now: I'll just handle the empty buckets separately. Thanks! This can be closed now.
@someburner great, if you have any other questions, just open a ticket.