graphite-project / whisper

Whisper is a file-based time-series database format for Graphite.

Home Page: http://graphite.readthedocs.org/

License: Apache License 2.0

Language: Python 100.0%
Topics: whisper, python, graphite, graphite-components, library, time-series, metrics

whisper's People

Contributors: amosshapira, blackduckx, bmhatfield, brutasse, cdavis, coderfi, dancech, daniellawrence, deniszh, diuis, draco2003, dzhdanov, esc, eserte, filippog, iksaif, jordansissel, jraby, mleinart, msabramo, obfuscurity, piotr1212, sejeff, statik, thenobutton, timob, tmm1, wdauchy, xavierog, yuuki

whisper's Issues

Timestamps not being honoured

If I create an archive with a 7d:5y retention:

[root@hostname ~]# whisper-create meh.wsp 7d:5y
Created: meh.wsp (3148 bytes)

And then update the archive with a value for a random day (not a Thursday):

[root@hostname ~]# whisper-update meh.wsp 1388275200:10

When dumping the archive, the timestamp has reverted to the Thursday prior to the requested timestamp (potentially aligning with the epoch, which I believe was a Thursday):

[root@hostname ~]# whisper-dump meh.wsp
Meta data:
aggregation method: average
max retention: 157248000
xFilesFactor: 0.5

Archive 0 info:
offset: 28
seconds per point: 604800
points: 260
retention: 157248000
size: 3120

Archive 0 data:
0: 1388016000, 10
1: 0, 0

Running whisper-fetch against the archive results in a bunch of timestamps in the future:

[root@hostname ~]# whisper-fetch meh.wsp
1388620800 None
1389225600 None
1389830400 None
1390435200 None
1391040000 None
1391644800 None
1392249600 None
1392854400 None
<>
1541030400 None
1541635200 None
1542240000 None
1542844800 None
1543449600 None
1544054400 None
1544659200 None
1545264000 None
[root@hostname ~]# date -d @"1545264000"
Thu Dec 20 00:00:00 UTC 2018
[root@hostname ~]# date
Tue Dec 31 10:28:53 UTC 2013

This is running on python-whisper 0.9.12 and I can reproduce it on both RHEL5 (python 2.4) and RHEL6 (python2.6).
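
For context, this alignment comes from whisper's interval quantization: updates floor each timestamp to a multiple of the archive's secondsPerPoint, so with a 604800s (7d) archive every point lands on a Thursday-aligned boundary (the epoch was a Thursday). A small sketch of the arithmetic (the modulo expression mirrors whisper.py; the rest is illustration):

import datetime

step = 604800                     # secondsPerPoint of the 7d archive
timestamp = 1388275200            # value passed to whisper-update
interval = timestamp - (timestamp % step)  # quantization as done in whisper.py

print(interval)                                       # 1388016000
print(datetime.datetime.utcfromtimestamp(interval))   # 2013-12-26 00:00:00 (a Thursday)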

whisper-update : being able to completely empty a slot

I think a lot of users have, from time to time, some really big weird values that they would like to erase. With whisper-update.py we can set a 0 value for a timestamp, but not 'None', nor can we set, say, slot 642 of the first retention to 0:0. It would be nice to be able to completely empty a slot.

contrib/whisper-auto-resize.py fails

# python whisper-auto-resize.py /archive2/graphite/ /etc/carbon/
Traceback (most recent call last):
  File "whisper-auto-resize.py", line 208, in <module>
    processMetric(fullpath, schemas, agg_schemas)
  File "whisper-auto-resize.py", line 136, in processMetric
    str_xFilesFactor =  "{0:.2f}".format(xFilesFactor)
ValueError: Unknown format code 'f' for object of type 'str'
# python -V
Python 2.7.3
# dpkg -l|grep carbon
ii  graphite-carbon                   0.9.10-1                      backend data caching and persistence daemon for Graphite
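
The traceback suggests xFilesFactor reaches the format call as a string; a hedged one-line workaround (an assumption, not the project's actual fix) would be to coerce it first:

    str_xFilesFactor = "{0:.2f}".format(float(xFilesFactor))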

Properly detect failing fallocate

The check for CAN_FALLOCATE in whisper.py assumes that if it can import ctypes.util it can fallocate. This assumption is not always valid: on most Solaris-based operating systems running ZFS, the call will fail. I believe the assumption is not correct.
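
One possible approach, sketched here as an assumption rather than whisper's actual code, is to probe fallocate against a throwaway temp file at import time instead of inferring support from a successful import:

import ctypes
import ctypes.util
import os
import tempfile

def _probe_fallocate():
    # Returns True only if a real fallocate() call succeeds on this system.
    libc_name = ctypes.util.find_library('c')
    if not libc_name:
        return False
    libc = ctypes.CDLL(libc_name, use_errno=True)
    if not hasattr(libc, 'fallocate'):
        return False
    libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int,
                               ctypes.c_int64, ctypes.c_int64]
    fd, path = tempfile.mkstemp()
    try:
        # fallocate(fd, mode=0, offset=0, len=4096): nonzero return means failure
        return libc.fallocate(fd, 0, 0, 4096) == 0
    finally:
        os.close(fd)
        os.unlink(path)

CAN_FALLOCATE = _probe_fallocate()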

Archive propagation/aggregation with irregular updates

I use graphite to graph code-triggered events like errors or user activity, so I don't have data for each and every data point. When data is propagated into a lower archive, only non-null values are used to calculate the average. For metrics that represent events per minute, this aggregation leads to wrong values, because null values (which really represent 0) are not considered.

As whisper was written especially with irregular updates in mind: did I miss something, and there is some mechanism for this case? If not, a new aggregation function that interprets nulls as 0s would be nice.

The downsampling aggregation doesn't work as expected

The corresponding entry in creates.log:

creates.log:18/09/2013 11:27:34 :: new metric stats.timers.replaced.processing_time.upper_90 matched schema stats
creates.log:18/09/2013 11:27:34 :: new metric stats.timers.replaced.processing_time.upper_90 matched aggregation schema max_percentile
creates.log:18/09/2013 11:27:34 :: creating database file /opt/graphite/storage/whisper/stats/timers/replaced/processing_time/upper_90.wsp (archive=[(10, 17280), (60, 10080), (300, 8640), (900, 35040)] xff=0.1 agg=max)

So I expect that at any resolution (assuming at least 10% of samples are available) it will be downsampled using the max aggregation scheme.

Well, for a 10s interval I've got a metric value of about 760, but when downsampled to a minute the value became something like 450.

Screenshots (10-second and 60-second views) omitted.

Could anyone please explain this behaviour? It looks like a bug.

Explore using mmap for data file access

Imported from https://bugs.launchpad.net/bugs/989427

Reported by mleinartas
Date Created Apr 27, 2012
Tags []

From question: https://answers.launchpad.net/graphite/+question/191807

I read in one of the answers (comment #4 in https://answers.launchpad.net/graphite/+question/170794) that carbon-cache keeps writing to both the first and last blocks of each .wsp file.
This means that it has to keep doing lseek+read+write system calls every time it updates the file.
I'm not very familiar with Python but a quick search tells me that it supports mmap (http://docs.python.org/library/mmap.html).
Using mmap(2) could give a huge advantage:

  1. No system calls (which are expensive).
  2. No copying of data between the process user-space and kernel buffers on each read/write
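
A minimal sketch of the idea (an illustration, not a patch; the header format string matches whisper.py): map the file once, then unpack metadata straight from memory without seek/read syscalls:

import mmap
import struct

metadataFormat = "!2LfL"  # aggregationType, maxRetention, xFilesFactor, archiveCount

with open("metric.wsp", "r+b") as fh:
    mm = mmap.mmap(fh.fileno(), 0)
    (aggregationType, maxRetention,
     xFilesFactor, archiveCount) = struct.unpack_from(metadataFormat, mm, 0)
    print(maxRetention, xFilesFactor, archiveCount)
    mm.close()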

Any tool or python code example on how to transform data in an existing whisper file?

I need to convert data contained in a lot of whisper files (thousands of them).

The transform I need to do is rate -> counter. Assuming the contained data is a rate, (val2-val1)/time, the operation is as easy as multiplying by the period time for each archive, that is:

for each archive
  val[i]' = val[i] * time_per_period (in seconds)

After doing this for all archives, I should change the aggregation method from avg to sum.

The easiest way to do it is with the whisper-xxxx.py tools, but there are too many files, and the most efficient way to do this is with python code.

Is there any example code or documentation on how to do that?
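
A hedged sketch of the requested transform using whisper's public API (info, fetch, update_many and setAggregationMethod are real functions in whisper.py; the loop itself is only an illustration, and rewriting points re-triggers whisper's own downsampling, so test on copies first):

import time
import whisper

path = "metric.wsp"
now = int(time.time())
header = whisper.info(path)

# walk the archives from highest to lowest precision
for archive in header['archives']:
    fromTime = now - archive['retention'] + archive['secondsPerPoint']
    (timeInfo, values) = whisper.fetch(path, fromTime, now)
    (start, end, step) = timeInfo
    # rate -> counts per period: multiply each value by the period length
    points = [(ts, value * step)
              for ts, value in zip(range(start, end, step), values)
              if value is not None]
    if points:
        whisper.update_many(path, points)

# finally flip the aggregation method from average to sum
whisper.setAggregationMethod(path, 'sum')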

Create whisper-move script

So this would be a very simple script to move a metric from one location to another (checking that the target location doesn't already exist).

The reason for having this as a script, rather than just diving in and moving files, is that it is slightly more user friendly, and it also allows granting access to this script alone rather than to all metric storage directories.

Enabling and improving whisper file header caching

Currently this setting is not exposed through carbon. I have created a few branches that:

  1. Expose this setting through carbon-cache options
  2. Create a notion of an entry, which encapsulates a timestamp in addition to the info
  3. Allow carbon-cache to control the ageing

I do not expect my first proposal to be accepted as-is, but I would like to start the conversation, as caching would be very useful for decreasing the number of disk IOPS.

To this end I am proposing
#157
graphite-project/carbon#523
for your consideration

Full Disk = Corrupt Whisper

At least for 0.9.x, when a disk carbon is using fills up, the result is empty and/or corrupt whisper files, and the files can no longer be updated or read. This scenario could be handled better... perhaps by checking space before writing, or by removing files that are only partially written. Has anyone else found a workaround for this?

[Notification] Whisper library ported to perl

Good Morning,

First of all, thanks for sharing the whisper database concept with the world. I came across whisper in a pure perl world and started to port the whisper library to perl; the src can be found under https://github.com/corecache/libwhisper-perl - it's also available on CPAN.

The library implements reading functions only; creating/updating wsp files is planned but currently not implemented - so this is also a call for action to help implement the write operations too.

Thanks for all the fish!

sparse files are broken

graphite-project/carbon#189

[utility] (graphite) pn@ip-10-155-128-120:~ $ whisper-dump.py /opt/graphite/storage/whisper/Platform2/QA/RecommendationService/i-3fbaa558/clientCall/derrida/updateContextActivations/rate/avgPerSec.wsp 1>/dev/null
Traceback (most recent call last):
  File "/usr/local/bin/whisper-dump.py", line 87, in <module>
    dump_archives(header['archives'])
  File "/usr/local/bin/whisper-dump.py", line 79, in dump_archives
    (timestamp, value) = struct.unpack(whisper.pointFormat, map[offset:offset+whisper.pointSize])
struct.error: unpack requires a string argument of length 12

New whisper-fill utility

With whisper-merge from 0.9.10, we found that data from the source file would always overwrite data in the destination file. In order to reconcile lost data in a cluster, a more careful approach was necessary, such that we only fill in gaps of missing data, and use the highest-precision archive for that.

I believe that the last point has already been addressed with recent commits, however, still posting our code as it may be useful for others. We use whisper-fill instead of whisper-merge to fill in gaps in data on nodes in a (replicated) cluster that needed a reboot or for some other reason became unavailable for a while.

Since I can't attach patches here, inline follows what we did. Feel free to use and incorporate as you deem appropriate.

Thanks,
Fabian

#!/usr/bin/env python

# whisper-fill: unlike whisper-merge, don't overwrite data that's
# already present in the target file, but instead, only add the missing
# data (e.g. where the gaps in the target file are).  Because no values
# are overwritten, no data or precision gets lost.  Also, unlike
# whisper-merge, try to take the highest-precision archive to provide
# the data, instead of the one with the largest retention.
# Using this script, reconciliation between two replica instances can be
# performed by whisper-fill-ing the data of the other replica with the
# data that exists locally, without introducing the quite remarkable
# gaps that whisper-merge leaves behind (filling a higher precision
# archive with data from a lower precision one)

# Work performed by author while working at Booking.com.

import operator
from whisper import info, fetch, update_many
import itertools
import time
import sys

def fill(src, dst, tstart, tstop):
        # fetch range start-stop from src, taking values from the highest
        # precision archive, thus optionally requiring multiple fetch + merges
        srcHeader = info(src)

        srcArchives = srcHeader['archives']
        srcArchives.sort(key=operator.itemgetter('retention'))

        # find oldest point in time, stored by both files
        srcTime = int(time.time()) - srcHeader['maxRetention']

        if tstart < srcTime and tstop < srcTime:
                return

        # we want to retain as much precision as we can, hence we do backwards
        # walk in time 

        # skip forward at max 'step' points at a time
        for archive in srcArchives:
                # skip over archives that don't have any data points
                rtime = time.time() - archive['retention']
                if tstop <= rtime:
                        continue

                untilTime = tstop
                fromTime = rtime if rtime > tstart else tstart

                (timeInfo, values) = fetch(src, fromTime, untilTime)
                (start, end, archive_step) = timeInfo
                pointsToWrite = list(itertools.ifilter(
                        lambda points: points[1] is not None,
                        itertools.izip(xrange(start, end, archive_step), values)))
                pointsToWrite.sort(key=lambda p: p[0],reverse=True) #order points by timestamp, newest first
                update_many(dst, pointsToWrite)

                tstop = fromTime

                # can stop when there's nothing to fetch any more
                if tstart == tstop:
                        return

def main(argv):
        if len(argv) != 2:
                print("usage: whisper-fill.py src dst")
                print("       copies data from src in dst, if missing")
                sys.exit(1)

        src = argv[0]
        dst = argv[1]

        startFrom = time.time()
        header = info(dst)
        archives = header['archives']
        archives.sort(key=operator.itemgetter('retention'))

        # walk the archives from highest to lowest precision, filling gaps in dst
        for archive in archives:
                fromTime = time.time() - archive['retention']
                if fromTime >= startFrom:
                        continue

                (timeInfo, values) = fetch(dst, fromTime, startFrom)
                (start, end, step) = timeInfo
                gapstart = None
                for v in values:
                        if not v and not gapstart:
                                gapstart = start
                        elif v and gapstart:
                                # ignore single units lost
                                if (start - gapstart) > archive['secondsPerPoint']:
                                        fill(src, dst, gapstart, start)
                                gapstart = None
                        start += step

                startFrom = fromTime


if __name__ == "__main__":                                                  
        main(sys.argv[1:])

exceptions.IOError No such file or directory /opt/graphite/storage/whisper/stats/timers/prod/ .... <XYZ>.wsp

We've been using statsd/carbon/whisper/graphite for recording stats since Jun 25.
Starting from July 1, carbon-cache has been logging the following exception all over.

exceptions.IOError: [Errno 2] No such file or directory: '/opt/graphite/storage/whisper/stats/timers/prod/************/count.wsp'

We already have a lot of stats collected over that week and up to today (Jul 12). But it looks like it fails for certain new keys. Isn't it supposed to create the file if it doesn't exist already?

As per the log for Jun 25, carbon was smooth till 2am. I then see logs starting at 18:30 on July 1 with errors.

Could the following configs cause trouble?

storage-schemas.conf:

[carbon]
pattern = ^carbon.
retentions = 60:90d

[default_1m_for_1_month]
priority=100
pattern = .*
retentions = 1m:30d

carbon.conf:

USER = graphite

We run apache as graphite, and everything under /opt/graphite is recursively chowned and chgrped to graphite.

As a side note, right from the start we're seeing this warning in the logs:
/opt/graphite/conf/storage-aggregation.conf not found, ignoring.

Any idea what could be wrong?

Full trace :

01/07/2012 18:27:22 :: Unhandled Error
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 504, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/threadpool.py", line 167, in _worker
    result = context.call(ctx, function, *args, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 118, in callWithContext
    return self.currentContext().callWithContext(ctx, func, *args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/twisted/python/context.py", line 81, in callWithContext
    return func(*args, **kw)
  File "/opt/graphite/lib/carbon/writer.py", line 158, in writeForever
    writeCachedDataPoints()
  File "/opt/graphite/lib/carbon/writer.py", line 118, in writeCachedDataPoints
    whisper.create(dbFilePath, archiveConfig, xFilesFactor, aggregationMethod, settings.WHISPER_SPARSE_CREATE)
  File "/usr/local/lib/python2.7/dist-packages/whisper.py", line 327, in create
    fh = open(path,'wb')
exceptions.IOError: [Errno 2] No such file or directory: '/opt/graphite/storage/whisper/stats/timers/prod/************/count.wsp'

RFC: Proposal for new aggregationMethod weightedAverage("other_whisper_file.wsp")

Hi everybody.

I'm working in a specialized performance group and we are building a big infrastructure with graphite (+ whisper for now).

We are working with 2 main metrics:

  • service response times (by period)
  • service request counts (by period)

(For example, the web response time of an intranet -- more used during working hours than at night.)

While analyzing requirements we noticed that there is no good aggregation method for us.

Suppose we have the following usage of a website over a day (screenshots omitted: daily usage, requests/hour, and response time/hour).

If we use the following timePerPoint/timeToStore with an average aggregationMethod:

60m:1d
1d:1y

then after this day the averaged value will be 204.17 ms, but the real value, sum(all_requests_one_day)/num_requests_one_day, is 238.09 ms.

This is a 15% error!

I propose to create a new aggregationMethod, weightedAverage(), in the original whisper file (o), referring to another whisper file (for example: num_request.wsp).

In this case the computed value should be:

NOTE: o[n] is the original series, r[n] the referenced series, and n the timestamp.

computed_value = (o[n]*r[n] + o[n-1]*r[n-1] + ... + o[0]*r[0]) / (r[n] + r[n-1] + ... + r[0])

The only restriction I can see is that both series (original and referenced) must have exactly the same storage schema, and perhaps the aggregationMethod of the referenced series should be forced to "sum".

What do you think about it?
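
For concreteness, a minimal sketch of the proposed computation (a hypothetical helper, not part of whisper; it assumes the two series are already aligned point-for-point):

def weighted_average(values, weights):
    # values: response times; weights: request counts for the same intervals
    pairs = [(v, w) for v, w in zip(values, weights)
             if v is not None and w is not None]
    total_weight = sum(w for _, w in pairs)
    if total_weight == 0:
        return None
    return sum(v * w for v, w in pairs) / total_weight

# weighted_average([100, 300], [9, 1]) == 120.0, where a plain average
# would report 200.0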

whisper-merge fails

whisper-info users.wsp

maxRetention: 31536000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 830880

Archive 0
retention: 2592000
secondsPerPoint: 60
points: 43200
size: 518400
offset: 52

Archive 1
retention: 15552000
secondsPerPoint: 900
points: 17280
size: 207360
offset: 518452

Archive 2
retention: 31536000
secondsPerPoint: 3600
points: 8760
size: 105120
offset: 725812

whisper-resize users.wsp 1m:30d 1d:1y

Retrieving all data from the archives
Traceback (most recent call last):
  File "/usr/bin/whisper-resize", line 66, in <module>
    timeinfo,values = whisper.fetch(path, fromTime, untilTime)
  File "/usr/lib/python2.6/site-packages/whisper.py", line 667, in fetch
    return file_fetch(fh, fromTime, untilTime)
  File "/usr/lib/python2.6/site-packages/whisper.py", line 733, in file_fetch
    unpackedSeries = struct.unpack(seriesFormat, seriesString)
struct.error: unpack requires a string argument of length 105048

Any ideas why this is happening?

Remove enableDebug

This function does some nastiness, such as using globals to override open(), and there isn't currently a way to disable it once enabled.

As a maintainer, I don't personally see a ton of utility in it and would be more in favor of removing it from whisper's master branch and 0.10 as a result. For profiling times I'd rather just use cProfile and the excellent runsnake Python profile viewer instead.

What does the rest of the team think of this? Any utility in keeping it around? If so, I can make a disableDebug, but it would have to add more grotesque globals, and that is generally poor programming practice.

Whisper file locking

There is code in whisper.py to do file level locking. Is there a reason that it is disabled?

I need to resize a large number of whisper files and want to do it without taking down carbon.
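
For reference, the switch appears to be just a module-level flag (LOCK in whisper.py) that callers can flip on; a small usage sketch, assuming that flag:

import whisper

# Locking is off by default; opting in makes update/create take an exclusive
# flock() on the file (this is what carbon's WHISPER_LOCK_WRITES setting does).
whisper.LOCK = True
whisper.update("metric.wsp", 42.0)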

Whisper queries on existing files broken after upgrade from 0.9.10 to 0.9.14

I have a really odd whisper problem after upgrading from 0.9.10 to 0.9.14. I can no longer see any data for a range that is less than 24 hours and lies before the current day. I can see data for the current day, and I can see data for something like "the last week" at once, but if I try to look at a single day within that last week (or whatever time period), no data points are returned.

For example:

On my 0.9.14 installation, this whisper-fetch.py query for data from Mon Nov 16 17:39:35 UTC 2015 until now returns what I would expect. All timestamps return data. whisper-fetch.py --pretty --from=1447695575 myfile.wsp.

However, this query for an hour's worth of data starting at the previous timestamp returns None for all values: whisper-fetch.py --pretty --from=1447695575 --until=1447699175 myfile.wsp.

I thought I had seriously broken something with the installation, so I copied one of the whisper files back to the old server with 0.9.10 on it and ran the same whisper-fetch.py test queries on the same exact whisper files from the new server, and the data shows up as I would expect.

My first thought was that somehow, somewhere, I had screwed up retention, but the retention on these files hasn't changed in several years, and it was working and continues to work 100% correctly on the old graphite server, even with whisper files copied back from the 0.9.14 server.

This is the retention information from the specific file that I've been testing with:

# From whisper-info.py:
maxRetention: 315360000
xFilesFactor: 0.5
aggregationMethod: average
fileSize: 5247424

Archive 0
retention: 86400
secondsPerPoint: 10
points: 8640
size: 103680
offset: 64

Archive 1
retention: 2592000
secondsPerPoint: 60
points: 43200
size: 518400
offset: 103744

Archive 2
retention: 63072000
secondsPerPoint: 300
points: 210240
size: 2522880
offset: 622144

Archive 3
retention: 315360000
secondsPerPoint: 1800
points: 175200
size: 2102400
offset: 3145024

Whisper file resizing with aggregation ?

I am using a retention policy of 10s:6h,1min:7d,10min:5y for statsd metrics. The size of each whisper file is 3.2 MB, which is very high. We wanted to reduce the size and came up with a new retention policy of 10s:1h 1m:6h 5m:1d 30m:7d 1h:90d 1d:1y 30d:5y, which consumes 47KB.

We used whisper-resize.py to resize the .wsp files (all are count metrics) which were created already.

I used the following command for some sample .wsp files and tested:
whisper-resize.py --aggregate --aggregationMethod=sum --xFilesFactor=0.0 --newfile=newFile.wsp oldFile.wsp 10s:1h 1m:6h 5m:1d 30m:7d 1h:90d 1d:1y 30d:5y

But the aggregation is not appropriate; I cross-checked both files' data.
What went wrong in my approach? And can anyone help me with resizing along with aggregation?

I have gone through the source code of whisper.py but didn't understand what exactly is happening, especially the file_update_many and __archive_update_many functions.

Thanks in advance :)

whisper logo

It would be nice to have a logo for whisper.
I would make good use of it in my documentation diagrams.

struct.error: 'L' format requires 0 <= number <= 4294967295

Hello,

I get an exception when I try to convert an rrd file to whisper:
/usr/local/bin/rrd2whisper.py /path/to/file.rrd

Traceback (most recent call last):
  File "/usr/local/bin/rrd2whisper.py", line 51, in <module>
    whisper.create(path, [(seconds_per_point,retention_points)], xFilesFactor=options.xFilesFactor)
  File "/usr/local/lib/python2.7/dist-packages/whisper.py", line 333, in create
    maxRetention = struct.pack( longFormat, oldest )
struct.error: 'L' format requires 0 <= number <= 4294967295

Any idea?

Document whisper file format

Hello!

We'd love to use and manipulate whisper files from ruby, but without a file format specification it's hard to write a parser for them.

We'd like to perform non-standard operations, like "give me the last non-null value before point X in time, from either archive", etc.

I'd love it if we didn't need to invoke external (python) scripts for these operations, but could instead do this from our own code.

Please sketch up the binary file format, or let us know if there is a spec out there that we have not found.
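
Until a formal spec exists, the struct format strings in whisper.py are the de-facto definition (big-endian; the three format strings below are taken from the source, while the parsing loop is only an illustrative sketch):

import struct

metadataFormat = "!2LfL"   # aggregationType, maxRetention, xFilesFactor, archiveCount
archiveInfoFormat = "!3L"  # offset, secondsPerPoint, points
pointFormat = "!Ld"        # timestamp, value

with open("metric.wsp", "rb") as fh:
    header = fh.read(struct.calcsize(metadataFormat))
    (aggregationType, maxRetention,
     xFilesFactor, archiveCount) = struct.unpack(metadataFormat, header)
    # archive headers follow the metadata block back-to-back
    for _ in range(archiveCount):
        raw = fh.read(struct.calcsize(archiveInfoFormat))
        offset, secondsPerPoint, points = struct.unpack(archiveInfoFormat, raw)
        print(offset, secondsPerPoint, points)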

Why does Whisper not allow fetching data in the future? (and why does it not emit an exception about that?)

Hello.

I am trying Graphite to register metrics for a website, and for these metrics I have some goals (like number of registered users, for example). I searched for tools that allow plotting these metrics together with their goals, and I found that when I use the until= parameter in Graphite Web with a date in the future, it simply ignores that field and sets the limit to... now.

As I am a Python programmer, I started investigating where the problem is. Starting at Graphite Web, I came to Whisper, which, in the exact line in question, just ignores the date given in until, without emitting an exception (which would seem good, at least to me, to inform the user of the limitation) or... doing the expected work. Whisper does allow registering data in the future (I tested it, and it really works, but the data only shows up when its time arrives; e.g. if I post a point 1 minute in the future, I can only fetch it when that minute comes), but it does not allow fetching that data easily.

I know that maybe Graphite is not the tool for this. But, as I related, the behavior above is not... expected. I am not saying that Graphite/Whisper should allow fetching data in the future; I am only asking why creating such data works when fetching it simply does not (without even alerting the user about it).

If there is a tool specific to this requirement, please inform me about it. If there is nothing related, just tell me to create my own graphs with the metrics and the goals together, as I expected.

Thanks to all.

Suggested backup strategy

Hi,

I'm having a hard time finding info on a recommended way to back up whisper files.

There are projects like https://github.com/jjneely/whisper-backup that do lots of other things as well, but is there any simpler method? How likely am I to end up with corrupted whisper files if I simply cp the whole thing?

Thanks!

JSON output for whisper-info.py

It would be great to have this, as it would make the output simpler to manipulate programmatically. I was hoping to run a script that checks whether each .wsp file is older than its greatest retention value and, if so, deletes it; a JSON output would greatly simplify this.
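
In the meantime, a one-off workaround sketch (whisper.info is the real library call; the JSON wrapping is exactly the part being requested):

import json
import sys
import whisper

# usage: python wsp-info-json.py metric.wsp
print(json.dumps(whisper.info(sys.argv[1]), indent=2))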

Multiply all whisper values

Hello,
I need to move from Mbit/s to bit/s. Is there any way to multiply all whisper points via a script?

Rollup aggregation at the highest level

It is obviously more of a question than a feature request because, as I see it, it could break some backward compatibility.

Is it architecturally possible to implement rollup aggregation at the highest level? The use case is obvious: one may have a metric that has to come from different hosts, and the current solution would be (I guess) to have aggregating middleware, or to use extra signals with aggregation rules, which one could call a bit surplus.

I would be grateful if someone could tell me about the prospects of such a feature, though I am considering the possibility that I am just lacking some understanding of the code.

Migrate historical data with aggregation

I have a data migration issue here. For instance, I have some historical metric keys:

metrics.node1.request.count
metrics.node2.request.count
...
metrics.node8.request.count

And I need to aggregate (sum) those metrics into metrics.all.request.count. Is there any script that can help me do that?

Thanks a lot!

Documentation appears wrong about retention resolution requirements

The documentation says:

To support accurate aggregation from higher to lower resolution archives, the number of points in a longer retention archive must be divisible by its next lower retention archive. For example, an archive with 1 data point every 60 seconds and retention of 120 points (2 hours worth of data) can have a lower-resolution archive following it with a resolution of 1 data point every 300 seconds for 1200 points, while the same resolution but for only 1000 points would be invalid since 1000 is not evenly divisible by 120.

According to the code, it's not based on the number of data points - it's based on the precision of each archive.
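
For illustration, the constraints whisper.validateArchiveList enforces can be sketched roughly like this (paraphrased from whisper.py; the variable names are mine):

# archives as (secondsPerPoint, points), ordered finest to coarsest
archives = [(60, 120), (300, 1200)]

for (s_fine, p_fine), (s_coarse, p_coarse) in zip(archives, archives[1:]):
    # a coarser step must be a whole multiple of the finer step
    assert s_coarse % s_fine == 0
    # the finer archive must retain enough points to build one coarser point
    assert p_fine >= s_coarse // s_fine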

whisper out of range time range cause incorrect value to be returned

Imported from https://bugs.launchpad.net/bugs/1111285

Reported by sbernard
Date Created Jan 31, 2013
Tags []

When specifying a range in the future, fetch_file accepts the range as OK and returns an incorrect timeinfo with an end < start.

How to reproduce: take any timeseries and call fetch_file with

from = now + 1h
until = now + 3h

fetch_file returns a series of None values, with a timeinfo like this:

start = from
end = now

As from is in the future, the timeinfo returned has end < start, which is clearly a bug.

Enforce closing of files in 0.9.x

I had a recurring problem where my carbon-cache.py writer thread would deadlock, which I traced down to a race condition in whisper.py. I had enabled flock locking. When an error occurred updating the WSP file (like a zero-length file, which generated the CorruptWhisperFile exception), the WSP file would not be closed. A race between the Python GC (which would close the file referenced by an out-of-scope variable) and the next batch of data points to write to the same corrupt WSP file triggered a deadlock: as the out-of-scope file descriptor wasn't closed, the file was still locked, and the writer thread would wait forever trying to obtain an exclusive lock on the same file.

This should easily be fixed by enforcing that files get closed. I found that the master branch does this, but not 0.9.x. I would offer a pull request, but I much prefer the current method of using Python's with statement to the original commit's try/except/finally:

f610a65

That commit has been cleaned up several times since making it difficult to back port.

Is anyone up for a backport of this? Or should I make a pull request with the original patch?

Is the master branch stable enough to run in production with a 0.9.12 / 0.9.13 Graphite setup?

Share archive merge and copy logic between resize and merge operations

whisper-merge.py and whisper-resize.py both handle copy operations between two whisper files. whisper-resize.py recently received some new code to handle resizes with reduced-precision changes more intelligently: f20e2bf

This logic should be moved into whisper.py as part of a merge operation, along with any other code to handle merges between different archive types. whisper-resize.py can then call a merge to do its final copy of data.

(bug) whisper-resize.py tool inserts holes in lower precision archives

Hi everyone.

I've been testing the whisper-resize tool for a while. The resized whisper file has holes at the 2 lowest-precision archives (the blue and brown lines of the screenshot, omitted here).

To test it I've made 2 new tools in this PR: #120

  • whisper-createfull.py, which creates whisper files with a single value in all their datapoints, so we know the expected data after a resize
  • whisper-split.py, which creates as many whisper files as there are archives in the original file, so we can inspect the data contained in each archive

We did the following test:

  • create a known whisper file (all values = 1.0) with a known retention (1m:30d and 15m:365d)
  • resize the original file to a lower retention (for example)
  • split the resized whisper file (the expected result is all values = 1.0 in all archives)
  • plot all data from each archive with an external tool like grafana (we did a scale(metric_archive, num) to draw each lower-precision archive as a line below the highest-precision one; with this representation we can easily see where data exists and what its value is)

You can reproduce this test by executing this script:

#!/bin/bash
WHISPER_DIR=/opt/grafito/storage/whisper
rm -f $WHISPER_DIR/whisper_test_file*
whisper-createfull.py $WHISPER_DIR/whisper_test_file.wsp 1m:30d 15m:365d
whisper-resize.py $WHISPER_DIR/whisper_test_file.wsp --newfile=$WHISPER_DIR/whisper_test_file_resized.wsp 1m:30d 5m:90d 1h:200d 1d:1y
whisper-split.py $WHISPER_DIR/whisper_test_file_resized.wsp

These are the results when each archive is plotted as a line over the others (scaled); image omitted.

Allow whisper to store data in a whisper file relative to the timestamp of the latest datapoint instead of current date

Hello,

From my observation and a quick look at the code, it looks like whisper is storing data in the files relative to the current time.

I.e.: I create a new (empty) whisper file with the following retention: 1s:5m 10s:1d 1m:30d 5m:90d 15m:1y.
If the current date is January 1st 2016 and for whatever reason I want to store a datapoint from August 15th 2015, this datapoint (and its timestamp) will be stored directly in the last archive (15m:1y).

I am wondering if it would be easy to modify whisper's behavior to the following. For a given whisper file, as a new datapoint is added:
=> check its timestamp
==> if its timestamp is newer than any timestamp previously recorded, make this new timestamp the relative "starting date" of the file (if this is the first datapoint added to the file, its timestamp becomes the "starting date")
And then use this "starting date" instead of the current time.

This way, taking my previous example again, when I add my datapoint from August 15th 2015 it will be stored in all archives, and August 15th 2015 will be considered the "starting date" of this whisper file.
Then, if I add a new datapoint to that whisper file long after that date, let's say on December 25th 2015, then December 25th 2015 becomes the new "starting date": this new datapoint gets stored in all archives, while my previous datapoint gets removed from all archives but the last.

If this makes sense, the idea behind it is to allow storing old data in whisper at high precision, as long as no newer datapoint has been added.

Let's say I have stored somewhere (a CSV file for example) some datapoints from last year at a precision of 10 seconds, for a duration of just 4 hours.
If I wanted to import them into the whisper database with whisper's current behavior and the retention setup indicated above, they would get stored at a 15-minute precision (so I would lose my 10s precision).
With the modified behavior, they would actually be stored at 10s precision.

Does anybody have an idea whether this could be done easily?

If I am not clear in my explanation, let me know and I will try to explain better.

Whisper could be significantly sped up: O(1) complexity for aggregation

Hello!

By increasing the redundancy in the data format (promoting efficiency over disk space), whisper's update efficiency can be significantly improved.

Currently, when updating the lower archives, one needs to re-read and re-aggregate all the corresponding data points from the higher-precision archive.
If the data points included not only the value but also one (or multiple, as needed by the chosen aggregator) auxiliary fields, storing whatever running data is useful for the method, then updating the lower archives would be as easy as reading those values and updating them, in O(1).
Since all updates go down to all lower archives, they will always be in sync.

This will currently only benefit the average method, but that is the most widely used right now, and it opens up the possibility of adding more complicated ones as well (like squared average, etc.)

So instead of the current format which can be found at https://github.com/graphite-project/whisper/blob/master/whisper.py#L20,L26

While we're at it, we can also add a Magic identifier and a version information to the format.

I recommend this layout:


  # File = Header,Data
  #   Header = Magic,Version,Metadata,ArchiveInfo+
  #     Magic = "WHISPER"
  #     Version = version information in ASCII, 5 characters, e.g. "01.00"
  #     Metadata = aggregationType,aggregationMetadataSize,maxRetention,xFilesFactor,archiveCount
  #     ArchiveInfo = Offset,SecondsPerPoint,Points
  #   Data = Archive+
  #     Archive = Point+
  #       Point = timestamp,value,aggregationMetadata

aggregationMetadata will be aggregationMetadataSize bytes long, and its format will be specific to the chosen aggregation method. The aggregationMetadataSize field is necessary so that code that does not know the aggregator method can still walk through the data in the archive.
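
To make the O(1) claim concrete for the average method, a hypothetical sketch (names and layout are mine, not part of the proposal text): if each downsampled point carries a count as its aggregationMetadata, folding in one new high-precision value requires no re-read of the higher archive:

def fold_into_average(point, new_value):
    # point = (value, count); count is the aggregationMetadata for 'average'
    value, count = point
    new_count = count + 1
    new_avg = (value * count + new_value) / new_count
    return (new_avg, new_count)

# fold_into_average((10.0, 3), 20.0) == (12.5, 4)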

Installation & Running in the README?

The README currently describes what whisper is, and lists a few scripts, but leaves open quite a few questions:

  • How do I install it?
  • How do I run it?
  • How do I start developing on this project?

whisper-merge.py fails

Using the latest release of whisper, whisper-merge.py fails with:

Traceback (most recent call last):
  File "/usr/local/bin/whisper-merge.py", line 32, in <module>
    whisper.merge(path_from, path_to)
  File "/usr/local/lib/python2.7/dist-packages/whisper.py", line 835, in merge
    (timeInfo, values) = fetch(path_from, fromTime, untilTime)
TypeError: 'NoneType' object is not iterable

Whisper lacks testcases and is not time invariant

I'm filing this issue since I've been thinking about this for a while. Not sure if I should report this here or on Launchpad.

Some time ago I contributed to Whisper. I was curious to see how whisper worked, and I also wanted to add more powerful import functionality for historical data. I've also considered using Whisper to store historical data in production.

Looking at the code there were two things that bothered me:

  1. The total lack of test cases. I am not necessarily that guy who always requires everything to have test cases or 100% test coverage; if anything, I think it's more important to test only what needs to be tested. However, there are two reasons why I think Whisper needs at least a couple of crucial test cases (though tests for corner cases should preferably be added, too):
    • It will store my precious historical data and I will blindly trust all data that's been stored in Whisper. There seem to be a number of corner cases (esp. due to 2). I truly hope that these corner cases will never show up as regressions in the future.
    • It will make it way easier for newcomers to confidently contribute to Whisper.
  2. It is time dependent. There are calls to time.time() in multiple places. This makes it harder to recreate corner cases, write tests, and even identify corner cases.

I'd love to hear your input on this. If this is not something you are willing to work on in Whisper, maybe it's something that can be brought to attention in Ceres.

whisper aggregation not working for older data points

storage-schemas.conf:
[default]
pattern = .*
retentions = 5m:15d,15m:1y,1h:10y,1d:100y

storage-aggregation.conf:
[all_sum]
pattern = .*
xFilesFactor = 0.1
aggregationMethod = sum

Now, I am feeding entries as:

echo "rec.test 25 $(date --date="-6 minute" +%s)" | nc localhost 2003
echo "rec.test 50 $(date --date="-3 minute" +%s)" | nc localhost 2003
echo "rec.test 100 $(date +%s)" | nc localhost 2003
echo "rec.test 1 $(date --date="-1 year" +%s)" | nc localhost 2003
echo "rec.test 4 $(date --date="-1 year minute" +%s)" | nc localhost 2003
echo "rec.test 6 $(date --date="-1 year -1 minute" +%s)" | nc localhost 2003
echo "rec.test 8 $(date --date="-1 year -2 minute" +%s)" | nc localhost 2003

On the grafana graph, I am able to see the aggregation (sum) for the recently fed values. But the values from 1 year ago are not aggregated; in fact only one value (the latest entry from the 1-hour window), 8, is shown instead of 4+6+8=18.

What could be missing in the configuration?

Wrong variable name for CAN_LOCK vs. LOCK?

Sorry to raise this as an issue; I'm not sure how else to contact the developers with such a question.

While working on the whisper.py code, I noticed that it contains:

try:
  import fcntl
  CAN_LOCK = True
except ImportError:
  CAN_LOCK = False

And then there are only references to "LOCK", like this:

    if LOCK:
      fcntl.flock( fh.fileno(), fcntl.LOCK_EX )

"CAN_LOCK" is not mentioned anywhere else.
Could this be a case where "if LOCK" was meant to be "if CAN_LOCK", or am I missing something?

Add --debug to whisper-fetch to help people

I have spent quite a few hours trying to find out why Graphite does not show my data. The main problem is that it is hard to see what it does with the data before showing it. I would love to have something like this:

$ whisper-fetch --pretty --debug mymetric.wsp
DEBUG: Aggregating data to 1 min as determined by the schema Schema 1s:30m,1m:1d,5m:2y and selected period Fri May  9 10:27:10 2014 - Sat May 10 10:27:00 2014
DEBUG: 10 periods were aggregated to None because they had fewer than 30 (50% of 1 min) data points as mandated by the xFilesFactor=0.5
Fri May  9 10:27:10 2014    None
...

Such or similar output would have made it very clear how the data was aggregated and why I see None in the output even though I have sent some data.

I hope you will consider this request and find the time to implement at least parts of it. Thank you!

TypeError: fetch() takes at most 3 arguments (4 given)

Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/django/core/handlers/base.py", line 115, in get_response
    response = callback(request, *callback_args, **callback_kwargs)
  File "./graphite/render/views.py", line 112, in renderView
    seriesList = evaluateTarget(requestContext, target)
  File "./graphite/render/evaluator.py", line 10, in evaluateTarget
    result = evaluateTokens(requestContext, tokens)
  File "./graphite/render/evaluator.py", line 21, in evaluateTokens
    return evaluateTokens(requestContext, tokens.expression)
  File "./graphite/render/evaluator.py", line 24, in evaluateTokens
    return fetchData(requestContext, tokens.pathExpression)
  File "./graphite/render/datalib.py", line 230, in fetchData
    dbResults = dbFile.fetch( timestamp(startTime), timestamp(endTime), timestamp(now))
  File "./graphite/storage.py", line 308, in fetch
    return whisper.fetch(self.fs_path, startTime, endTime, now)
TypeError: fetch() takes at most 3 arguments (4 given)

using whisper 0.9.12
