Comments (10)
Hi @cdeil,
As far as I remember, this was not the original intention of the author in #57.
Other users have noticed it too, e.g. #250.
from whisper.
TBH I think a proper implementation should not drop the point but replace it with 0 or the previous value instead. But then we should not call this function "drop", should we?
from whisper.
I don't see where it says this is intended. Returning the wrong timestamp does not make any sense to me, can't imagine why someone would want this.
from whisper.
This cost me a day at work, because I ran `whisper-fetch.py --drop=nulls` and got timestamps for 2010-2012. I then requested another data extract from a legacy system and spent time debugging why that extract didn't work properly. In reality the data in the Whisper file was for 2018-2020, as it should be; the wrong timestamps were created by this bug.
For now I'm changing my code to just call `whisper.fetch()` in my pipeline instead of `whisper-fetch.py`, put the timestamps and values into a `pandas.Series`, and call `dropna` to drop the irrelevant `(t, val)` pairs.
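A minimal sketch of that workaround without pandas, assuming the wrong timestamps come from filtering out null values before they are paired with their timestamps. The helper name `drop_nulls` and the sample data are hypothetical; the input is shaped like the `((fromTime, untilTime, step), values)` result of `whisper.fetch()`.

```python
def drop_nulls(time_info, values):
    """Pair each value with its own timestamp, then drop the None entries."""
    from_time, until_time, step = time_info
    # Build the timestamp of every slot first, so timestamps stay aligned
    # with values even after the empty slots are removed.
    timestamps = range(from_time, until_time, step)
    return [(t, v) for t, v in zip(timestamps, values) if v is not None]


# Made-up sample: four hourly slots, two of them empty.
time_info = (1514764800, 1514764800 + 4 * 3600, 3600)
values = [None, 1.5, None, 2.5]
points = drop_nulls(time_info, values)
print(points)
```

Filtering only after zipping means each surviving value keeps the timestamp of the slot it was stored in, instead of being renumbered from `fromTime`.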
from whisper.
Possible fix (completely untested): #306
from whisper.
Another issue I ran into: the code here uses the local time of the machine that runs the data processing, which gave incorrect results in my case:
Lines 86 to 89 in 8d21c56
I'm now using this, I think that should be correct:
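    import pandas as pd
    import whisper

    def read_whisper(path):
        # whisper.fetch returns ((fromTime, untilTime, step), values)
        (fromTime, untilTime, step), val = whisper.fetch(path, fromTime=0, archiveToSelect=None)
        # Epoch seconds are UTC; convert explicitly instead of relying on the machine's local time
        fromTimeStamp = pd.Timestamp(fromTime, unit="s", tz="Europe/Berlin")
        index = pd.date_range(
            start=fromTimeStamp,
            freq=pd.Timedelta(seconds=step),  # use the archive's own step rather than assuming hourly data
            periods=len(val),
        )
        data = {"val": val}
        return pd.DataFrame(data, index=index)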
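The same idea can be sketched with only the standard library, assuming the goal is timestamps that do not depend on the machine's local timezone. The helper names `epoch_to_utc` and `slot_times_utc` are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def epoch_to_utc(ts):
    """Convert epoch seconds to a timezone-aware UTC datetime."""
    # Passing tz explicitly avoids the implicit local-time conversion
    # that datetime.fromtimestamp(ts) would otherwise apply.
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def slot_times_utc(from_time, step, count):
    """Return one aware datetime per slot, starting at from_time."""
    start = epoch_to_utc(from_time)
    return [start + timedelta(seconds=i * step) for i in range(count)]


print(epoch_to_utc(0).isoformat())
print([t.isoformat() for t in slot_times_utc(0, 3600, 3)])
```

An aware datetime can later be converted to any display timezone with `astimezone()`, so the stored values stay unambiguous.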
from whisper.
I was getting incorrect data with `whisper.fetch` for `archiveToSelect=60`, and tried various `fromTime` values.
I see correct data with `whisper-dump.py`, but I don't want to write temp CSV files.
I wrote this, which seems to work fine:
    import numpy as np
    import pandas as pd
    import whisper

    def read_whisper_archive(path: str, archive_id: int) -> pd.DataFrame:
        """Whisper data read direct implementation with Numpy and Pandas"""
        infos = whisper.info(path)
        if archive_id < 0 or archive_id >= len(infos["archives"]):
            raise ValueError(f"Invalid archive_id = {archive_id}")
        # Each point is a big-endian uint32 timestamp followed by a float64 value
        dtype = np.dtype([
            ("time", ">u4"),
            ("val", ">f8")
        ])
        archive = infos["archives"][archive_id]
        # Read only this archive's points, not everything up to the end of the file
        data = np.fromfile(path, dtype=dtype, count=archive["points"], offset=archive["offset"])
        # Drop slots whose timestamp is 0
        data = data[np.nonzero(data["time"])]
        # The astype is needed to avoid this error later on
        # ValueError: Big-endian buffer not supported on little-endian compiler
        df = pd.DataFrame(
            data={"val": data["val"].astype(float)},
            index=pd.to_datetime(data["time"].astype("int64"), unit="s")
        )
        df = df.sort_index()
        return df
This should be much faster and more memory-efficient than the current `whisper-dump.py`, which uses Python types and lists. It is also more convenient for use cases where people want to do data analysis directly, or dump to binary formats like Parquet / HDF5 / ....
Is the processing correct, i.e. is it guaranteed that unfilled values have time=0? And is the sorting by time at the end needed, or is the file already ordered? The description at https://graphite.readthedocs.io/en/stable/whisper.html unfortunately doesn't explain where values are written within the archive.
Do you think it could make sense to add a function like this to this repo? The Numpy & Pandas imports could be delayed to the function, i.e. they would be an optional dependency.
Alternatively I could just make a file `whisper_pandas.py` in my private GitHub account and share some functions there.
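A rough sketch of what a delayed import could look like, assuming the heavy dependency is only needed by one function. The helper `require_optional` is hypothetical, and `json` merely stands in for pandas so the snippet runs anywhere.

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional dependency only when a feature actually needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        # Re-raise with a message that tells the user what to install.
        raise ImportError(
            f"{feature} requires the optional dependency '{module_name}'"
        ) from exc


def parse_numbers(text):
    # The import happens here, at call time, not when the module is loaded.
    json = require_optional("json", "parse_numbers")
    return json.loads(text)


print(parse_numbers("[1, 2, 3]"))
```

With this pattern, users who never call the function never need the extra package installed.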
from whisper.
Hi @cdeil,
I agree that Whisper has many use cases, but IMO using it for analytical purposes is not widespread.
That's why I do not think that including pandas or numpy as part of the library is a good idea.
But you can add your scripts to the `contrib/` directory, which exists for exactly that reason.
from whisper.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
from whisper.
Just in case someone finds this old thread and is looking for a Whisper file Pandas reader, check this out:
https://github.com/cdeil/whisper_pandas/blob/main/whisper_pandas.py
Of course, any feedback or contribution would be welcome.
Specifically, I'm not sure whether the `data = data[data["time"] != 0]` line and the `sort_index` call are valid.
It seems to work for my files, but the Whisper docs at https://graphite.readthedocs.io/en/latest/whisper.html unfortunately don't say how the file is initialised or where the points are inserted.
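For anyone who wants to experiment without NumPy, here is a small sketch of the same filter-and-sort step on raw bytes. It assumes what the question above leaves open: that unfilled slots have a zero timestamp and that points are not necessarily stored in time order. The point layout (big-endian 4-byte unsigned timestamp plus 8-byte double) matches the dtype used earlier in this thread; `parse_archive` and the sample bytes are made up.

```python
import struct

# One point: big-endian uint32 timestamp followed by a float64 value (12 bytes).
POINT = struct.Struct(">Ld")


def parse_archive(raw):
    """Decode raw archive bytes into (timestamp, value) pairs sorted by time."""
    # Assumption: a timestamp of 0 marks a slot that was never written.
    points = [(t, v) for t, v in POINT.iter_unpack(raw) if t != 0]
    # Assumption: slots may be out of time order, so sort explicitly.
    points.sort(key=lambda p: p[0])
    return points


# Made-up archive with one empty slot and two out-of-order points.
raw = b"".join(POINT.pack(t, v) for t, v in [(7200, 3.0), (0, 0.0), (3600, 2.0)])
print(parse_archive(raw))
```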
from whisper.