Comments (10)

deniszh avatar deniszh commented on May 27, 2024

Hi @cdeil ,

As far as I remember, that was not the author's original intention in #57.
Other users have noticed this as well, e.g. #250.

from whisper.

deniszh avatar deniszh commented on May 27, 2024

TBH, I think a proper implementation should not drop the point but replace it with 0 or the previous value instead. But then we should not call this function "drop", should we?
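
To make the idea concrete, here is a minimal sketch of filling instead of dropping. `fill_nulls` and its `mode` argument are hypothetical names used for illustration only; nothing like this exists in whisper-fetch.py. It assumes missing points arrive as `None` in the value list.

```python
from typing import List, Optional


def fill_nulls(values: List[Optional[float]], mode: str = "zero") -> List[Optional[float]]:
    """Replace None entries instead of dropping them.

    The list keeps its length, so every value stays at the position
    (and therefore the timestamp) it had before filling.
    """
    if mode not in ("zero", "previous"):
        raise ValueError(f"Unknown mode: {mode!r}")

    filled: List[Optional[float]] = []
    last_seen: Optional[float] = None
    for value in values:
        if value is not None:
            last_seen = value
            filled.append(value)
        elif mode == "zero":
            filled.append(0.0)
        else:
            # "previous": repeat the last real value; leading gaps stay None
            filled.append(last_seen)
    return filled
```

Because no element is removed, the timestamps computed from the start time and step remain valid for every position.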

piotr1212 avatar piotr1212 commented on May 27, 2024

I don't see where it says this is intended. Returning the wrong timestamp does not make any sense to me; I can't imagine why someone would want this.

cdeil avatar cdeil commented on May 27, 2024

This cost me a day at work. I ran whisper-fetch.py --drop=nulls and got timestamps for 2010-2012, so I requested another data extract from a legacy system and spent time debugging why that extract didn't work properly. In fact the data was for 2018-2020, as it should be, and was already in the Whisper file; I had just produced wrong timestamps because of this bug.

For now I'm changing my code to just call whisper.fetch() in my pipeline instead of whisper-fetch.py, put the timestamps and values into a pandas.Series and call dropna to drop irrelevant (t, val).
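
The same workaround can be sketched without pandas. This assumes whisper.fetch() returns ((fromTime, untilTime, step), values) with missing points as None; `drop_null_points` is a hypothetical helper name, not part of whisper.

```python
from typing import List, Optional, Sequence, Tuple


def drop_null_points(
    time_info: Tuple[int, int, int],
    values: Sequence[Optional[float]],
) -> List[Tuple[int, float]]:
    """Pair every value with its timestamp first, then drop the nulls."""
    from_time, until_time, step = time_info
    timestamps = range(from_time, until_time, step)
    # Filtering after zip() keeps each surviving value attached to the
    # timestamp of its original position.
    return [(t, v) for t, v in zip(timestamps, values) if v is not None]
```

The order matters: if nulls are removed from the value list before the timestamps are generated, every point after the first gap is shifted to an earlier time.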

cdeil avatar cdeil commented on May 27, 2024

Possible fix (completely untested): #306

cdeil avatar cdeil commented on May 27, 2024

Another issue I ran into is that the code here uses the local time of the machine I'm running the data processing on, which gave incorrect results in my case:

if options.time_format:
    timestr = time.strftime(options.time_format, time.localtime(t))
else:
    timestr = time.ctime(t)

I'm now using this, I think that should be correct:

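import pandas as pd
import whisper


def read_whisper(path):
    (fromTime, untilTime, step), val = whisper.fetch(path, fromTime=0, archiveToSelect=None)
    # Interpret the epoch seconds explicitly instead of relying on machine local time
    fromTimeStamp = pd.Timestamp(fromTime, unit="s", tz="Europe/Berlin")
    index = pd.date_range(
        start=fromTimeStamp,
        freq="H",  # hourly data in my case; step holds the actual interval in seconds
        periods=len(val),
    )
    data = {"val": val}
    return pd.DataFrame(data, index=index)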
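
For the string formatting itself, a minimal standard-library sketch that does not depend on the machine's local time zone could look like this. `format_timestamp_utc` is a hypothetical helper, and choosing UTC is an assumption; the right zone depends on how the timestamps should be read.

```python
from datetime import datetime, timezone
from typing import Optional


def format_timestamp_utc(t: int, time_format: Optional[str] = None) -> str:
    """Format epoch seconds in UTC, independent of the machine's time zone."""
    dt = datetime.fromtimestamp(t, tz=timezone.utc)
    if time_format:
        return dt.strftime(time_format)
    # No format given: fall back to ISO 8601 with an explicit UTC offset
    return dt.isoformat()
```

With an explicit tz argument the result is the same on every machine, whereas time.localtime(t) and time.ctime(t) follow the local zone settings.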

cdeil avatar cdeil commented on May 27, 2024

I was getting incorrect data with whisper.fetch for archiveToSelect=60 and tried various fromTime values.

I see correct data with whisper-dump.py, but I don't want to write temp CSV files.

Wrote this, which seems to work fine:

import numpy as np
import pandas as pd
import whisper


def read_whisper_archive(path: str, archive_id: int) -> pd.DataFrame:
    """Whisper data read direct implementation with Numpy and Pandas"""
    infos = whisper.info(path)
    if archive_id < 0 or archive_id >= len(infos["archives"]):
        raise ValueError(f"Invalid archive_id = {archive_id}")
    archive = infos["archives"][archive_id]

    # Each point is a big-endian (uint32 timestamp, float64 value) pair
    dtype = np.dtype([
        ("time", ">u4"),
        ("val", ">f8")
    ])

    # Limit the read to this archive's points; without count, np.fromfile
    # reads on to the end of the file, into any later archives
    data = np.fromfile(path, dtype=dtype, count=archive["points"], offset=archive["offset"])
    # Keep only slots with a non-zero timestamp
    data = data[np.nonzero(data["time"])]
    # The astype is needed to avoid this error later on
    # ValueError: Big-endian buffer not supported on little-endian compiler
    df = pd.DataFrame(
        data={"val": data["val"].astype(float)},
        index=pd.to_datetime(data["time"], unit="s")
    )
    df = df.sort_index()
    return df

This should be much faster and more memory-efficient than the current whisper-dump.py, which uses Python types and lists. It is also more convenient for use cases where people want to do data analysis directly, or dump to binary formats like Parquet / HDF5 / ....

Is the processing correct, i.e. is it guaranteed that non-filled values have time=0? And is the sorting by time at the end needed, or is the file already ordered that way? The description at https://graphite.readthedocs.io/en/stable/whisper.html unfortunately doesn't explain where values are placed within the archive.

Do you think it could make sense to add a function like this to this repo? The Numpy & Pandas imports could be delayed to the function, i.e. they would be an optional dependency.
Alternatively, I could just create a file whisper_pandas.py in my private GitHub account and share some functions there.

deniszh avatar deniszh commented on May 27, 2024

Hi @cdeil,
I agree that Whisper has many use cases, but IMO using it for analytical purposes is not widely adopted.
That's why I do not think that including pandas or numpy as part of the library is a good idea.
But you can add your scripts to the contrib/ directory, which exists for exactly that reason.

stale avatar stale commented on May 27, 2024

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

cdeil avatar cdeil commented on May 27, 2024

Just in case someone finds this old thread and is looking for a Whisper file Pandas reader, check this out:
https://github.com/cdeil/whisper_pandas/blob/main/whisper_pandas.py

Of course, any feedback or contribution would be welcome.

Specifically, I'm not sure whether the data = data[data["time"] != 0] line and the sort_index call are valid.

It seems to work for my files, but the WhisperDB docs at https://graphite.readthedocs.io/en/latest/whisper.html unfortunately don't say how the file is initialised or where the points are inserted.
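
One way to reason about both questions is the sketch below. It rests on an assumption taken from reading whisper.py rather than from the docs: each archive is treated as a ring buffer whose slot for a point is derived from the timestamp of the first point stored in that archive. `slot_index` and its parameter names are hypothetical and not part of the whisper API.

```python
def slot_index(timestamp: int, base_interval: int, seconds_per_point: int, points: int) -> int:
    """Return the slot a point would occupy in a ring-buffer archive.

    base_interval is the (aligned) timestamp stored in the archive's first slot.
    """
    # Align the timestamp down to the archive's resolution
    interval = timestamp - (timestamp % seconds_per_point)
    # Distance from the first slot, counted in points
    point_distance = (interval - base_interval) // seconds_per_point
    # Wrap around once the archive's capacity is exceeded
    return point_distance % points
```

Under that assumption, the order of points in the file stops being chronological once writes wrap around, so sorting by time would still be needed. Whether a zero timestamp always marks an unwritten slot depends on the file being zero-filled at creation, which this sketch does not verify.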
