
traces

A Python library for unevenly-spaced time series analysis.

Why?

Taking measurements at irregular intervals is common, but most tools are primarily designed for evenly-spaced measurements. Also, in the real world, time series have missing observations or you may have multiple series with different frequencies: it can be useful to model these as unevenly-spaced.

Traces was designed by the team at Datascope based on several practical applications in different domains, because it turns out unevenly-spaced data is actually pretty great, particularly for sensor data analysis.

Installation

To install traces, run this command in your terminal:

$ pip install traces

Quickstart: using traces

To see a basic use of traces, let's look at these data from a light switch, also known as Big Data from the Internet of Things.

The main object in traces is a TimeSeries, which you create just like a dictionary, adding the five measurements at 6:00am, 7:45:56am, etc.

>>> time_series = traces.TimeSeries()
>>> time_series[datetime(2042, 2, 1,  6,  0,  0)] = 0 #  6:00:00am
>>> time_series[datetime(2042, 2, 1,  7, 45, 56)] = 1 #  7:45:56am
>>> time_series[datetime(2042, 2, 1,  8, 51, 42)] = 0 #  8:51:42am
>>> time_series[datetime(2042, 2, 1, 12,  3, 56)] = 1 # 12:03:56pm
>>> time_series[datetime(2042, 2, 1, 12,  7, 13)] = 0 # 12:07:13pm

What if you want to know whether the light was on at 11am? Unlike a Python dictionary, you can look up the value at any time, even if it's not one of the measurement times.

>>> time_series[datetime(2042, 2, 1, 11,  0, 0)] # 11:00am
0

The distribution function gives you the fraction of time that the TimeSeries is in each state.

>>> time_series.distribution(
>>>   start=datetime(2042, 2, 1,  6,  0,  0), # 6:00am
>>>   end=datetime(2042, 2, 1,  13,  0,  0)   # 1:00pm
>>> )
Histogram({0: 0.8355952380952381, 1: 0.16440476190476191})

The light was on about 16% of the time between 6am and 1pm.

Adding more data...

Now let's get a little more complicated and look at the sensor readings from forty lights in a house.

How many lights are on throughout the day? The merge function takes the forty individual TimeSeries and efficiently merges them into one TimeSeries, where each value is a list of the states of all forty lights.

>>> trace_list = [... list of forty traces.TimeSeries ...]
>>> count = traces.TimeSeries.merge(trace_list, operation=sum)

We also applied a sum operation to the list of states to get the TimeSeries of the number of lights that are on.

How many lights are on in the building on average during business hours, from 8am to 6pm?

>>> histogram = count.distribution(
>>>   start=datetime(2042, 2, 1,  8,  0,  0),   # 8:00am
>>>   end=datetime(2042, 2, 1,  12 + 6,  0,  0) # 6:00pm
>>> )
>>> histogram.median()
17

The distribution function returns a Histogram that can be used to get summary metrics such as the mean or quantiles.

It's flexible

The measurement points (keys) in a TimeSeries can be in any units, as long as they can be ordered. The values can be anything.

For example, you can use a TimeSeries to keep track of the contents of a grocery basket by the number of minutes within a shopping trip.

>>> time_series = traces.TimeSeries()
>>> time_series[1.2] = {'broccoli'}
>>> time_series[1.7] = {'broccoli', 'apple'}
>>> time_series[2.2] = {'apple'}          # puts broccoli back
>>> time_series[3.5] = {'apple', 'beets'} # mmm, beets

More info

To learn more, check the examples and the detailed reference.

Contributing

Contributions are welcome and greatly appreciated! Please visit our guidelines for more info.

traces's People

Contributors

gokturksm, nsteins, pyup-bot, sdementen, stringertheory, vlsd, ypleong, zackdrescher, zzrcxb


traces's Issues

[WIP] set_interval problems when dealing with hours

It looks like set_interval wrongly sets the value of the end item when dealing with hours and datetime.
Hard to explain; here is an example.

import traces, datetime

# This works as expected. Notice that we did not split up the day yet...
t = traces.TimeSeries()
t[datetime.datetime(2000, 1, 1)] = 1
t[datetime.datetime(2000, 1, 2)] = 0
t[datetime.datetime(2000, 1, 3)] = 1
t[datetime.datetime(2000, 1, 4)] = 1
t.set_interval(datetime.datetime(2000, 1, 1, 0, 0), datetime.datetime(2000, 1, 2, 0, 0), 1)

# t is now:
{
 datetime.datetime(2000, 1, 1, 0, 0): 1,
 datetime.datetime(2000, 1, 2, 0, 0): 1,
 datetime.datetime(2000, 1, 3, 0, 0): 1,
 datetime.datetime(2000, 1, 4, 0, 0): 1
}

# Do the same thing, but with one of the days split
t = traces.TimeSeries()
t[datetime.datetime(2000, 1, 1)] = 1
t[datetime.datetime(2000, 1, 2)] = 0
t[datetime.datetime(2000, 1, 2, 12)] = 1
t[datetime.datetime(2000, 1, 4)] = 1
t.set_interval(datetime.datetime(2000, 1, 2, 0, 0), datetime.datetime(2000, 1, 2, 12, 0), 1)

# t is now
{
 datetime.datetime(2000, 1, 1, 0, 0): 1,
 datetime.datetime(2000, 1, 2, 0, 0): 1,
 datetime.datetime(2000, 1, 2, 12, 0): 0,
 datetime.datetime(2000, 1, 4, 0, 0): 1
}

This even happens in simple cases, like:

t = traces.TimeSeries()
t[1] = 1
t[2] = 0
t[3] = 1
t[4] = 1
t.set_interval(2, 3, 1)

# t is now
{
  1: 1,
  2: 1,
  3: 0,
  4: 1
}

TypeError on TimeSeries.distribution() using numpy datetime64 as index

The following snippet yields a TypeError: duration is an unknown type (600 seconds)

import numpy as np
import traces
ts = traces.TimeSeries()
ts[np.datetime64('2017-01-01T12:00:00')] = 1
ts[np.datetime64('2017-01-01T12:10:00')] = 2
ts[np.datetime64('2017-01-01T12:20:00')] = 1
ts.distribution()

traces version 0.3.1

[Question] How to recreate traces chart?

I wonder how one could recreate traces' signature plot.

I was wondering if the library has anything to do with the charts (per the docs, it does not), but seeing a couple of charts like that in the docs made me think that producing that kind of chart might be within the scope of the project.

better methods for start and end times/values

Maybe something like:

ti, tf = ts.get_first_and_last_times()

vi, vf = ts.get_first_and_last_values()

(ti, vi), (tf, vf) = ts.get_first_and_last()

to be better than:

ti, vi = ts.get_by_index(0)
tf, vf = ts.get_by_index(-1)

(or at least have get_by_index documented)

Feature Request: Logical Negation

As of version 0.3.1 the binary operations ^, &, and | seem to be implemented.

I use traces for working with boolean time series, so I'd be interested in also having ~ implemented.

Usage:

import traces
x = traces.TimeSeries([(0, True), (1, False)])

~x  # [(0, False), (1, True)]

Values in TimeSeries.distribution() are sentence-cased regardless of how values were added to the TimeSeries

If you are using strings as values in a TimeSeries:

ts = traces.TimeSeries()
ts[1] = 'JUNK'
ts[3] = 'JANK'
ts[5] = 'WHAT'

If you call something like ts.distribution(min, max), you would see something like this:

Histogram(None, 1000, {'Jank': 0.16008504570112725, 'Junk': 0.04229136076598496, 'What': 0.797577092766277})

It looks like somewhere along the line the string values are getting sentence-cased. Not sure exactly where yet, but this could be confusing or cause silly bugs when looking up these objects with the wrong value.

How are the plots in the documentation created?

Not a bug, but just curious about how you've plotted the charts in the documentation and what the recommended approach for plotting TimeSeries objects is? I couldn't find a trace of this information in the repo. Thanks in advance!

PyPI Package Installed additional tkinter module in site-packages

tkinter was already installed in "C:\Anaconda2\Lib\lib-tk"
This caused an exception when another module (easygui.fileopenbox) was accessing tkinter.

AttributeError: 'module' object has no attribute 'askopenfilename'

If the presence of tkinter could be checked first, I guess pip would not need to install another version of tkinter?

I fixed it by deleting tkinter in site-packages.

add possibility to write ts[start:end] = v to change value on an interval

I have a use case where I need to change the value of a time series on an interval without changing the value outside of the interval, i.e. do something like ts[start:end] = value.
Just setting

ts[end] = ts[end]   # freezing/anchoring the current value of ts as of [end, ...)
ts[start] = value      # changing the value as of [start, ...)

may fail as intermediate points in [start,end) may exist ==> we need to remove all intermediate points (which is easy as ts.iterperiods(start,end) provides them nicely).

I think the function below does it properly (but it would be better integrated in the item to use the slice notation)

def set_slice(ts, start, end, value):
    """ts[start:end] = value ==> call set_slice(ts, start, end, value).

    Set the value of ts so that:
    - on the interval [start, end), we have the new value
    - on [end, ...), the value is unchanged
    - on (..., start), the value is unchanged as well

    We replace the value of the ts on an interval.

    :param ts: the TimeSeries to modify
    :param start: start of the interval
    :param end: end of the interval (exclusive)
    :param value: the new value on [start, end)
    """
    # iterate over a list copy so we can modify ts while looping
    for i, (s, e, v) in enumerate(list(ts.iterperiods(start, end))):
        if i == 0:
            # if the first interval, set its start to the new value
            ts[s] = value
        else:
            # otherwise, remove the intermediate key
            del ts[s]
    # finish by restoring the previous value at the end of the interval
    ts[end] = v

logical_or should not be casting to an int

This is the error we get if we try to logical_or time series that have None entries. Note that in Python None is falsy, so bool(None) returns False. I would expect something called logical_* to cast to a boolean, if it casts to anything.

  File "/src/traces/traces/timeseries.py", line 805, in logical_or
    return self.operation(other, lambda x, y: int(x or y))
  File "/src/traces/traces/timeseries.py", line 750, in operation
    result[time] = function(value, other[time])
  File "/src/traces/traces/timeseries.py", line 805, in <lambda>
    return self.operation(other, lambda x, y: int(x or y))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
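As a workaround under the current API, a None-tolerant combiner could be passed to operation instead of the int-casting lambda. A minimal sketch (bool_or is a hypothetical helper, not part of traces):

```python
# Hypothetical None-tolerant OR combiner: cast each operand to bool
# explicitly, so a falsy None behaves like False instead of blowing up
# in int().
def bool_or(x, y):
    return bool(x) or bool(y)

print(bool_or(None, True))   # True
print(bool_or(None, None))   # False
```

With traces this could be used as ts_a.operation(ts_b, bool_or), assuming operation applies the function pairwise as in the traceback above.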

Documentation needs updating

Documentation lists TimeSeries.first_item() + TimeSeries.last_item(), which are deprecated in favor of TimeSeries.first() + TimeSeries.last()

Great work guys 💯

TypeError using pandas.Timestamp as index

The following snippet yields a TypeError: Cannot compare type 'Timestamp' with type 'Infinity':

import pandas as pd
import traces
ts = traces.TimeSeries()
ts[pd.Timestamp('2017-07-20 12:44')] = 1

traces version 0.3.1

PyPi Package incomplete

The package on PyPi is incomplete
known to affect 0.3.0 and 0.3.1 (others unknown)

the package is missing the folder 'requirements' and contained files.
this causes the 'setup.py' script to fail because it cannot get the list of dependencies.

This can be tested by downloading 'traces-0.3.1.tar.gz' or 'traces-0.3.0.tar.gz' from PyPI, extracting the files from the archive, and running the 'setup.py' script:

  • i.e. python setup.py install --user

This affects manual installs.
Other install methods and other packages have not been tested.

Updating a default value doesn't save the updated value

In a collections.defaultdict, using something like add or update for an empty default key changes the value at that key:

>>> a = defaultdict(dict)
>>> a["hello"]
{}
>>> a["hello"].update([("a",1)])
>>> a["hello"]
{'a': 1}

or for sets...

>>> a = defaultdict(set)
>>> a["hello"]
set()
>>> a["hello"].add("a")
>>> a["hello"]
{'a'}

Traces doesn't act this way.

KeyError: 'no measurement at XXX' when setting value of a trace on a slice

When running the following

from datetime import date

import traces

tr = traces.TimeSeries({date(2000, 9, 21): 359.0,
                        date(2019, 6, 28): 0.0,
                        date(2019, 7, 28): 339.0,
                        date(2200, 4, 11): 359.0})

tr[date(2016, 12, 31):date(2021, 12, 31)] = 352.5

I get a KeyError: 'no measurement at 2019-06-28'.

This is due to the fact that set_interval iterates over iterperiods (which is a generator) while modifying the data of the trace at the same time.
Replacing https://github.com/datascopeanalytics/traces/blob/master/traces/timeseries.py#L213 from

        for i, (s, e, v) in enumerate(self.iterperiods(start, end)):

to

        for i, (s, e, v) in enumerate(list(self.iterperiods(start, end))):

fixes the bug.

Domains lost when using TimeSeries.operation method

Minimal Example using traces v0.3.1

import traces
x = traces.TimeSeries([(0, True), (10, False)], domain=(0, 20))
x.domain  # (0, 20)
y = x.operation(x, lambda a,b: a^b)
y.domain  # (-inf, inf)

It seems to me that y's domain should still be (0, 20). From an API standpoint, if applying a binary operation on x1 and x2, I think that it'd be simpler to have the resulting domain be x1.domain & x2.domain

When using a mask with TimeSeries.distribution(), mask.start() is called in `timeseries.py` but `start()` doesn't exist

I think this will be fixed with the next bump; looked for an issue related to this but didn't find one. Feel free to close this out if it was as simple as defining start() for TimeSeries.

Traceback (most recent call last):
  File "run_plots.py", line 25, in <module>
    make_plots()
  File "/Users/mjfm/projects/modustri/analysis/plots/see_cart_trips.py", line 55, in make_plots
    mask = front_ts,
  File "/Users/mjfm/Virtualenvs/modustri/lib/python2.7/site-packages/traces/timeseries.py", line 622, in distribution
    new_ts = self.slice(mask.start(), mask.end())
AttributeError: 'TimeSeries' object has no attribute 'start'

Add `compact` option to `iterperiods()`

This would merge adjacent periods that have the same value and return them as only one period. Ideally this would be done efficiently, although I'm unclear what that means (store a compact version of the timeseries along with the non-compact one?)
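A minimal pure-Python sketch of what the compacting could look like (compact_periods is hypothetical, not part of traces; it assumes the input periods are contiguous and ordered, as iterperiods() would yield them):

```python
def compact_periods(periods):
    """Merge adjacent (start, end, value) periods that share the same value."""
    current = None
    for start, end, value in periods:
        if current is None:
            current = [start, end, value]
        elif value == current[2] and start == current[1]:
            current[1] = end  # extend the running period
        else:
            yield tuple(current)
            current = [start, end, value]
    if current is not None:
        yield tuple(current)

periods = [(0, 1, 'a'), (1, 3, 'a'), (3, 4, 'b'), (4, 6, 'b'), (6, 7, 'a')]
print(list(compact_periods(periods)))
# [(0, 3, 'a'), (3, 6, 'b'), (6, 7, 'a')]
```

This is O(n) in the number of periods, so a pre-stored compact representation would only matter if iterperiods() is called far more often than the series is modified.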

ValueError: start can't be >= end (level_cum_sum >= level_cum_sum)

Hi,

I want to use traces to convert an unevenly spaced time series to an evenly spaced one (like the example in the docs). When doing this, the following error is raised:
"ValueError: start can't be >= end (level_cum_sum >= level_cum_sum)"

My TimeSeries looks like below when I print it to the console. Thus level_cum_sum is ordered and I removed duplicates.

The line raising the error is:
"regular = time_series.moving_average(10, pandas = True)"

I assume there is an error on my side, not a bug. The error message does not help me much (I'm more or less new to Python, but experienced in other languages).

Any help would be much appreciated.

Best regards,
Jannis

SortedDict(None, 1000, {'level_cum_sum': lagHoursCorrectedCumSum
4.730833 10.0
8.776111 20.0
8.882778 30.0
10.854722 40.0
12.983611 50.0
13.745000 53.0
17.923889 63.0
20.740833 69.0
20.747500 70.0
24.512500 73.0
25.074167 74.0
30.734722 84.0
32.031944 94.0
32.270556 100.0
36.824722 110.0
37.970278 120.0
38.818889 122.0
40.390278 132.0
41.816944 142.0
42.810833 145.0
45.745278 155.0
46.010278 157.0
46.018889 158.0
48.935000 168.0
50.605000 178.0
51.175000 188.0
52.663889 198.0
52.685000 200.0
56.738611 208.0
61.617222 218.0
...
419.344444 1550.0
423.516667 1560.0
426.186389 1569.0
427.395000 1579.0
429.465556 1589.0
429.522222 1590.0
429.528333 1591.0
430.611111 1601.0
435.259722 1611.0
436.240278 1621.0
438.515278 1630.0
439.875833 1640.0
441.641111 1650.0
443.168056 1656.0
443.173611 1657.0
448.372222 1667.0
451.460833 1677.0
454.263333 1687.0
465.015000 1697.0
468.279722 1706.0
471.472222 1716.0
473.009444 1724.0
473.701389 1727.0
473.785833 1728.0
478.067500 1738.0
483.246389 1748.0
484.898333 1758.0
485.238056 1768.0
488.893611 1778.0
488.960278 1780.0
Name: level_cum_sum, dtype: float64})

Can't pickle TimeSeries objects

[UPDATE] This only seems to happen on Python 2.7

Trying to pickle a TimeSeries object:

import traces
ofile = open('test.pkl', 'wb')
import pickle
ts = traces.TimeSeries()
ts[23]="blah"
ts[2]="foo"
pickle.dump(ts, ofile)

I get the following error:

In [9]: pickle.dump(ts, ofile)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-f1eed5bd8d83> in <module>()
----> 1 pickle.dump(ts, ofile)

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in dump(obj, file, protocol)
   1374
   1375 def dump(obj, file, protocol=None):
-> 1376     Pickler(file, protocol).dump(obj)
   1377
   1378 def dumps(obj, protocol=None):

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in dump(self, obj)
    222         if self.proto >= 2:
    223             self.write(PROTO + chr(self.proto))
--> 224         self.save(obj)
    225         self.write(STOP)
    226

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save(self, obj)
    329
    330         # Save the reduce() output and finally memoize the object
--> 331         self.save_reduce(obj=obj, *rv)
    332
    333     def persistent_id(self, obj):

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
    423
    424         if state is not None:
--> 425             save(state)
    426             write(BUILD)
    427

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save(self, obj)
    284         f = self.dispatch.get(t)
    285         if f:
--> 286             f(self, obj) # Call unbound method with explicit self
    287             return
    288

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save_dict(self, obj)
    653
    654         self.memoize(obj)
--> 655         self._batch_setitems(obj.iteritems())
    656
    657     dispatch[DictionaryType] = save_dict

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in _batch_setitems(self, items)
    667             for k, v in items:
    668                 save(k)
--> 669                 save(v)
    670                 write(SETITEM)
    671             return

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save(self, obj)
    284         f = self.dispatch.get(t)
    285         if f:
--> 286             f(self, obj) # Call unbound method with explicit self
    287             return
    288

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save_dict(self, obj)
    653
    654         self.memoize(obj)
--> 655         self._batch_setitems(obj.iteritems())
    656
    657     dispatch[DictionaryType] = save_dict

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in _batch_setitems(self, items)
    667             for k, v in items:
    668                 save(k)
--> 669                 save(v)
    670                 write(SETITEM)
    671             return

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save(self, obj)
    304             reduce = getattr(obj, "__reduce_ex__", None)
    305             if reduce:
--> 306                 rv = reduce(self.proto)
    307             else:
    308                 reduce = getattr(obj, "__reduce__", None)

/Users/vlad/.pyenv/versions/2.7.13/envs/prelude_monitor/lib/python2.7/copy_reg.pyc in _reduce_ex(self, proto)
     68     else:
     69         if base is self.__class__:
---> 70             raise TypeError, "can't pickle %s objects" % base.__name__
     71         state = base(self)
     72     args = (self.__class__, base, state)

TypeError: can't pickle instancemethod objects

How does one set the default to None?

Looking in the code, it seems like default=None is taken to mean that there is no default value for the time series. How would one go about creating a TimeSeries where the default value is the None object?
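A common way to distinguish "no default" from "default is None" is a private sentinel object. A minimal sketch of the pattern (the Series class here is hypothetical, not the traces API):

```python
# Module-private sentinel: a unique object that no caller can pass by accident.
_NO_DEFAULT = object()

class Series:
    def __init__(self, default=_NO_DEFAULT):
        self._default = default

    def has_default(self):
        # identity check: None is a perfectly valid default under this scheme
        return self._default is not _NO_DEFAULT

print(Series().has_default())              # False: no default given
print(Series(default=None).has_default())  # True: default is None
```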

Feature Request: linear interpolation for mean

So I recently discovered this nice library and decided to try it, since I have unevenly spaced data.
However, I found out today that .mean() wasn't doing linear interpolation as I thought it would:

>>> from traces import TimeSeries
>>> t = TimeSeries()
>>> t[0] = 0
>>> t[1] = 0
>>> t[3] = 20
>>> t.mean(0, 2)
0.0

With linear interpolation between 2 points we would find that t[2] = 10 and doing the average from 0 to 2 would give us 3.333 in this example.
A simple optional argument in mean() to choose the interpolation method would be fantastic, and I really think that it would be useful to many users who are not using traces exclusively for binary data (where linear interpolation would make no sense).
I know that we can re-sample the TimeSeries but I think a shortcut like this would be really neat since this library is designed with ease of use in mind.

Thanks for reading and have a nice day 👋
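For reference, a pure-Python sketch of a time-weighted mean with linear interpolation (linear_mean is a hypothetical helper, not the traces API): it integrates the linear interpolant with the trapezoid rule and divides by the interval length, which gives 2.5 for the data above over [0, 2].

```python
import bisect

def linear_mean(points, start, end):
    """Time-weighted mean of the linear interpolant of sorted (t, v) points."""
    times = [t for t, _ in points]

    def at(t):
        # linear interpolation between the surrounding measurement points
        i = bisect.bisect_right(times, t) - 1
        if i < 0:
            return points[0][1]
        if i >= len(points) - 1:
            return points[-1][1]
        (t0, v0), (t1, v1) = points[i], points[i + 1]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

    # integrate with the trapezoid rule over each sub-interval between knots
    knots = sorted({start, end, *[t for t in times if start < t < end]})
    area = sum(
        (b - a) * (at(a) + at(b)) / 2.0
        for a, b in zip(knots, knots[1:])
    )
    return area / (end - start)

print(linear_mean([(0, 0), (1, 0), (3, 20)], 0, 2))  # 2.5
```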

sample at some time vs sample on some interval (~ moving_average)

I would like to prepare a PR to improve the efficiency of moving_average with pandas=True. I have already the code but I am wondering how to call the function that does this.

More generally, when converting the traces to some regularly spaced pandas time-series, we can either:

  • on each specific datetime, give the value of the TimeSeries ==> the current TimeSeries.sample function
  • on each interval [datetime, datetime+sampling_period), give the average (or the max, min, median, ...) of the TimeSeries, in a similar way to the groupby().agg() functionality of pandas ==> this can't be called moving_average because a) it does not do moving averages, and b) it is not limited to the average (we also have max, min, median, etc).

What name could be used for this "sample on an interval through an aggregation function"?
Would sample_interval make sense? Would sample be renamed to sample_instant?
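To make the proposal concrete, a minimal pure-Python sketch of the "sample on an interval through an aggregation function" idea (sample_interval is only this issue's suggested name, not the traces API; input is a list of (t0, t1, value) periods):

```python
def sample_interval(periods, start, end, period, agg=max):
    """For each regular bin [t, t + period), aggregate the values of the
    periods overlapping that bin with `agg` (max, min, median, ...)."""
    t = start
    while t < end:
        bin_end = min(t + period, end)
        # a period (t0, t1, v) overlaps the bin iff t0 < bin_end and t1 > t
        values = [v for t0, t1, v in periods if t0 < bin_end and t1 > t]
        yield t, agg(values)
        t = bin_end

periods = [(0, 3, 1), (3, 5, 4), (5, 10, 2)]
print(list(sample_interval(periods, 0, 10, 5)))
# [(0, 4), (5, 2)]
```

A time-weighted average aggregator would need the overlap lengths as well, but the bin/overlap structure would be the same.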

merge with different domains

It could be useful to be able to merge time series with different domains. The resulting domain could either be the union or intersection. In case of the union, it would probably be necessary to specify a "fill value".

This would be useful if you happen to have time series with different domains (say, light bulb sensors but with different on times), and you still want to do a simple count of "number of lights on".

TimeSeries.merge([a, b], domain='union', fillvalue=0) 
TimeSeries.merge([a, b], domain='intersection')
TimeSeries.merge([a, b], domain=None) # throw error with different domains

Checking for default value generates a KeyError if none is set

Consider the following:

ts = TimeSeries()
hasattr(ts, 'default')

it results in a KeyError. First, this is a bad behavior (bug) and should be fixed. Secondly, how should one properly check if a default is set or not? Maybe a has_default() method would be useful for this?

Feature Request: Bundle of TimeSeries

In #169 I outlined a use case for Domains. In writing up that use case, I realized that I could also work with a "Bundle" TimeSeries. Namely, I often work with a number of boolean valued time series whose values are constant for large periods of time. However, combining the time series multiplies the number of points required to represent the time series (particularly if combined with a real valued signal).

My current solution has been to create a dictionary of TimeSeries objects, but this ran into problems when the timeseries are defined on different domains. It would be nice to have an object that "bundles" multiple TimeSeries objects but asserts that they all have the same domain.

clean up the base module

Current implementation was a last-minute hack and there's a bunch of duplicate code and terrible names that need to be cleaned up.

wrong value when assigning twice the same interval

Assigning a value to an interval is not idempotent: the second assignment changes the value taken after the interval:

from datetime import date

import traces

tr = traces.TimeSeries({date(2000, 1, 1): 100,
                        date(2100, 1, 1): 100})

tr[date(2019, 2, 3):date(2019, 2, 7)] = 0
print(tr)

gives

<TimeSeries>
{datetime.date(2000, 1, 1): 100,
 datetime.date(2019, 2, 3): 0,
 datetime.date(2019, 2, 7): 100,
 datetime.date(2100, 1, 1): 100}
</TimeSeries>

but

from datetime import date

import traces

tr = traces.TimeSeries({date(2000, 1, 1): 100,
                        date(2100, 1, 1): 100})

tr[date(2019, 2, 3):date(2019, 2, 7)] = 0
tr[date(2019, 2, 3):date(2019, 2, 7)] = 0
print(tr)

gives

<TimeSeries>
{datetime.date(2000, 1, 1): 100,
 datetime.date(2019, 2, 3): 0,
 datetime.date(2019, 2, 7): 0,
 datetime.date(2100, 1, 1): 100}
</TimeSeries>

(The value after 2019-02-07 is now 0 instead of the original value.)

I think line https://github.com/datascopeanalytics/traces/blob/master/traces/timeseries.py#L366 should be
if interval_t0 <= end:
instead of
if interval_t0 < end:

Using merge with unhashable type TimeSeries problematic

Try the following:

ts_a = traces.TimeSeries(default=traces.Histogram({0:1}))
ts_b = traces.TimeSeries(default=traces.Histogram({0:1}))
traces.TimeSeries.merge([ts_a, ts_b])

and it will result in

/traces/traces/timeseries.py in merge(cls, ts_list, compact, operation, default)
    681
    682         if default is None:
--> 683             unique_defaults = set(ts._default for ts in ts_list)
    684             default = unique_defaults.pop()
    685             if unique_defaults:

TypeError: unhashable type: 'Histogram'

Do the following instead, and it will work:

traces.TimeSeries.merge([ts_a, ts_b], default=traces.Histogram())

SortedContainers >2.0 breaks backwards compatibility

This is a reminder that traces doesn't currently work with the current version of SortedContainers because of a backwards incompatibility when iterating over a SortedDict. I should look into this and remedy it.

TimeSeries.merge fails when values are of different types

If you try and run the following:

from traces import TimeSeries
ts_a = TimeSeries(default=None)
ts_b = TimeSeries(default=None)
ts_a[0] = True
ts_b[0] = None
ts_merge = TimeSeries.merge([ts_a, ts_b])

you will get this error:

  File "/src/traces/traces/timeseries.py", line 687, in merge
    for t, merged in cls.iter_merge(ts_list):
  File "/src/traces/traces/timeseries.py", line 654, in iter_merge
    for index, (t, state) in enumerate(cls._iter_merge(timeseries_list)):
  File "/src/traces/traces/timeseries.py", line 616, in _iter_merge
    (t, next_value), index, iterator = queue.get()
  File "/usr/local/lib/python3.6/queue.py", line 174, in get
    item = self._get()
  File "/usr/local/lib/python3.6/queue.py", line 230, in _get
    return heappop(self.queue)
TypeError: '<' not supported between instances of 'bool' and 'NoneType'

merge fails when passed an empty iterable

Running the following line of code fails:

traces.TimeSeries().merge([], default=None)

with the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/vlad/.virtualenvs/steelcase/src/traces/traces/timeseries.py", line 679, in merge
    default = unique_defaults.pop()
KeyError: 'pop from an empty set'

Ideally this would return an empty TimeSeries with the default set to None

Unary operations on TimeSeries

Given the current API (v0.3.1) there does not seem to be a straightforward way to do a unary operation.

Suppose one wishes to negate a boolean valued signal. One option is:

import traces

x = traces.TimeSeries([(0, False), (1, True)])
x.operation(x, lambda val, _: not val)

but this seems somewhat clunky.

Better might be

import traces

x = traces.TimeSeries([(0, True), (1, False)])
x.map(lambda val: not val)

which could be syntactic sugar for:

traces.TimeSeries((t, not v) for (t, v) in x)

Trying to calculate the mean of an empty Histogram fails

Running .mean() on an empty Histogram object (Histogram(None, 1000, {0: 0.0})) fails with a divide by zero error:

  File "/src/traces/traces/histogram.py", line 30, in mean
    return weighted_sum / float(self.total())
ZeroDivisionError: float division by zero
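A defensive sketch of the desired behavior (safe_mean is a hypothetical helper, not the traces API): return None for the mean of an empty weighted histogram instead of dividing by zero.

```python
def safe_mean(counts):
    """Weighted mean of a {value: weight} mapping; None when total weight is 0."""
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(value * weight for value, weight in counts.items()) / total

print(safe_mean({0: 0.0}))           # None: no mass, as in the failing case
print(safe_mean({0: 1.0, 10: 3.0}))  # 7.5
```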

Recommendation for visualization package/tool

Hey guys,
great project! Could you tell me which package/tool was used to generate the diagrams shown in README? Which other packages do you use in combination with traces? Would it be useful for others to mention a few of these in the README?

Best!
