
traces

A Python library for unevenly-spaced time series analysis.

Why?

Taking measurements at irregular intervals is common, but most tools are primarily designed for evenly-spaced measurements. Also, in the real world, time series have missing observations or you may have multiple series with different frequencies: it can be useful to model these as unevenly-spaced.

Traces was designed by the team at Datascope based on several practical applications in different domains, because it turns out unevenly-spaced data is actually pretty great, particularly for sensor data analysis.

Installation

To install traces, run this command in your terminal:

$ pip install traces

Quickstart: using traces

To see a basic use of traces, let's look at these data from a light switch, also known as Big Data from the Internet of Things.

The main object in traces is a TimeSeries, which you create just like a dictionary, adding the five measurements at 6:00am, 7:45:56am, etc.

>>> time_series = traces.TimeSeries()
>>> time_series[datetime(2042, 2, 1,  6,  0,  0)] = 0 #  6:00:00am
>>> time_series[datetime(2042, 2, 1,  7, 45, 56)] = 1 #  7:45:56am
>>> time_series[datetime(2042, 2, 1,  8, 51, 42)] = 0 #  8:51:42am
>>> time_series[datetime(2042, 2, 1, 12,  3, 56)] = 1 # 12:03:56pm
>>> time_series[datetime(2042, 2, 1, 12,  7, 13)] = 0 # 12:07:13pm

What if you want to know whether the light was on at 11am? Unlike a Python dictionary, you can look up the value at any time, even if it's not one of the measurement times.

>>> time_series[datetime(2042, 2, 1, 11,  0, 0)] # 11:00am
0

The distribution function gives you the fraction of time that the TimeSeries is in each state.

>>> time_series.distribution(
>>>   start=datetime(2042, 2, 1,  6,  0,  0), # 6:00am
>>>   end=datetime(2042, 2, 1,  13,  0,  0)   # 1:00pm
>>> )
Histogram({0: 0.8355952380952381, 1: 0.16440476190476191})

The light was on about 16% of the time between 6am and 1pm.

Adding more data...

Now let's get a little more complicated and look at the sensor readings from forty lights in a house.

How many lights are on throughout the day? The merge function takes the forty individual TimeSeries and efficiently merges them into one TimeSeries, where each value is a list of the states of all forty lights.

>>> trace_list = [... list of forty traces.TimeSeries ...]
>>> count = traces.TimeSeries.merge(trace_list, operation=sum)

We also applied a sum operation to the list of states to get the TimeSeries of the number of lights that are on.

How many lights are on in the building on average during business hours, from 8am to 6pm?

>>> histogram = count.distribution(
>>>   start=datetime(2042, 2, 1,  8,  0,  0),   # 8:00am
>>>   end=datetime(2042, 2, 1,  12 + 6,  0,  0) # 6:00pm
>>> )
>>> histogram.median()
17

The distribution function returns a Histogram that can be used to get summary metrics such as the mean or quantiles.

It's flexible

The measurement points (keys) in a TimeSeries can be in any units, as long as they can be ordered. The values can be anything.

For example, you can use a TimeSeries to keep track of the contents of a grocery basket by the number of minutes within a shopping trip.

>>> time_series = traces.TimeSeries()
>>> time_series[1.2] = {'broccoli'}
>>> time_series[1.7] = {'broccoli', 'apple'}
>>> time_series[2.2] = {'apple'}          # puts broccoli back
>>> time_series[3.5] = {'apple', 'beets'} # mmm, beets

More info

To learn more, check the examples and the detailed reference.

Contributing

Contributions are welcome and greatly appreciated! Please visit our guidelines for more info.

traces's People

Contributors

gokturksm, nsteins, pyup-bot, sdementen, stringertheory, vlsd, ypleong, zackdrescher, zzrcxb


traces's Issues

[WIP] set_interval problems when dealing with hours

It looks like set_interval wrongly sets the value of the end item when dealing with hours and datetime.
Hard to explain; here is an example.

import traces, datetime

# This works as expected. Notice that we did not split up the day yet...
t = traces.TimeSeries()
t[datetime.datetime(2000, 1, 1)] = 1
t[datetime.datetime(2000, 1, 2)] = 0
t[datetime.datetime(2000, 1, 3)] = 1
t[datetime.datetime(2000, 1, 4)] = 1
t.set_interval(datetime.datetime(2000, 1, 1, 0, 0), datetime.datetime(2000, 1, 2, 0, 0), 1)

# t is now:
{
 datetime.datetime(2000, 1, 1, 0, 0): 1,
 datetime.datetime(2000, 1, 2, 0, 0): 1,
 datetime.datetime(2000, 1, 3, 0, 0): 1,
 datetime.datetime(2000, 1, 4, 0, 0): 1
}

# Do the same thing, but with one of the days split
t = traces.TimeSeries()
t[datetime.datetime(2000, 1, 1)] = 1
t[datetime.datetime(2000, 1, 2)] = 0
t[datetime.datetime(2000, 1, 2, 12)] = 1
t[datetime.datetime(2000, 1, 4)] = 1
t.set_interval(datetime.datetime(2000, 1, 2, 0, 0), datetime.datetime(2000, 1, 2, 12, 0), 1)

# t is now
{
 datetime.datetime(2000, 1, 1, 0, 0): 1,
 datetime.datetime(2000, 1, 2, 0, 0): 1,
 datetime.datetime(2000, 1, 2, 12, 0): 0,
 datetime.datetime(2000, 1, 4, 0, 0): 1
}

This even happens in simple cases, like:

t = traces.TimeSeries()
t[1] = 1
t[2] = 0
t[3] = 1
t[4] = 1
t.set_interval(2, 3, 1)

# t is now
{
  1: 1,
  2: 1,
  3: 0,
  4: 1
}

TypeError on TimeSeries.distribution() using numpy datetime64 as index

The following snippet yields a TypeError: duration is an unknown type (600 seconds)

import numpy as np
import traces
ts = traces.TimeSeries()
ts[np.datetime64('2017-01-01T12:00:00')] = 1
ts[np.datetime64('2017-01-01T12:10:00')] = 2
ts[np.datetime64('2017-01-01T12:20:00')] = 1
ts.distribution()

traces version 0.3.1

[Question] How to recreate traces chart?

I wonder how one could recreate traces' signature plot.

I was wondering if the library has anything to do with the charts (per the docs, it does not), but seeing a couple of charts like that in the docs made me think that producing that kind of chart might be within the scope of the project.

better methods for start and end times/values

Maybe something like:

ti, tf = ts.get_first_and_last_times()

vi, vf = ts.get_first_and_last_values()

(ti, vi), (tf, vf) = ts.get_first_and_last()

to be better than:

ti, vi = ts.get_by_index(0)
tf, vf = ts.get_by_index(-1)

(or at least have get_by_index documented)

Feature Request: Logical Negation

As of version 0.3.1 the binary operations ^, &, and | seem to be implemented.

I use traces for working with boolean time series, so I'd be interested in also having ~ implemented.

Usage:

import traces
x = traces.TimeSeries([(0, True), (1, False)])

~x  # [(0, False), (1, True)]

Values in TimeSeries.distribution() are sentence-cased regardless of how values were added to the TimeSeries

If you are using strings as values in a TimeSeries:

ts = traces.TimeSeries()
ts[1] = 'JUNK'
ts[3] = 'JANK'
ts[5] = 'WHAT'

If you call something like ts.distribution(min, max), you would see something like this:

Histogram(None, 1000, {'Jank': 0.16008504570112725, 'Junk': 0.04229136076598496, 'What': 0.797577092766277})

It looks like somewhere along the line the string values are getting sentence-cased. Not sure exactly where yet, but this could be confusing or cause silly bugs when looking up these objects with the wrong value.

How are the plots in the documentation created?

Not a bug, but just curious about how you've plotted the charts in the documentation and what the recommended approach for plotting TimeSeries objects is? I couldn't find a trace of this information in the repo. Thanks in advance!

PyPI Package Installed additional tkinter module in site-packages

tkinter was already installed in "C:\Anaconda2\Lib\lib-tk"
This caused an exception when another module (easygui.fileopenbox) was accessing tkinter.

AttributeError: 'module' object has no attribute 'askopenfilename'

If the presence of tkinter could be checked first, I guess pip would not need to install another version of tkinter?

I fixed it by deleting tkinter in site-packages.

add possibility to write ts[start:end] = v to change value on an interval

I have a use case where I need to change the value of a time series on an interval without changing the value outside of the interval, i.e. do something like ts[start:end] = value.
Just setting

ts[end] = ts[end]   # freezing/anchoring the current value of ts as of [end, ...)
ts[start] = value      # changing the value as of [start, ...)

may fail as intermediate points in [start,end) may exist ==> we need to remove all intermediate points (which is easy as ts.iterperiods(start,end) provides them nicely).

I think the function below does it properly (but it would be better integrated in the item to use the slice notation)

def set_slice(ts, start, end, value):
    """ts[start:end] = value ==> call set_slice(ts, start, end, value).

    Set the value of ts so that:
    - on the interval [start, end), we have the new value
    - on [end, ...), the value is unchanged
    - on (..., start), the value is unchanged as well

    We replace the value of the ts on an interval.

    :param ts: the TimeSeries to modify
    :param start: start of the interval
    :param end: end of the interval (exclusive)
    :param value: the new value on [start, end)
    """
    # iterate over a list copy so we can modify ts while looping
    for i, (s, e, v) in enumerate(list(ts.iterperiods(start, end))):
        if i == 0:
            # if the first interval, set its start to the new value
            ts[s] = value
        else:
            # otherwise, remove the intermediate key
            del ts[s]
    # finish by restoring the previous value at the end of the interval
    ts[end] = v

logical_or should not be casting to an int

This is the error we get if we try to logical_or time series that have None entries. Note that in Python None is falsy, so bool(None) returns False. I would expect something called logical_* to cast to a boolean, if it casts to anything.

  File "/src/traces/traces/timeseries.py", line 805, in logical_or
    return self.operation(other, lambda x, y: int(x or y))
  File "/src/traces/traces/timeseries.py", line 750, in operation
    result[time] = function(value, other[time])
  File "/src/traces/traces/timeseries.py", line 805, in <lambda>
    return self.operation(other, lambda x, y: int(x or y))
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
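As a workaround under the current API, a None-tolerant combiner could be passed to operation instead of the int-casting lambda. A minimal sketch (bool_or is a hypothetical helper, not part of traces):

```python
# Hypothetical None-tolerant OR combiner: cast each operand to bool
# explicitly, so a falsy None behaves like False instead of blowing up
# in int().
def bool_or(x, y):
    return bool(x) or bool(y)

print(bool_or(None, True))   # True
print(bool_or(None, None))   # False
```

With traces this could be used as ts_a.operation(ts_b, bool_or), assuming operation applies the function pairwise as in the traceback above.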

Documentation needs updating

Documentation lists TimeSeries.first_item() + TimeSeries.last_item(), which are deprecated in favor of TimeSeries.first() + TimeSeries.last()

Great work guys 💯

TypeError using pandas.Timestamp as index

The following snippet yields a TypeError: Cannot compare type 'Timestamp' with type 'Infinity':

import pandas as pd
import traces
ts = traces.TimeSeries()
ts[pd.Timestamp('2017-07-20 12:44')] = 1

traces version 0.3.1

PyPi Package incomplete

The package on PyPi is incomplete
known to affect 0.3.0 and 0.3.1 (others unknown)

the package is missing the folder 'requirements' and contained files.
this causes the 'setup.py' script to fail because it cannot get the list of dependencies.

This can be tested by downloading 'traces-0.3.1.tar.gz' or 'traces-0.3.0.tar.gz' from PyPI, extracting the files from the archive, and running the 'setup.py' script:

  • i.e. python setup.py install --user

This affects manual installs.
Other install methods and other packages have not been tested.

Updating a default value doesn't save the updated value

In a collections.defaultdict, using something like add or update for an empty default key changes the value at that key:

>>> a = defaultdict(dict)
>>> a["hello"]
{}
>>> a["hello"].update([("a",1)])
>>> a["hello"]
{'a': 1}

or for sets...

>>> a = defaultdict(set)
>>> a["hello"]
set()
>>> a["hello"].add("a")
>>> a["hello"]
{'a'}

Traces doesn't act this way.

KeyError: 'no measurement at XXX' when setting value of a trace on a slice

When running the following

from datetime import date

import traces

tr = traces.TimeSeries({date(2000, 9, 21): 359.0,
                        date(2019, 6, 28): 0.0,
                        date(2019, 7, 28): 339.0,
                        date(2200, 4, 11): 359.0})

tr[date(2016, 12, 31):date(2021, 12, 31)] = 352.5

I get a KeyError: 'no measurement at 2019-06-28'.

This is due to the fact that set_interval iterates over iterperiods (which is a generator) while modifying the data of the trace at the same time.
Replacing https://github.com/datascopeanalytics/traces/blob/master/traces/timeseries.py#L213 from

        for i, (s, e, v) in enumerate(self.iterperiods(start, end)):

to

        for i, (s, e, v) in enumerate(list(self.iterperiods(start, end))):

fixes the bug.

Domains lost when using TimeSeries.operation method

Minimal Example using traces v0.3.1

import traces
x = traces.TimeSeries([(0, True), (10, False)], domain=(0, 20))
x.domain  # (0, 20)
y = x.operation(x, lambda a,b: a^b)
y.domain  # (-inf, inf)

It seems to me that y's domain should still be (0, 20). From an API standpoint, if applying a binary operation on x1 and x2, I think that it'd be simpler to have the resulting domain be x1.domain & x2.domain

When using a mask with TimeSeries.distribution(), mask.start() is called in `timeseries.py` but `start()` doesn't exist

I think this will be fixed with the next bump; looked for an issue related to this but didn't find one. Feel free to close this out if it was as simple as defining start() for TimeSeries.

Traceback (most recent call last):
  File "run_plots.py", line 25, in <module>
    make_plots()
  File "/Users/mjfm/projects/modustri/analysis/plots/see_cart_trips.py", line 55, in make_plots
    mask = front_ts,
  File "/Users/mjfm/Virtualenvs/modustri/lib/python2.7/site-packages/traces/timeseries.py", line 622, in distribution
    new_ts = self.slice(mask.start(), mask.end())
AttributeError: 'TimeSeries' object has no attribute 'start'

Add `compact` option to `iterperiods()`

This would merge adjacent periods that have the same value and return them as only one period. Ideally this would be done efficiently, although I'm unclear what that means (store a compact version of the timeseries along with the non-compact one?)
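A minimal pure-Python sketch of what the compacting could look like (compact_periods is hypothetical, not part of traces; it assumes the input periods are contiguous and ordered, as iterperiods() would yield them):

```python
def compact_periods(periods):
    """Merge adjacent (start, end, value) periods that share the same value."""
    current = None
    for start, end, value in periods:
        if current is None:
            current = [start, end, value]
        elif value == current[2] and start == current[1]:
            current[1] = end  # extend the running period
        else:
            yield tuple(current)
            current = [start, end, value]
    if current is not None:
        yield tuple(current)

periods = [(0, 1, 'a'), (1, 3, 'a'), (3, 4, 'b'), (4, 6, 'b'), (6, 7, 'a')]
print(list(compact_periods(periods)))
# [(0, 3, 'a'), (3, 6, 'b'), (6, 7, 'a')]
```

This is O(n) in the number of periods, so a pre-stored compact representation would only matter if iterperiods() is called far more often than the series is modified.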

ValueError: start can't be >= end (level_cum_sum >= level_cum_sum)

Hi,

I want to use traces to convert an unevenly spaced time series to an evenly spaced one (like the example in the docs). When doing this, the following error is raised:
"ValueError: start can't be >= end (level_cum_sum >= level_cum_sum)"

My TimeSeries looks like below when I print it to the console. Thus level_cum_sum is ordered and I removed duplicates.

The line raising the error is:
"regular = time_series.moving_average(10, pandas = True)"

I assume there is an error on my side, not a bug. The error message does not help me much (I'm more or less new to Python, but experienced in other languages).

Any help would be much appreciated.

Best regards,
Jannis

SortedDict(None, 1000, {'level_cum_sum': lagHoursCorrectedCumSum
4.730833 10.0
8.776111 20.0
8.882778 30.0
10.854722 40.0
12.983611 50.0
13.745000 53.0
17.923889 63.0
20.740833 69.0
20.747500 70.0
24.512500 73.0
25.074167 74.0
30.734722 84.0
32.031944 94.0
32.270556 100.0
36.824722 110.0
37.970278 120.0
38.818889 122.0
40.390278 132.0
41.816944 142.0
42.810833 145.0
45.745278 155.0
46.010278 157.0
46.018889 158.0
48.935000 168.0
50.605000 178.0
51.175000 188.0
52.663889 198.0
52.685000 200.0
56.738611 208.0
61.617222 218.0
...
419.344444 1550.0
423.516667 1560.0
426.186389 1569.0
427.395000 1579.0
429.465556 1589.0
429.522222 1590.0
429.528333 1591.0
430.611111 1601.0
435.259722 1611.0
436.240278 1621.0
438.515278 1630.0
439.875833 1640.0
441.641111 1650.0
443.168056 1656.0
443.173611 1657.0
448.372222 1667.0
451.460833 1677.0
454.263333 1687.0
465.015000 1697.0
468.279722 1706.0
471.472222 1716.0
473.009444 1724.0
473.701389 1727.0
473.785833 1728.0
478.067500 1738.0
483.246389 1748.0
484.898333 1758.0
485.238056 1768.0
488.893611 1778.0
488.960278 1780.0
Name: level_cum_sum, dtype: float64})

Can't pickle TimeSeries objects

[UPDATE] This only seems to happen on Python 2.7

Trying to pickle a TimeSeries object:

import traces
ofile = open('test.pkl', 'wb')
import pickle
ts = traces.TimeSeries()
ts[23]="blah"
ts[2]="foo"
pickle.dump(ts, ofile)

I get the following error:

In [9]: pickle.dump(ts, ofile)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-f1eed5bd8d83> in <module>()
----> 1 pickle.dump(ts, ofile)

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in dump(obj, file, protocol)
   1374
   1375 def dump(obj, file, protocol=None):
-> 1376     Pickler(file, protocol).dump(obj)
   1377
   1378 def dumps(obj, protocol=None):

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in dump(self, obj)
    222         if self.proto >= 2:
    223             self.write(PROTO + chr(self.proto))
--> 224         self.save(obj)
    225         self.write(STOP)
    226

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save(self, obj)
    329
    330         # Save the reduce() output and finally memoize the object
--> 331         self.save_reduce(obj=obj, *rv)
    332
    333     def persistent_id(self, obj):

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
    423
    424         if state is not None:
--> 425             save(state)
    426             write(BUILD)
    427

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save(self, obj)
    284         f = self.dispatch.get(t)
    285         if f:
--> 286             f(self, obj) # Call unbound method with explicit self
    287             return
    288

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save_dict(self, obj)
    653
    654         self.memoize(obj)
--> 655         self._batch_setitems(obj.iteritems())
    656
    657     dispatch[DictionaryType] = save_dict

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in _batch_setitems(self, items)
    667             for k, v in items:
    668                 save(k)
--> 669                 save(v)
    670                 write(SETITEM)
    671             return

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save(self, obj)
    284         f = self.dispatch.get(t)
    285         if f:
--> 286             f(self, obj) # Call unbound method with explicit self
    287             return
    288

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save_dict(self, obj)
    653
    654         self.memoize(obj)
--> 655         self._batch_setitems(obj.iteritems())
    656
    657     dispatch[DictionaryType] = save_dict

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in _batch_setitems(self, items)
    667             for k, v in items:
    668                 save(k)
--> 669                 save(v)
    670                 write(SETITEM)
    671             return

/Users/vlad/.pyenv/versions/2.7.13/lib/python2.7/pickle.pyc in save(self, obj)
    304             reduce = getattr(obj, "__reduce_ex__", None)
    305             if reduce:
--> 306                 rv = reduce(self.proto)
    307             else:
    308                 reduce = getattr(obj, "__reduce__", None)

/Users/vlad/.pyenv/versions/2.7.13/envs/prelude_monitor/lib/python2.7/copy_reg.pyc in _reduce_ex(self, proto)
     68     else:
     69         if base is self.__class__:
---> 70             raise TypeError, "can't pickle %s objects" % base.__name__
     71         state = base(self)
     72     args = (self.__class__, base, state)

TypeError: can't pickle instancemethod objects

How does one set the default to None?

Looking in the code, it seems like default=None is taken to mean that there is no default value for the time series. How would one go about creating a TimeSeries where the default value is the None object?
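A common way to distinguish "no default" from "default is None" is a private sentinel object. A minimal sketch of the pattern (the Series class here is hypothetical, not the traces API):

```python
# Module-private sentinel: a unique object that no caller can pass by accident.
_NO_DEFAULT = object()

class Series:
    def __init__(self, default=_NO_DEFAULT):
        self._default = default

    def has_default(self):
        # identity check: None is a perfectly valid default under this scheme
        return self._default is not _NO_DEFAULT

print(Series().has_default())              # False: no default given
print(Series(default=None).has_default())  # True: default is None
```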

Feature Request: linear interpolation for mean

So I recently discovered this nice library and decided to try it, since I have unevenly spaced data.
However, I found out today that .mean() wasn't doing linear interpolation as I thought it would:

>>> from traces import TimeSeries
>>> t = TimeSeries()
>>> t[0] = 0
>>> t[1] = 0
>>> t[3] = 20
>>> t.mean(0, 2)
0.0

With linear interpolation between 2 points we would find that t[2] = 10 and doing the average from 0 to 2 would give us 3.333 in this example.
A simple optional argument in mean() to choose the interpolation method would be fantastic, and I really think that it would be useful to many users who are not using traces exclusively for binary data (where linear interpolation would make no sense).
I know that we can re-sample the TimeSeries but I think a shortcut like this would be really neat since this library is designed with ease of use in mind.

Thanks for reading and have a nice day 👋
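For reference, a pure-Python sketch of a time-weighted mean with linear interpolation (linear_mean is a hypothetical helper, not the traces API): it integrates the linear interpolant with the trapezoid rule and divides by the interval length, which gives 2.5 for the data above over [0, 2].

```python
import bisect

def linear_mean(points, start, end):
    """Time-weighted mean of the linear interpolant of sorted (t, v) points."""
    times = [t for t, _ in points]

    def at(t):
        # linear interpolation between the surrounding measurement points
        i = bisect.bisect_right(times, t) - 1
        if i < 0:
            return points[0][1]
        if i >= len(points) - 1:
            return points[-1][1]
        (t0, v0), (t1, v1) = points[i], points[i + 1]
        return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

    # integrate with the trapezoid rule over each sub-interval between knots
    knots = sorted({start, end, *[t for t in times if start < t < end]})
    area = sum(
        (b - a) * (at(a) + at(b)) / 2.0
        for a, b in zip(knots, knots[1:])
    )
    return area / (end - start)

print(linear_mean([(0, 0), (1, 0), (3, 20)], 0, 2))  # 2.5
```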

sample at some time vs sample on some interval (~ moving_average)

I would like to prepare a PR to improve the efficiency of moving_average with pandas=True. I have already the code but I am wondering how to call the function that does this.

More generally, when converting the traces to some regularly spaced pandas time-series, we can either:

  • on each specific datetime, give the value of the TimeSeries ==> the current TimeSeries.sample function
  • on each interval [datetime, datetime+sampling_period), give the average (or the max, min, median, ...) of the TimeSeries, in a similar way to the groupby().agg() functionality of pandas ==> this can't be called moving_average because a) it does not do moving averages, and b) it is not limited to the average (we also have max, min, median, etc).

What name could be used for this "sample on an interval through an aggregation function"?
Would sample_interval make sense? Would sample be renamed to sample_instant?
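To make the proposal concrete, a minimal pure-Python sketch of the "sample on an interval through an aggregation function" idea (sample_interval is only this issue's suggested name, not the traces API; input is a list of (t0, t1, value) periods):

```python
def sample_interval(periods, start, end, period, agg=max):
    """For each regular bin [t, t + period), aggregate the values of the
    periods overlapping that bin with `agg` (max, min, median, ...)."""
    t = start
    while t < end:
        bin_end = min(t + period, end)
        # a period (t0, t1, v) overlaps the bin iff t0 < bin_end and t1 > t
        values = [v for t0, t1, v in periods if t0 < bin_end and t1 > t]
        yield t, agg(values)
        t = bin_end

periods = [(0, 3, 1), (3, 5, 4), (5, 10, 2)]
print(list(sample_interval(periods, 0, 10, 5)))
# [(0, 4), (5, 2)]
```

A time-weighted average aggregator would need the overlap lengths as well, but the bin/overlap structure would be the same.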

merge with different domains

It could be useful to be able to merge time series with different domains. The resulting domain could either be the union or intersection. In case of the union, it would probably be necessary to specify a "fill value".

This would be useful if you happen to have time series with different domains (say, light bulb sensors but with different on times), and you still want to do a simple count of "number of lights on".

TimeSeries.merge([a, b], domain='union', fillvalue=0) 
TimeSeries.merge([a, b], domain='intersection')
TimeSeries.merge([a, b], domain=None) # throw error with different domains

Checking for default value generates a KeyError if none is set

Consider the following:

ts = TimeSeries()
hasattr(ts, 'default')

it results in a KeyError. First, this is a bad behavior (bug) and should be fixed. Secondly, how should one properly check if a default is set or not? Maybe a has_default() method would be useful for this?

Feature Request: Bundle of TimeSeries

In #169 I outlined a use case for Domains. In writing up that use case, I realized that I could also work with a "Bundle" TimeSeries. Namely, I often work with a number of boolean valued time series whose values are constant for large periods of time. However, combining the time series multiplies the number of points required to represent the time series (particularly if combined with a real valued signal).

My current solution has been to create a dictionary of TimeSeries objects, but this ran into problems when the timeseries are defined on different domains. It would be nice to have an object that "bundles" multiple TimeSeries objects but asserts that they all have the same domain.

clean up the base module

Current implementation was a last-minute hack and there's a bunch of duplicate code and terrible names that need to be cleaned up.

wrong value when assigning twice the same interval

Assigning a value to an interval is not idempotent: the second assignment changes the value taken after the interval:

from datetime import date

import traces

tr = traces.TimeSeries({date(2000, 1, 1): 100,
                        date(2100, 1, 1): 100})

tr[date(2019, 2, 3):date(2019, 2, 7)] = 0
print(tr)

gives

<TimeSeries>
{datetime.date(2000, 1, 1): 100,
 datetime.date(2019, 2, 3): 0,
 datetime.date(2019, 2, 7): 100,
 datetime.date(2100, 1, 1): 100}
</TimeSeries>

but

from datetime import date

import traces

tr = traces.TimeSeries({date(2000, 1, 1): 100,
                        date(2100, 1, 1): 100})

tr[date(2019, 2, 3):date(2019, 2, 7)] = 0
tr[date(2019, 2, 3):date(2019, 2, 7)] = 0
print(tr)

gives

<TimeSeries>
{datetime.date(2000, 1, 1): 100,
 datetime.date(2019, 2, 3): 0,
 datetime.date(2019, 2, 7): 0,
 datetime.date(2100, 1, 1): 100}
</TimeSeries>

(The value after 2019-02-07 is now 0 instead of the original value.)

I think line https://github.com/datascopeanalytics/traces/blob/master/traces/timeseries.py#L366 should be
if interval_t0 <= end:
instead of
if interval_t0 < end:

Using merge with unhashable type TimeSeries problematic

Try the following:

ts_a = traces.TimeSeries(default=traces.Histogram({0:1}))
ts_b = traces.TimeSeries(default=traces.Histogram({0:1}))
traces.TimeSeries.merge([ts_a, ts_b])

and it will result in

/traces/traces/timeseries.py in merge(cls, ts_list, compact, operation, default)
    681
    682         if default is None:
--> 683             unique_defaults = set(ts._default for ts in ts_list)
    684             default = unique_defaults.pop()
    685             if unique_defaults:

TypeError: unhashable type: 'Histogram'

Do the following instead, and it will work:

traces.TimeSeries.merge([ts_a, ts_b], default=traces.Histogram())

SortedContainers >2.0 breaks backwards compatibility

This is a reminder that traces doesn't currently work with the current version of SortedContainers because of a backwards incompatibility when iterating over a SortedDict. I should look into this and remedy it.

TimeSeries.merge fails when values are of different types

If you try and run the following:

from traces import TimeSeries
ts_a = TimeSeries(default=None)
ts_b = TimeSeries(default=None)
ts_a[0] = True
ts_b[0] = None
ts_merge = TimeSeries.merge([ts_a, ts_b])

you will get this error:

  File "/src/traces/traces/timeseries.py", line 687, in merge
    for t, merged in cls.iter_merge(ts_list):
  File "/src/traces/traces/timeseries.py", line 654, in iter_merge
    for index, (t, state) in enumerate(cls._iter_merge(timeseries_list)):
  File "/src/traces/traces/timeseries.py", line 616, in _iter_merge
    (t, next_value), index, iterator = queue.get()
  File "/usr/local/lib/python3.6/queue.py", line 174, in get
    item = self._get()
  File "/usr/local/lib/python3.6/queue.py", line 230, in _get
    return heappop(self.queue)
TypeError: '<' not supported between instances of 'bool' and 'NoneType'

merge fails when passed an empty iterable

Running the following line of code fails:

traces.TimeSeries().merge([], default=None)

with the error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/vlad/.virtualenvs/steelcase/src/traces/traces/timeseries.py", line 679, in merge
    default = unique_defaults.pop()
KeyError: 'pop from an empty set'

Ideally this would return an empty TimeSeries with the default set to None

Unary operations on TimeSeries

Given the current API (v0.3.1) there does not seem to be a straightforward way to do a unary operation.

Suppose one wishes to negate a boolean valued signal. One option is:

import traces

x = traces.TimeSeries([(0, False), (1, True)])
x.operation(x, lambda val, _: not val)

but this seems somewhat clunky.

Better might be

import traces

x = traces.TimeSeries([(0, True), (1, False)])
x.map(lambda val: not val)

which could be syntactic sugar for:

traces.TimeSeries((t, not v) for (t, v) in x)

Trying to calculate the mean of an empty Histogram fails

Running .mean() on an empty Histogram object (Histogram(None, 1000, {0: 0.0})) fails with a divide by zero error:

  File "/src/traces/traces/histogram.py", line 30, in mean
    return weighted_sum / float(self.total())
ZeroDivisionError: float division by zero
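A defensive sketch of the desired behavior (safe_mean is a hypothetical helper, not the traces API): return None for the mean of an empty weighted histogram instead of dividing by zero.

```python
def safe_mean(counts):
    """Weighted mean of a {value: weight} mapping; None when total weight is 0."""
    total = sum(counts.values())
    if total == 0:
        return None
    return sum(value * weight for value, weight in counts.items()) / total

print(safe_mean({0: 0.0}))           # None: no mass, as in the failing case
print(safe_mean({0: 1.0, 10: 3.0}))  # 7.5
```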

Recommendation for visualization package/tool

Hey guys,
great project! Could you tell me which package/tool was used to generate the diagrams shown in README? Which other packages do you use in combination with traces? Would it be useful for others to mention a few of these in the README?

Best!
