kwgoodman / la
Meet larry, the labeled numpy array
Home Page: http://pypi.python.org/pypi/la
License: Other
demean, demedian, zscore choke on 1d input when axis=-1.
After upgrading from numpy 1.4.1 to 1.5.1 I get warnings like "Warning: invalid value encountered in subtract" when I run unit tests (or timeit) using "python -c 'blah'" but not from an interactive session.
See http://mail.scipy.org/pipermail/numpy-discussion/2010-November/054118.html
Related to issue #21 is a bug in morph. It chokes when the larry is empty but the label list is not empty:
>>> lar = la.larry([])
>>> lar.morph([1, 2], 0)
IndexError: index out of range for array
The example in the split() docstring (la/util/resample.py) never uses the split function.
Unlike most larry methods, there are no numpy array versions (in la.farray) of demean, demedian, and zscore. Instead the code is contained in the larry method.
Make la.farray versions of demean, demedian, and zscore and use them in the larry methods. Then I'll be able to use the code on numpy arrays.
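A minimal sketch of what array-level versions might look like, assuming plain NumPy and NaN-aware reductions. The function names mirror the larry methods, but the signatures, the NaN handling, and the default axis are assumptions made for illustration, not the la.farray implementation.

```python
import numpy as np

def demean(arr, axis=None):
    """Subtract the NaN-ignoring mean along `axis` (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    # keepdims=True lets the reduced result broadcast back against arr
    return arr - np.nanmean(arr, axis=axis, keepdims=True)

def demedian(arr, axis=None):
    """Subtract the NaN-ignoring median along `axis` (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    return arr - np.nanmedian(arr, axis=axis, keepdims=True)

def zscore(arr, axis=None):
    """Center and scale by the NaN-ignoring std along `axis` (hypothetical helper)."""
    arr = np.asarray(arr, dtype=float)
    centered = arr - np.nanmean(arr, axis=axis, keepdims=True)
    return centered / np.nanstd(arr, axis=axis, keepdims=True)

a = np.array([1.0, 2.0, 3.0])
print(demean(a, axis=-1))  # 1d input with axis=-1 is handled by keepdims
```

With functions like these, the larry methods could reduce to a call on self.x plus a label copy.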
Here's a first stab at it. The code below works for 2d larrys, ascending order only. Paste into the larry class in deflarry.py. Honestly it's pretty much a hack-job.
def sort(self, label, axis):
    """
    Sorts larry on given label. Ascending order only.

    Parameters
    ----------
    label : name of the label on which to sort
    axis : {int}

    Returns
    -------
    out : larry
        A copy sorted on given label
    """
    data = self.x
    col_id = self.label[axis].index(label)
    if axis == 0:
        new_order = data[col_id, :].argsort()  # create index
        sorted_data = data[:, new_order]
        # sort labels; must be a list to create new larry
        # (indexing a 1d label array with [new_order, :] raises IndexError)
        sorted_labels = [self.label[1][i] for i in new_order]
        sorted_larry = larry(sorted_data, label=[list(self.label[0]), sorted_labels])
    elif axis == 1:
        new_order = data[:, col_id].argsort()
        sorted_data = data[new_order, :]
        sorted_labels = [self.label[0][i] for i in new_order]
        sorted_larry = larry(sorted_data, label=[sorted_labels, list(self.label[1])])
    else:
        raise ValueError("axis must be 0 or 1 (2d larrys only)")
    return sorted_larry
la.farray.lastrank chokes on empty array input:
>> a = np.array([])
>> la.farray.lastrank(a)
<snip>
NameError: global name 'nans' is not defined
Similar to la.rand() and la.randn(), add another convenience function for creating larrys to try things out at the command line.
The new function, la.arange(), would return the same thing as:
>> la.larry(np.arange(3))
label_0
0
1
2
x
array([0, 1, 2])
Like la.rand() and la.randn(), it would optionally support creation by labels:
>> la.rand(label=[['row1', 'row2'], ['col1', 'col2']])
label_0
row1
row2
label_1
col1
col2
x
array([[ 0.90377983, 0.07377073],
[ 0.85016068, 0.25740637]])
Add a makefile:
Indexing into a larry using a list of integers produces an error because it tries to use the take method.
Add a function la.unique(lar) that takes an nd larry, lar, as input and returns a 1d Numpy array containing the unique values in lar. Implement using: np.unique(lar.x)
la.panel() gives the wrong output.
First make larry:
In [9]: x = np.ones((4,3)).cumsum(0) - 1
In [10]: x
Out[10]:
array([[ 0., 0., 0.],
[ 1., 1., 1.],
[ 2., 2., 2.],
[ 3., 3., 3.]])
In [12]: lar = la.larry(x, [['r1', 'r2', 'r3', 'r4'], ['c1', 'c2', 'c3']])
In [13]: lar
Out[13]:
label_0
r1
r2
r3
r4
label_1
c1
c2
c3
x
array([[ 0., 0., 0.],
[ 1., 1., 1.],
[ 2., 2., 2.],
[ 3., 3., 3.]])
In [15]: lar = lar.insertaxis(0, "name")
In [16]: lar
Out[16]:
label_0
name
label_1
r1
r2
r3
r4
label_2
c1
c2
c3
x
array([[[ 0., 0., 0.],
[ 1., 1., 1.],
[ 2., 2., 2.],
[ 3., 3., 3.]]])
Then make panel:
In [17]: la.panel(lar)
Out[17]:
label_0
('r1', 'c1')
('r1', 'c2')
('r1', 'c3')
...
('r4', 'c1')
('r4', 'c2')
('r4', 'c3')
label_1
name
x
array([[ 0.],
[ 1.],
[ 2.],
[ 3.],
[ 0.],
[ 1.],
[ 2.],
[ 3.],
[ 0.],
[ 1.],
[ 2.],
[ 3.]])
The first three elements should be zero (since row 'r1' contains all zeros). Changing the function to use order="F" fixes the bug.
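The effect of the memory order can be reproduced with plain NumPy. The sketch below assumes that la.panel flattens the transposed data with a reshape; that step is a guess made for illustration, not something the report confirms.

```python
import numpy as np

# Same data as in the report, with the extra leading "name" axis: shape (1, 4, 3)
x = (np.ones((4, 3)).cumsum(0) - 1)[np.newaxis]

# Assumed flattening step (hypothetical): one output column per entry on axis 0
c_order = x.T.reshape(-1, x.shape[0])             # default order="C"
f_order = x.T.reshape(-1, x.shape[0], order="F")  # the proposed fix

# Rows 0-2 pair with ('r1', 'c1'), ('r1', 'c2'), ('r1', 'c3')
print(c_order[:3, 0])
print(f_order[:3, 0])
```

Under that assumption, the default order yields 0, 1, 2 in the first three rows, matching the wrong output in the report, while order="F" yields three zeros.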
Change in bool must be dealt with to upgrade Cython. See: http://wiki.cython.org/ReleaseNotes-0.13
This works:
I[1] la.larry([]).sum(0)
O[1] nan
But this doesn't:
I[2] la.larry([], dtype=np.int).sum(0)
ValueError: cannot convert float NaN to integer
I think a good addition to la would be la.dot() and/or a dot method:
lar1.dot(lar2, join='inner', fill=np.nan)
where fill is used when the join method ('outer', 'left', 'right') creates new data.
A lag of zero should return a copy of the input. Instead it returns an empty data array:
>> y = la.larry([1, 2, 3])
>> y.lag(0)
label_0
0
1
2
x
array([], dtype=int32)
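One plausible cause, stated here as an assumption rather than a reading of the la source, is a slice of the form x[:-nlag]: when nlag is 0 the stop is -0, which equals 0, so the slice is empty. The sketch below shows the pitfall on a plain NumPy array, along with a hypothetical helper, lag_1d, that computes the stop explicitly.

```python
import numpy as np

def lag_1d(x, nlag):
    """Drop the last `nlag` elements of a 1d array (hypothetical helper)."""
    x = np.asarray(x)
    if nlag < 0:
        raise ValueError("nlag must be nonnegative in this sketch")
    # An explicit stop avoids x[:-0], which is empty because -0 == 0
    stop = max(x.shape[0] - nlag, 0)
    return x[:stop].copy()

a = np.array([1, 2, 3])
print(a[:-0].size)   # the suspected pitfall: an empty slice
print(lag_1d(a, 0))  # a lag of zero returns a full copy
print(lag_1d(a, 1))
```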
Current repr:
I[2] lar = la.rand(label=[['r1', '2'], ['c1', 'c2']])
I[3] lar
O[3]
label_0
r1
2
label_1
c1
c2
x
array([[ 0.82931648, 0.62778716],
[ 0.27239571, 0.5053495 ]])
Could instead do something like:
            c1          c2
r1  0.82931648  0.62778716
r2  0.27239571  0.5053495
That's not easy to do robustly (long labels, large arrays, etc).
Might also want to give printing options like numpy does (number of decimals, scientific notation etc.)
The setup.py file does not copy the data directory during install
This works:
>> lar = la.larry([1,2,3])
>> lar[3:2]
label_0
x
array([], dtype=int32)
But this crashes:
>> lar = la.larry([[1,2,3],[1,2,3]])
>> lar[:,3:2]
ValueError: Exactly one label per dimension needed
The correct shape of the output can be seen by indexing directly into the numpy array:
>> lar.x[:,3:2]
array([], shape=(2, 0), dtype=int32)
It seems like the problem is that an empty label for the second axis was not created.
You might want to initialize an empty larry:
>> lar1 = la.larry([])
And then merge data into the new larry in a loop:
>> lar2 = la.larry([1, 2, 3])
>> lar1.merge(lar2)
But that gives an error:
IndexError: index out of range for array
One way to close an IO instance is "del io". But perhaps it would be nice to provide a close method since that is what the user will expect.
Negative lags would come in handy when you want to push data forward. Would need to add unit tests too.
larry.sortaxis(None) chops off singleton dimensions:
>>> y = la.larry([[1, 2]])
>>> y.shape
(1, 2)
>>> y.sortaxis().shape
(2,)
This is a feature request to allow using la.lix to assign a value.
In [14]: y = larry([1,2,3])
In [15]: y.lix[0]
Out[15]: 1
In [16]: y.lix[0] += 1
...
TypeError: 'Getitemlabel' object does not support item assignment
In [17]:
Wrong:
I[1] lar = la.larry([True, False])
I[2] -lar
O[2]
label_0
0
1
x
array([ True, False], dtype=bool)
Right:
I[3] -lar.x
O[3] array([False, True], dtype=bool)
I have come across a situation where I would like to roll an axis on a larry, much like how one could do so with a regular numpy array. Could that be possible?
Let's say you want to align two 2d larrys. But you only want to align axis 1, say with an inner join. You want to leave axis 0 unchanged in both larrys. Adding a None join method would make that easy:
>>> la.align(lar1, lar2, join=[None, 'inner'])
Would also need to add it (docstring only) to la.binaryop(), la.add(), la.subtract(), la.multiply(), and la.divide().
Numpy does this (note that the second example returns a bool not a shape (3,) array):
I[29] a = np.array(['w', 'h', 'y'])
I[30] a == 'y'
O[30] array([False, False, True], dtype=bool)
I[31] a == 1
O[31] False
larry does this:
I[37] a = la.larry(['w', 'h', 'y'])
I[38] a == 'y' # Good
O[38]
label_0
0
1
2
x
array([False, False, True], dtype=bool)
I[39] a == 1 # Bug
O[39]
label_0
0
1
2
x
False
Old:
I[1] lar = la.rand(1000, 1000)
I[2] timeit 1.0 / lar
100 loops, best of 3: 10 ms per loop
I[3] lar = la.rand(10, 10)
I[4] timeit 1.0 / lar
100000 loops, best of 3: 6.25 us per loop
New:
I[1] lar = la.rand(1000, 1000)
I[2] timeit 1.0 / lar
100 loops, best of 3: 5.04 ms per loop
I[3] lar = la.rand(10, 10)
I[4] timeit 1.0 / lar
100000 loops, best of 3: 5.42 us per loop
Old:
if np.isscalar(other) or isinstance(other, np.ndarray):
    y = self.copy()
    y.x = other / y.x
    return y
New:
if np.isscalar(other) or isinstance(other, np.ndarray):
    label = self.copylabel()
    x = other / self.x
    return larry(x, label, validate=False)
After successfully building and installing bottleneck 0.5.0rc1, I redid the build and install for the current larry on my Fedora 13 32-bit system. The process appears successful, but the test reported several errors.
import la
la.info()
la 0.6.0dev
la file /home/bvr/.local/lib/python2.6/site-packages/la/__init__.pyc
NumPy 2.0.0.dev-a1e7be3
Bottleneck 0.5.0rc1
HDF5 archiving Not available
listmap Faster C version
listmap_fill Faster C version
la.test()
Running unit tests for la
NumPy version 2.0.0.dev-a1e7be3
NumPy is installed in /home/bvr/Programs/numpy/numpy
Python version 2.6.4 (r264:75706, Jun 4 2010, 18:20:16) [GCC 4.4.4 20100503 (Red Hat 4.4.4-2)]
nose version 0.11.3
/home/bvr/.local/lib/python2.6/site-packages/la/farray/misc.py:117: RuntimeWarning: invalid value encountered in double_scalars
x1 = x1 - x1.mean()
/home/bvr/.local/lib/python2.6/site-packages/la/farray/misc.py:118: RuntimeWarning: invalid value encountered in double_scalars
x2 = x2 - x2.mean()
/home/bvr/.local/lib/python2.6/site-packages/la/farray/misc.py:134: RuntimeWarning: invalid value encountered in double_scalars
return num / den
./home/bvr/.local/lib/python2.6/site-packages/la/farray/misc.py:93: RuntimeWarning: invalid value encountered in double_scalars
x1 = x1 - x1.mean()
/home/bvr/.local/lib/python2.6/site-packages/la/farray/misc.py:94: RuntimeWarning: invalid value encountered in double_scalars
x2 = x2 - x2.mean()
............................../home/bvr/Programs/numpy/numpy/lib/utils.py:139: DeprecationWarning: `movingsum` is deprecated, use `move_nansum` instead!
warnings.warn(depdoc, DeprecationWarning)
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................./home/bvr/.local/lib/python2.6/site-packages/la/util/scipy.py:45: RuntimeWarning: invalid value encountered in double_scalars
return np.mean(x,axis)/factor
....../home/bvr/.local/lib/python2.6/site-packages/la/util/scipy.py:77: RuntimeWarning: invalid value encountered in double_scalars
m1 = np.sum(x,axis)/n
.................../home/bvr/Programs/numpy/numpy/lib/utils.py:139: DeprecationWarning: `movingrank` is deprecated, use `move_ranking` instead!
warnings.warn(depdoc, DeprecationWarning)
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................../home/bvr/.local/lib/python2.6/site-packages/la/farray/misc.py:134: RuntimeWarning: invalid value encountered in divide
return num / den
....../home/bvr/.local/lib/python2.6/site-packages/la/farray/misc.py:25: RuntimeWarning: divide by zero encountered in divide
g = 1.0 / m
/home/bvr/.local/lib/python2.6/site-packages/la/farray/misc.py:26: RuntimeWarning: invalid value encountered in multiply
x = np.multiply(g, x)
........../home/bvr/.local/lib/python2.6/site-packages/la/farray/normalize.py:174: RuntimeWarning: invalid value encountered in divide
idx /= (countnotnan - 1)
.../home/bvr/.local/lib/python2.6/site-packages/la/farray/normalize.py:117: RuntimeWarning: divide by zero encountered in divide
r = r / (n - 1.0)
/home/bvr/.local/lib/python2.6/site-packages/la/farray/normalize.py:117: RuntimeWarning: invalid value encountered in divide
r = r / (n - 1.0)
.../home/bvr/.local/lib/python2.6/site-packages/la/farray/move.py:366: RuntimeWarning: invalid value encountered in divide
ms = 1.0 * window * msx / msm............EE.........E....E.EEEEEEEE..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................E...........................................................................................................................................................................................................
ERROR: afunc.ranking #1
Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 182, in test_ranking_1
actual = ranking(x, axis=0, norm='0,N-1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'
Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 269, in test_ranking_10
actual = ranking(x, axis=1, norm='-1,1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'
Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 193, in test_ranking_2
actual = ranking(x, norm='0,N-1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'
Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 415, in test_ranking_24
actual = ranking(x, axis=0, ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'
Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 429, in test_ranking_26
actual = ranking(x, axis=0, ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'
Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 206, in test_ranking_3
actual = ranking(x, axis=1, norm='0,N-1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'
Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 213, in test_ranking_4
actual = ranking(x, axis=0, norm='0,N-1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'
Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 220, in test_ranking_5
actual = ranking(x, axis=1, norm='0,N-1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'
Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 231, in test_ranking_6
actual = ranking(x, axis=0, norm='-1,1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'
Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 242, in test_ranking_7
actual = ranking(x, norm='-1,1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'
Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 255, in test_ranking_8
actual = ranking(x, axis=1, norm='-1,1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'
Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 262, in test_ranking_9
actual = ranking(x, axis=0, norm='-1,1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in loadTestsFromName
addr.filename, addr.module)
File "/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/usr/lib/python2.6/site-packages/nose/importer.py", line 86, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/io_test.py", line 13, in <module>
from la import (IO, archive_directory)
ImportError: cannot import name IO
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/nose/case.py", line 186, in runTest
self.test(*self.arg)
File "/home/bvr/Programs/numpy/numpy/testing/utils.py", line 34, in assert_
raise AssertionError(msg)
AssertionError: Type of 'actual' and 'desired' do not match for 'nx'
Ran 3058 tests in 12.074s
FAILED (errors=13, failures=1)
<nose.result.TextTestResult run=3058 errors=13 failures=1>
Add a function that returns True if two larrys are aligned; False otherwise.
Signature:
>>> la.isaligned(lar1, lar2, axis=None)
If inputs are 2d, for example, then axis=0 will return True if the labels along the first axis are aligned; False otherwise. By default (axis=None) all axes are checked.
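A minimal sketch of the proposed check. To stay independent of the la package it works on label lists directly (one list of labels per axis, standing in for lar1.label and lar2.label); that simplification is an assumption, not the proposed implementation.

```python
def isaligned(labels1, labels2, axis=None):
    """Return True if two label lists match along `axis` (all axes if None).

    `labels1` and `labels2` are hypothetical stand-ins for lar1.label and
    lar2.label: one list of labels per axis.
    """
    if axis is None:
        # Every axis must match, and the number of axes must agree
        return len(labels1) == len(labels2) and all(
            list(a) == list(b) for a, b in zip(labels1, labels2))
    return list(labels1[axis]) == list(labels2[axis])

lab1 = [['r1', 'r2'], ['c1', 'c2']]
lab2 = [['r1', 'r2'], ['c2', 'c1']]
print(isaligned(lab1, lab2, axis=0))
print(isaligned(lab1, lab2, axis=1))
print(isaligned(lab1, lab2))
```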
As reported by Thomas Coffee on the labeled-array mailing list there is an indexing bug when dtype is object:
y = la.larry([None])
y[0]
ValueError: 0d larrys are not supported
Replacing the last if statement in __getitem__ with the following seems to fix it:
if not isinstance(x, np.ndarray):
    return x
It also makes __getitem__ a tiny bit faster.
Support Python 2.5 by importing the with statement (from __future__ import with_statement).
Current larry.astype:
y = self.copy()
y.x = y.x.astype(dtype)
return y
Proposed:
label = self.copylabel()
x = self.x.astype(dtype)
return larry(x, label, validate=False)
Current:
I[1] lar = la.rand(1000, 1000)
I[2] lar.dtype
O[2] dtype('float64')
I[3] timeit lar.astype(np.float32)
1000 loops, best of 3: 1.57 ms per loop
Proposed:
I[1] lar = la.rand(1000, 1000)
I[2] timeit lar.astype(np.float32)
1000 loops, best of 3: 766 us per loop
Due to a typo, la.info() will crash if h5py cannot be imported.
If ndim > 0, then I don't think the following line is needed in morph_like:
y = self.copy()
At the moment axis defaults to 0, and None is not supported. But the bottleneck function used by ranking supports axis=None, so it should be fairly doable.
Make larry.push() faster by avoiding a copy.
Old:
I[1] lar = la.rand(1000, 1000)
I[2] lar[lar > 0.8] = np.nan
I[3] timeit lar.push(100)
10 loops, best of 3: 82 ms per loop
New:
I[1] lar = la.rand(1000, 1000)
I[2] lar[lar > 0.8] = np.nan
I[3] timeit lar.push(100)
10 loops, best of 3: 77 ms per loop
Old:
y = self.copy()
y.x = push(y.x, window, axis=axis)
return y
New:
label = self.copylabel()
x = push(self.x, window, axis=axis)
return larry(x, label, validate=False)
When int input is less than system default then output of larry.cumsum() and larry.cumprod() should be system default int.
For example, on my 64 bit system the default numpy int is np.int64. So if I take the cumsum of a np.int32 larry then the output should be a np.int64 larry. The current behavior is to return a np.int32 larry.
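For reference, the promotion being requested is the one NumPy applies in its own cumsum and cumprod. The sketch below uses np.int8 input so that the outcome does not depend on whether the platform default int is 32 or 64 bits. It only shows NumPy behavior; how larry ends up returning the narrower dtype is not shown here.

```python
import numpy as np

small = np.array([1, 2, 3], dtype=np.int8)

# Integer input narrower than the platform default int is promoted
print(np.cumsum(small).dtype == np.dtype(np.int_))
print(np.cumprod(small).dtype == np.dtype(np.int_))

# Passing the input dtype explicitly keeps the narrow type instead
print(np.cumsum(small, dtype=small.dtype).dtype)
```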
Numpy has an axis=None default for cumsum. I'm not sure whether that would be a good idea for larry in general, because numpy flattens out the array when axis is None. But it might be handy to at least allow it for 1d larrys and raise an error if there is no axis and the larry is multi-dimensional. That would make the interfaces a bit more compatible.
larry.quantile and la.farray.quantile choke on axis=None.
When saving larry to a hdf5 archive, dates (datetime.date) are converted to int dates and then converted back to datetime.date when loading (because h5py/hdf5 doesn't handle object arrays).
Do the same for datetime.datetime.
Create a larry:
>> y = la.larry([10, 11, 12])
keep_label works:
>> y.keep_label('in', [0,2], axis=0)
label_0
0
2
x
array([10, 12])
But keep_x doesn't when op is 'in' or 'not in':
>> y.keep_x('in', [0,2], vacuum=False)
------------------------------------------------------------
File "<string>", line 1
y.x invalue
^
SyntaxError: unexpected EOF while parsing
Fix is to change:
idx = eval('y.x ' + op + 'value')
to:
idx = eval('y.x ' + op + ' value')
And to write unit tests for it when op is 'in' and 'not in'
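The missing space only matters for word operators, which is presumably why symbolic operators such as '==' were unaffected. The sketch below rebuilds both versions of the expression string and checks whether each one parses; build_expr and is_valid are hypothetical helpers written for this illustration.

```python
def build_expr(op, with_space):
    """Build the string that would be passed to eval (hypothetical helper)."""
    sep = " " if with_space else ""
    return "y.x " + op + sep + "value"

def is_valid(expr):
    """Return True if `expr` parses as a Python expression."""
    try:
        compile(expr, "<string>", "eval")
        return True
    except SyntaxError:
        return False

for op in ("==", "in", "not in"):
    old = build_expr(op, with_space=False)
    new = build_expr(op, with_space=True)
    print(repr(old), is_valid(old), repr(new), is_valid(new))
```

This checks syntax only; it says nothing about whether the evaluated expression gives the intended elementwise result.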
Add the support in the same way as the support was added for datetime.datetime. See the commit that closed issue #25.
la.align() aligns two larrys using the given join method ('inner', 'outer', 'left', etc). But there is often a need to align more than two larrys. A function like
>>> lar1, lar2, lar3, ... = la.alignmany(lar1, lar2, lar3, ..., join='inner', axis=0)
would be useful, where the default axis is None (align all axes) and the default join method is inner.
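A rough sketch of the inner-join case for 1d data, using (labels, values) pairs as stand-ins for larrys. The helper name alignmany_1d, the pair representation, and the sorted output order are all assumptions made for illustration.

```python
def alignmany_1d(*pairs):
    """Inner-join several (labels, values) pairs on their labels.

    Hypothetical helper: each pair stands in for a 1d larry. Only labels
    present in every input are kept, in sorted order.
    """
    common = sorted(set.intersection(*(set(labels) for labels, _ in pairs)))
    aligned = []
    for labels, values in pairs:
        lookup = dict(zip(labels, values))
        aligned.append((common, [lookup[key] for key in common]))
    return aligned

a = (['a', 'b', 'c'], [1, 2, 3])
b = (['b', 'c', 'd'], [20, 30, 40])
c = (['c', 'b'], [300, 200])
for labels, values in alignmany_1d(a, b, c):
    print(labels, values)
```

The other join methods would need a fill value for labels missing from some inputs, which this sketch leaves out.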
Typo in docstring:
cross_validation docstring refers to old name of function: cv