la's People

Contributors

cgohlke, chrisspalm, josef-pkt, khaeru, kwgoodman, weathergod

la's Issues

Make numpy array versions of demean, demedian, and zscore

Unlike most larry methods, there are no numpy array versions (in la.farray) of demean, demedian, and zscore. Instead the code is contained in the larry method.

Make la.farray versions of demean, demedian, and zscore and use them in the larry methods. Then I'll be able to use the code on numpy arrays.
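
A minimal sketch of what array versions might look like, using plain NumPy. The function names mirror the larry methods, but the signatures and the NaN handling (NaNs are ignored via `np.nanmean`, `np.nanmedian`, and `np.nanstd`) are assumptions, not the existing la.farray API.

```python
import numpy as np

def demean(arr, axis=None):
    """Subtract the mean along `axis`, ignoring NaNs."""
    arr = np.asarray(arr, dtype=float)
    return arr - np.nanmean(arr, axis=axis, keepdims=True)

def demedian(arr, axis=None):
    """Subtract the median along `axis`, ignoring NaNs."""
    arr = np.asarray(arr, dtype=float)
    return arr - np.nanmedian(arr, axis=axis, keepdims=True)

def zscore(arr, axis=None):
    """Center on the mean and scale by the standard deviation along `axis`."""
    arr = np.asarray(arr, dtype=float)
    centered = arr - np.nanmean(arr, axis=axis, keepdims=True)
    return centered / np.nanstd(arr, axis=axis, keepdims=True)

x = np.array([[1.0, 2.0], [3.0, 6.0]])
print(demean(x, axis=0))
print(zscore(x, axis=0))
```

The larry methods could then call these on `self.x` and reattach the labels.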

sort larry by column/row function

Here's a first stab at it. The code below works for 2d larrys, ascending order only. Paste into the larry class in deflarry.py. Honestly it's pretty much a hack-job.

    def sort(self, label, axis):
        """
        Sort a 2d larry on the given label. Ascending order only.

        Parameters
        ----------
        label : label element
            The label on which to sort.
        axis : {0, 1}
            Axis whose label list contains `label`.

        Returns
        -------
        out : larry
            A copy sorted on the given label.

        """
        data = self.x
        idx = self.label[axis].index(label)

        if axis == 0:
            # Reorder the columns by the values in row `label`
            new_order = data[idx, :].argsort()
            sorted_data = data[:, new_order]
            # Labels must be a list to create a new larry
            sorted_labels = [self.label[1][i] for i in new_order]
            return larry(sorted_data, label=[list(self.label[0]), sorted_labels])

        if axis == 1:
            # Reorder the rows by the values in column `label`
            new_order = data[:, idx].argsort()
            sorted_data = data[new_order, :]
            sorted_labels = [self.label[0][i] for i in new_order]
            return larry(sorted_data, label=[sorted_labels, list(self.label[1])])

        raise ValueError("axis must be 0 or 1")

Add la.arange()

Similar to la.rand() and la.randn(), add another convenience function that creates larrys for trying things out at the command line.

The new function, la.arange(), would return the same thing as:

>> la.larry(np.arange(3))      
label_0
    0
    1
    2
x
array([0, 1, 2]) 

Like la.rand() and la.randn(), it would optionally support creation by labels:

>> la.rand(label=[['row1', 'row2'], ['col1', 'col2']])
label_0
    row1
    row2
label_1
    col1
    col2
x
array([[ 0.90377983,  0.07377073],
       [ 0.85016068,  0.25740637]])

Add makefile

Add a makefile:

  • test: run unit tests
  • sdist: make sdist (and remove the damn MANIFEST)
  • build: compile C file

lara indexing bug

Indexing into a lara using a list of integers produces an error because it tries to use the take method.

Bug in la.panel()

la.panel() gives the wrong output.

First make larry:

In [9]: x = np.ones((4,3)).cumsum(0) - 1

In [10]: x
Out[10]: 
array([[ 0.,  0.,  0.],
       [ 1.,  1.,  1.],
       [ 2.,  2.,  2.],
       [ 3.,  3.,  3.]])

In [12]: lar = la.larry(x, [['r1', 'r2', 'r3', 'r4'], ['c1', 'c2', 'c3']])

In [13]: lar
Out[13]: 
label_0
    r1
    r2
    r3
    r4
label_1
    c1
    c2
    c3
x
array([[ 0.,  0.,  0.],
       [ 1.,  1.,  1.],
       [ 2.,  2.,  2.],
       [ 3.,  3.,  3.]])

In [15]: lar = lar.insertaxis(0, "name")

In [16]: lar
Out[16]: 
label_0
    name
label_1
    r1
    r2
    r3
    r4
label_2
    c1
    c2
    c3
x
array([[[ 0.,  0.,  0.],
        [ 1.,  1.,  1.],
        [ 2.,  2.,  2.],
        [ 3.,  3.,  3.]]])

Then make panel:

In [17]: la.panel(lar)
Out[17]: 
label_0
    ('r1', 'c1')
    ('r1', 'c2')
    ('r1', 'c3')
    ...
    ('r4', 'c1')
    ('r4', 'c2')
    ('r4', 'c3')
label_1
    name
x
array([[ 0.],
       [ 1.],
       [ 2.],
       [ 3.],
       [ 0.],
       [ 1.],
       [ 2.],
       [ 3.],
       [ 0.],
       [ 1.],
       [ 2.],
       [ 3.]])

The first three elements should be zero (since row 'r1' contains all zeros). Changing the function to use order="F" fixes the bug.
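
The effect of the memory order can be reproduced with NumPy alone. The sketch below assumes the panel data is built by reshaping the transposed array; that form is a guess at the implementation, not something shown in this report.

```python
import numpy as np

# Data from the report: shape (1, 4, 3), where row i holds the value i
x = (np.ones((4, 3)).cumsum(0) - 1)[np.newaxis, ...]

# Assumed current reshape (default C order): the row index varies fastest
buggy = x.T.reshape(-1, x.shape[0])

# Same reshape in Fortran order: the column index varies fastest,
# matching the ('r1', 'c1'), ('r1', 'c2'), ('r1', 'c3'), ... labels
fixed = x.T.reshape(-1, x.shape[0], order="F")

print(buggy.ravel())
print(fixed.ravel())
```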

Add dot method

I think a good addition to la would be la.dot() and/or a dot method:

lar1.dot(lar2, join='inner', fill=np.nan)

where fill is used when the join method ('outer', 'left', 'right') creates new data.

larry.lag() gives wrong output when lag=0

A lag of zero should return a copy of the input. Instead it returns an empty data array:

>> y = la.larry([1, 2, 3])
>> y.lag(0) 
label_0
    0
    1
    2
x
array([], dtype=int32)
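
One plausible cause, offered as an assumption because the lag code is not shown here, is a slice of the form `x[:-nlag]`. With `nlag=0` that becomes `x[:0]`, which is empty. The helper names below are hypothetical.

```python
import numpy as np

x = np.array([1, 2, 3])

def lag_naive(x, nlag):
    # x[:-0] is the same as x[:0], so a zero lag selects nothing
    return x[:-nlag]

def lag_safe(x, nlag):
    # Compute the stop index explicitly; nlag=0 keeps every element
    return x[:x.shape[0] - nlag].copy()

print(lag_naive(x, 0))
print(lag_safe(x, 0))
```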

pretty __repr__

Current repr:

I[2] lar = la.rand(label=[['r1', '2'], ['c1', 'c2']])
I[3] lar
O[3] 
label_0
    r1
    2
label_1
    c1
    c2
x
array([[ 0.82931648,  0.62778716],
       [ 0.27239571,  0.5053495 ]])

Could instead do something like:

      c1          c2
r1    0.82931648  0.62778716
r2    0.27239571  0.5053495

That's not easy to do robustly (long labels, large arrays, etc).

Might also want to give printing options like numpy does (number of decimals, scientific notation, etc.).

setup.py bug

The setup.py file does not copy the data directory during install.

Indexing chokes on lar[:,3:2]

This works:

>> lar = la.larry([1,2,3]) 
>> lar[3:2]  
label_0
x
array([], dtype=int32)

But this crashes:

>> lar = la.larry([[1,2,3],[1,2,3]])
>> lar[:,3:2]
ValueError: Exactly one label per dimension needed

The correct shape of the output can be seen by indexing directly into the numpy array:

>> lar.x[:,3:2]
array([], shape=(2, 0), dtype=int32)

It seems like the problem is that an empty label for the second axis was not created.

Merging two larrys chokes when one is empty

You might want to initialize an empty larry:

>> lar1 = la.larry([])

And then merge data into the new larry in a loop:

>> lar2 = la.larry([1, 2, 3])
>> lar1.merge(lar2)

But that gives an error:

IndexError: index out of range for array

Add close method to IO class?

One way to close an IO instance is "del io". But perhaps it would be nice to provide a close method since that is what the user will expect.

setitem for lix

This is a feature request to allow using la.lix to assign a value.

In [14]: y = larry([1,2,3])

In [15]: y.lix[0]
Out[15]: 1

In [16]: y.lix[0] += 1

...

TypeError: 'Getitemlabel' object does not support item assignment

In [17]:
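
A rough sketch of what the request amounts to: the indexer object needs a `__setitem__` alongside its `__getitem__`. The class below is a hypothetical 1d stand-in written for illustration, not the `Getitemlabel` class from la.

```python
class LabelIndexer1d(object):
    """Hypothetical stand-in for the object returned by `lix` (1d only)."""

    def __init__(self, label, x):
        self.label = label  # list of label elements
        self.x = x          # list of data values, same length as label

    def __getitem__(self, key):
        # Translate the label to a position, then read
        return self.x[self.label.index(key)]

    def __setitem__(self, key, value):
        # Translate the label to a position, then assign in place
        self.x[self.label.index(key)] = value

lix = LabelIndexer1d([0, 1, 2], [1, 2, 3])
lix[0] += 1  # calls __getitem__, then __setitem__
print(lix.x)
```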

Add rollaxis method to larry?

I have come across a situation where I would like to roll an axis on a larry, much like how one could do so with a regular numpy array. Could that be possible?
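
As a sketch of how this could work, the data can be rolled the way `np.rollaxis` does it while the per-axis label lists are permuted in the same order. The function name and the (array, label list) interface are assumptions made for illustration.

```python
import numpy as np

def rollaxis_labeled(x, label, axis, start=0):
    """Roll `axis` of `x` back to position `start`, moving its labels too."""
    n = x.ndim
    if axis < 0:
        axis += n
    if start < 0:
        start += n
    if not (0 <= axis < n) or not (0 <= start <= n):
        raise ValueError("axis or start is out of bounds")
    if axis < start:
        start -= 1
    # Build the same axis permutation that np.rollaxis uses
    axes = list(range(n))
    axes.remove(axis)
    axes.insert(start, axis)
    return x.transpose(axes), [label[a] for a in axes]

x = np.arange(24).reshape(2, 3, 4)
label = [['a', 'b'], ['p', 'q', 'r'], [0, 1, 2, 3]]
y, ylabel = rollaxis_labeled(x, label, 2)
print(y.shape, ylabel[0])
```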

Add a None join method to la.align()

Let's say you want to align two 2d larrys. But you only want to align axis 1, say with an inner join. You want to leave axis 0 unchanged in both larrys. Adding a None join method would make that easy:

>>> la.align(lar1, lar2, join=[None, 'inner'])

Would also need to add it (docstring only) to la.binaryop(), la.add(), la.subtract(), la.multiply(), and la.divide().
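
The label side of the proposal can be sketched in plain Python. The helpers below are hypothetical and work on per-axis label lists only. Sorting the joined labels is an assumption of the sketch, and only the 'inner' and 'outer' joins are covered.

```python
def join_labels(label1, label2, join):
    """Return the aligned label lists for a single axis."""
    if join is None:
        # Leave this axis unchanged in both inputs
        return list(label1), list(label2)
    if join == 'inner':
        joined = sorted(set(label1) & set(label2))
    elif join == 'outer':
        joined = sorted(set(label1) | set(label2))
    else:
        raise ValueError("join must be None, 'inner', or 'outer'")
    return joined, list(joined)

def align_labels(labels1, labels2, join):
    """Apply one join method per axis."""
    pairs = [join_labels(a, b, j) for a, b, j in zip(labels1, labels2, join)]
    return [p[0] for p in pairs], [p[1] for p in pairs]

labels1 = [['r1', 'r2'], ['a', 'b', 'c']]
labels2 = [['x', 'y', 'z'], ['b', 'c', 'd']]
new1, new2 = align_labels(labels1, labels2, join=[None, 'inner'])
print(new1)
print(new2)
```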

larry([1, 2]) == 'a' does not return a bool like numpy does

Numpy does this (note that the second example returns a bool not a shape (3,) array):

I[29] a = np.array(['w', 'h', 'y'])
I[30] a == 'y'
O[30] array([False, False,  True], dtype=bool)
I[31] a == 1
O[31] False

larry does this:

I[37] a = la.larry(['w', 'h', 'y'])
I[38] a == 'y' # Good
O[38] 
label_0
    0
    1
    2
x
array([False, False,  True], dtype=bool)
I[39] a == 1  # Bug
O[39] 
label_0
    0
    1
    2
x
False

Avoid copy to speed up larry.__rdiv__()

Old:

I[1] lar = la.rand(1000, 1000)
I[2] timeit 1.0 / lar
100 loops, best of 3: 10 ms per loop
I[3] lar = la.rand(10, 10)
I[4] timeit 1.0 / lar
100000 loops, best of 3: 6.25 us per loop

New:

I[1] lar = la.rand(1000, 1000)
I[2] timeit 1.0 / lar
100 loops, best of 3: 5.04 ms per loop
I[3] lar = la.rand(10, 10)
I[4] timeit 1.0 / lar
100000 loops, best of 3: 5.42 us per loop

Old:

    if np.isscalar(other) or isinstance(other, np.ndarray):
        y = self.copy()
        y.x = other / y.x
        return y 

New:

    if np.isscalar(other) or isinstance(other, np.ndarray):
        label = self.copylabel()
        x = other / self.x
        return larry(x, label, validate=False)

test failures when using NumPy master

After successfully building and installing bottleneck 0.5.0rc1, I redid the build and install for the current larry on my Fedora 13 32-bit system. The process appears successful, but the test reported several errors.

import la
la.info()
la 0.6.0dev
la file /home/bvr/.local/lib/python2.6/site-packages/la/__init__.pyc
NumPy 2.0.0.dev-a1e7be3
Bottleneck 0.5.0rc1
HDF5 archiving Not available
listmap Faster C version
listmap_fill Faster C version

la.test()
Running unit tests for la
NumPy version 2.0.0.dev-a1e7be3
NumPy is installed in /home/bvr/Programs/numpy/numpy
Python version 2.6.4 (r264:75706, Jun 4 2010, 18:20:16) [GCC 4.4.4 20100503 (Red Hat 4.4.4-2)]
nose version 0.11.3
/home/bvr/.local/lib/python2.6/site-packages/la/farray/misc.py:117: RuntimeWarning: invalid value encountered in double_scalars
x1 = x1 - x1.mean()
/home/bvr/.local/lib/python2.6/site-packages/la/farray/misc.py:118: RuntimeWarning: invalid value encountered in double_scalars
x2 = x2 - x2.mean()
/home/bvr/.local/lib/python2.6/site-packages/la/farray/misc.py:134: RuntimeWarning: invalid value encountered in double_scalars
return num / den
./home/bvr/.local/lib/python2.6/site-packages/la/farray/misc.py:93: RuntimeWarning: invalid value encountered in double_scalars
x1 = x1 - x1.mean()
/home/bvr/.local/lib/python2.6/site-packages/la/farray/misc.py:94: RuntimeWarning: invalid value encountered in double_scalars
x2 = x2 - x2.mean()
............................../home/bvr/Programs/numpy/numpy/lib/utils.py:139: DeprecationWarning: movingsum is deprecated, use move_nansum instead!
warnings.warn(depdoc, DeprecationWarning)
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................./home/bvr/.local/lib/python2.6/site-packages/la/util/scipy.py:45: RuntimeWarning: invalid value encountered in double_scalars
return np.mean(x,axis)/factor
....../home/bvr/.local/lib/python2.6/site-packages/la/util/scipy.py:77: RuntimeWarning: invalid value encountered in double_scalars
m1 = np.sum(x,axis)/n
.................../home/bvr/Programs/numpy/numpy/lib/utils.py:139: DeprecationWarning: movingrank is deprecated, use move_ranking instead!
warnings.warn(depdoc, DeprecationWarning)
................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................F................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................../home/bvr/.local/lib/python2.6/site-packages/la/farray/misc.py:134: RuntimeWarning: invalid value encountered in divide
return num / den
....../home/bvr/.local/lib/python2.6/site-packages/la/farray/misc.py:25: RuntimeWarning: divide by zero encountered in divide
g = 1.0 / m
/home/bvr/.local/lib/python2.6/site-packages/la/farray/misc.py:26: RuntimeWarning: invalid value encountered in multiply
x = np.multiply(g, x)
........../home/bvr/.local/lib/python2.6/site-packages/la/farray/normalize.py:174: RuntimeWarning: invalid value encountered in divide
idx /= (countnotnan - 1)
.../home/bvr/.local/lib/python2.6/site-packages/la/farray/normalize.py:117: RuntimeWarning: divide by zero encountered in divide
r = r / (n - 1.0)
/home/bvr/.local/lib/python2.6/site-packages/la/farray/normalize.py:117: RuntimeWarning: invalid value encountered in divide
r = r / (n - 1.0)
.../home/bvr/.local/lib/python2.6/site-packages/la/farray/move.py:366: RuntimeWarning: invalid value encountered in divide
ms = 1.0 * window * msx / msm

............EE.........E....E.EEEEEEEE..............................................................................................................................................................................................................................................................................................................................................................................................................................................................................................................E...........................................................................................................................................................................................................

ERROR: afunc.ranking #1

Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 182, in test_ranking_1
actual = ranking(x, axis=0, norm='0,N-1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'

ERROR: afunc.ranking #10

Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 269, in test_ranking_10
actual = ranking(x, axis=1, norm='-1,1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'

ERROR: afunc.ranking #2

Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 193, in test_ranking_2
actual = ranking(x, norm='0,N-1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'

ERROR: afunc.ranking_24

Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 415, in test_ranking_24
actual = ranking(x, axis=0, ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'

ERROR: afunc.ranking_26

Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 429, in test_ranking_26
actual = ranking(x, axis=0, ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'

ERROR: afunc.ranking #3

Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 206, in test_ranking_3
actual = ranking(x, axis=1, norm='0,N-1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'

ERROR: afunc.ranking #4

Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 213, in test_ranking_4
actual = ranking(x, axis=0, norm='0,N-1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'

ERROR: afunc.ranking #5

Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 220, in test_ranking_5
actual = ranking(x, axis=1, norm='0,N-1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'

ERROR: afunc.ranking #6

Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 231, in test_ranking_6
actual = ranking(x, axis=0, norm='-1,1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'

ERROR: afunc.ranking #7

Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 242, in test_ranking_7
actual = ranking(x, norm='-1,1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'

ERROR: afunc.ranking #8

Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 255, in test_ranking_8
actual = ranking(x, axis=1, norm='-1,1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'

ERROR: afunc.ranking #9

Traceback (most recent call last):
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/farray_test.py", line 262, in test_ranking_9
actual = ranking(x, axis=0, norm='-1,1', ties=False)
TypeError: ranking() got an unexpected keyword argument 'ties'

ERROR: Failure: ImportError (cannot import name IO)

Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/nose/loader.py", line 382, in loadTestsFromName
addr.filename, addr.module)
File "/usr/lib/python2.6/site-packages/nose/importer.py", line 39, in importFromPath
return self.importFromDir(dir_path, fqname)
File "/usr/lib/python2.6/site-packages/nose/importer.py", line 86, in importFromDir
mod = load_module(part_fqname, fh, filename, desc)
File "/home/bvr/.local/lib/python2.6/site-packages/la/tests/io_test.py", line 13, in <module>
from la import (IO, archive_directory)
ImportError: cannot import name IO

FAIL: Test larry methods for proper handling of empty larrys

Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/nose/case.py", line 186, in runTest
self.test(*self.arg)
File "/home/bvr/Programs/numpy/numpy/testing/utils.py", line 34, in assert_
raise AssertionError(msg)
AssertionError: Type of 'actual' and 'desired' do not match for 'nx'


Ran 3058 tests in 12.074s

FAILED (errors=13, failures=1)
<nose.result.TextTestResult run=3058 errors=13 failures=1>

Add la.isaligned() function

Add a function that returns True if two larrys are aligned; False otherwise.

Signature:

>>> la.isaligned(lar1, lar2, axis=None)

If inputs are 2d, for example, then axis=0 will return True if the labels along the first axis are aligned; False otherwise. By default (axis=None) all axes are checked.
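
A minimal sketch of the check, operating on per-axis label lists rather than larrys. The function name and that interface are assumptions made for illustration.

```python
def isaligned_labels(labels1, labels2, axis=None):
    """Return True if the label lists match on `axis` (all axes if None)."""
    if axis is None:
        return labels1 == labels2
    return labels1[axis] == labels2[axis]

a = [['r1', 'r2'], ['c1', 'c2']]
b = [['r1', 'r2'], ['c2', 'c1']]
print(isaligned_labels(a, b, axis=0))
print(isaligned_labels(a, b))
```

Order matters here: two axes with the same label elements in a different order are not treated as aligned.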

Bug indexing single element of larry with object dtype

As reported by Thomas Coffee on the labeled-array mailing list there is an indexing bug when dtype is object:

y = la.larry([None])
y[0]
ValueError: 0d larrys are not supported

Replacing the last if statement in __getitem__ with the following seems to fix it:

if not isinstance(x, np.ndarray):
    return x

It also makes __getitem__ a tiny bit faster.

Avoid extra copy in larry.astype

Current larry.astype:

    y = self.copy()
    y.x = y.x.astype(dtype)    
    return y

Proposed:

    label = self.copylabel()
    x = self.x.astype(dtype)    
    return larry(x, label, validate=False)

Current:

I[1] lar = la.rand(1000, 1000)
I[2] lar.dtype
O[2] dtype('float64')
I[3] timeit lar.astype(np.float32)
1000 loops, best of 3: 1.57 ms per loop

Proposed:

I[1] lar = la.rand(1000, 1000)
I[2] timeit lar.astype(np.float32)
1000 loops, best of 3: 766 us per loop

axis=None support for ranking()?

At the moment axis defaults to 0, and None is not supported. But the bottleneck function used by ranking supports axis=None, so it should be fairly doable.

Make larry.push() faster

Make larry.push() faster by avoiding a copy.

Old:

I[1] lar = la.rand(1000, 1000)
I[2] lar[lar > 0.8] = np.nan
I[3] timeit lar.push(100)
10 loops, best of 3: 82 ms per loop

New:

I[1] lar = la.rand(1000, 1000)
I[2] lar[lar > 0.8] = np.nan
I[3] timeit lar.push(100)
10 loops, best of 3: 77 ms per loop

Old:

    y = self.copy()
    y.x = push(y.x, window, axis=axis)
    return y

New:

    label = self.copylabel()
    x = push(self.x, window, axis=axis)
    return larry(x, label, validate=False)

BUG larry.cumsum and cumprod give int16 output with int16 input

When the integer dtype of the input is smaller than the system default int, the output of larry.cumsum() and larry.cumprod() should be the system default int.

For example, on my 64 bit system the default numpy int is np.int64. So if I take the cumsum of a np.int32 larry then the output should be a np.int64 larry. The current behavior is to return a np.int32 larry.
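
For reference, this is the NumPy behavior being described. The snippet uses plain arrays only and makes no claim about where larry loses the upcast.

```python
import numpy as np

x = np.array([1, 2, 3], dtype=np.int16)

# NumPy accumulates small ints in the platform default int
print(x.cumsum().dtype == np.dtype(np.int_))
print(x.cumprod().dtype == np.dtype(np.int_))

# Passing dtype explicitly keeps the input precision instead
print(x.cumsum(dtype=np.int16).dtype)
```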

la cumsum has no default for axis, numpy does.

Numpy has an axis=None default for cumsum. I'm not sure whether that would be a good idea for larry in general, because numpy flattens out the array when axis is None. But it might be handy to at least allow it for 1d larrys and raise an error if there is no axis and the larry is multi-dimensional. That would make the interfaces a bit more compatible.
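
The suggested compromise might look like the sketch below, written against plain arrays. The function name is hypothetical.

```python
import numpy as np

def cumsum_1d_default(x, axis=None):
    """cumsum that allows axis=None only for 1d input."""
    x = np.asarray(x)
    if axis is None:
        if x.ndim != 1:
            # Refuse to flatten, unlike np.cumsum
            raise ValueError("axis must be given when ndim > 1")
        axis = 0
    return x.cumsum(axis)

print(cumsum_1d_default([1, 2, 3]))
print(cumsum_1d_default(np.ones((2, 2)), axis=0))
```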

Bug in larry.keep_x when operation is 'in' or 'not in'

Create a larry:

>> y = la.larry([10, 11, 12])

keep_label works:

>> y.keep_label('in', [0,2], axis=0)
label_0
    0
    2
x
array([10, 12])

But keep_x doesn't when op is 'in' or 'not in':

>> y.keep_x('in', [0,2], vacuum=False)
------------------------------------------------------------
   File "<string>", line 1
     y.x invalue
               ^
SyntaxError: unexpected EOF while parsing

Fix is to change:

    idx = eval('y.x ' + op + 'value')

to:

    idx = eval('y.x ' + op + ' value')

And to write unit tests for it when op is 'in' and 'not in'.
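
The missing space only matters for word operators, which is probably why the symbolic ones never showed the problem. The small check below compiles the expression strings without evaluating them. The `parses` helper is written for this illustration.

```python
def parses(expr):
    """Return True if `expr` compiles as a Python expression."""
    try:
        compile(expr, '<string>', 'eval')
        return True
    except SyntaxError:
        return False

for op in ('==', '>', 'in', 'not in'):
    broken = 'y.x ' + op + 'value'   # current string building
    fixed = 'y.x ' + op + ' value'   # with the extra space
    print(op, parses(broken), parses(fixed))
```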

Add la.alignmany() function

la.align() aligns two larrys using the given join method ('inner', 'outer', 'left', etc). But there is often a need to align more than two larrys. A function like

>>> lar1, lar2, lar3, ... = la.alignmany(lar1, lar2, lar3, ..., join='inner', axis=0)

would be useful, where the default axis is None (align all axes) and the default join method is inner.
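
For the label side of an inner join, extending from two inputs to many is a fold over set intersection. The helper below is a hypothetical sketch for a single axis. Sorting the result is an assumption.

```python
from functools import reduce

def inner_join_many(*labels):
    """Sorted intersection of any number of single-axis label lists."""
    common = reduce(lambda a, b: a & b, (set(lab) for lab in labels))
    return sorted(common)

print(inner_join_many(['a', 'b', 'c'], ['b', 'c', 'd'], ['c', 'b', 'e']))
```

Each input would then be reindexed to the joined labels along that axis.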
