blaze / blaze
NumPy and Pandas interface to Big Data
Home Page: blaze.pydata.org
License: BSD 3-Clause "New" or "Revised" License
Following the "Custom DShapes" example at the bottom of the Quickstart:
from blaze import Table, derived
from blaze import RecordDecl as Record
from blaze import int32

class Custom(Record):
    max = int32
    min = int32

    @derived
    def mid(self):
        return (self.min + self.max)/2
I get the output seen in this gist: https://gist.github.com/ce09394e928890825263
In [47]: alst = [1, 2, 3]
In [48]: array(alst.__iter__())
Out[48]:
array([ 1., 2., 3.],
dshape='3, float64')
In [50]: array(alst)
Out[50]:
array([1, 2, 3],
dshape='3, int32')
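The inconsistency above is that the constructor infers int32 from the list but silently defaults to float64 for the iterator. A hypothetical sketch of how a constructor could treat the two inputs consistently, by materializing iterators before inspecting element types (the helper names are illustrative, not blaze API):

```python
def infer_scalar_type(values):
    # promote to float only when a float is actually present,
    # instead of defaulting iterator input to float64
    return float if any(isinstance(v, float) for v in values) else int

def coerce_input(obj):
    # materialize iterators/generators so they can be inspected
    # the same way a plain list is
    values = list(obj)
    return infer_scalar_type(values), values
```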
zsh» make docs
cd docs; make html
make[1]: Entering directory `/home/esc/git-working/blaze/docs'
sphinx-build -b html -d build/doctrees source build/html
Making output directory...
Running Sphinx v1.1.3
pdfTeX 3.1415926-1.40.10-2.2 (TeX Live 2009/Debian)
kpathsea version 5.0.0
Copyright 2009 Peter Breitenlohner (eTeX)/Han The Thanh (pdfTeX).
There is NO warranty. Redistribution of this software is
covered by the terms of both the pdfTeX copyright and
the Lesser GNU General Public License.
For more information about these matters, see the file
named COPYING and the pdfTeX source.
Primary author of pdfTeX: Peter Breitenlohner (eTeX)/Han The Thanh (pdfTeX).
Compiled with libpng 1.2.44; using libpng 1.2.44
Compiled with zlib 1.2.3.4; using zlib 1.2.3.4
Compiled with poppler version 0.12.4
Exception occurred:
File "/home/esc/anaconda/lib/python2.7/site-packages/sphinx/pycode/pgen2/pgen.py", line 15, in __init__
stream = open(filename)
IOError: [Errno 2] No such file or directory: '/home/esc/anaconda/lib/python2.7/site-packages/sphinx/pycode/Grammar.txt'
The full traceback has been saved in /tmp/sphinx-err-ocrnkt.log, if you want to report the issue to the developers.
Please also report this if it was a user error, so that a better error message can be provided next time.
Either send bugs to the mailing list at <http://groups.google.com/group/sphinx-dev/>,
or report them in the tracker at <http://bitbucket.org/birkenfeld/sphinx/issues/>. Thanks!
make[1]: *** [html] Error 1
make[1]: Leaving directory `/home/esc/git-working/blaze/docs'
make: *** [docs] Error 2
The full traceback is
# Sphinx version: 1.1.3
# Python version: 2.7.3
# Docutils version: 0.9.1 release
# Jinja2 version: 2.6
Traceback (most recent call last):
File "/home/esc/anaconda/lib/python2.7/site-packages/sphinx/cmdline.py", line 188, in main
warningiserror, tags)
File "/home/esc/anaconda/lib/python2.7/site-packages/sphinx/application.py", line 114, in __init__
self.setup_extension(extension)
File "/home/esc/anaconda/lib/python2.7/site-packages/sphinx/application.py", line 247, in setup_extension
mod = __import__(extension, None, None, ['setup'])
File "/home/esc/anaconda/lib/python2.7/site-packages/sphinx/ext/autodoc.py", line 26, in <module>
from sphinx.pycode import ModuleAnalyzer, PycodeError
File "/home/esc/anaconda/lib/python2.7/site-packages/sphinx/pycode/__init__.py", line 25, in <module>
pygrammar = driver.load_grammar(_grammarfile)
File "/home/esc/anaconda/lib/python2.7/site-packages/sphinx/pycode/pgen2/driver.py", line 126, in load_grammar
g = pgen.generate_grammar(gt)
File "/home/esc/anaconda/lib/python2.7/site-packages/sphinx/pycode/pgen2/pgen.py", line 383, in generate_grammar
p = ParserGenerator(filename)
File "/home/esc/anaconda/lib/python2.7/site-packages/sphinx/pycode/pgen2/pgen.py", line 15, in __init__
stream = open(filename)
IOError: [Errno 2] No such file or directory: '/home/esc/anaconda/lib/python2.7/site-packages/sphinx/pycode/Grammar.txt'
If you run the array_creation.py script in samples, you write data to the specified BLZ location. However, the repr of that storage says mode='r', even though it clearly isn't read-only, because we just wrote data to it.
In July I installed successfully; recently, the updated version couldn't install:
drill@goldMINER:~/blaze$ sudo python setup.py install
* Found Cython 0.19.1 package installed.
* Found numpy 1.6.1 package installed.
running install
running build
running build_py
running build_ext
skipping 'blaze/blz/blz_ext.c' Cython extension (up-to-date)
Rebuilding the datashape parser...
Traceback (most recent call last):
File "setup.py", line 301, in <module>
'build' : make_build(build),
File "/usr/lib/python2.7/distutils/core.py", line 152, in setup
dist.run_commands()
File "/usr/lib/python2.7/distutils/dist.py", line 953, in run_commands
self.run_command(cmd)
File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "/usr/lib/python2.7/distutils/command/install.py", line 601, in run
self.run_command('build')
File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
self.distribution.run_command(command)
File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "setup.py", line 256, in run
build_command.run(self)
File "/usr/lib/python2.7/distutils/command/build.py", line 128, in run
self.run_command(cmd_name)
File "/usr/lib/python2.7/distutils/cmd.py", line 326, in run_command
self.distribution.run_command(command)
File "/usr/lib/python2.7/distutils/dist.py", line 972, in run_command
cmd_obj.run()
File "setup.py", line 260, in run
rebuild_parse_tables()
File "setup.py", line 265, in rebuild_parse_tables
from blaze.datashape.parser import rebuild
File "build/lib.linux-x86_64-2.7/blaze/__init__.py", line 13, in <module>
from .array import Array
File "build/lib.linux-x86_64-2.7/blaze/array.py", line 14, in <module>
from blaze.ops import ufuncs
File "build/lib.linux-x86_64-2.7/blaze/ops/ufuncs.py", line 43, in <module>
@elementwise('A -> A -> bool')
File "build/lib.linux-x86_64-2.7/blaze/function.py", line 26, in decorator
return overload(signature, elementwise=True)(f)
File "build/lib.linux-x86_64-2.7/blaze/overloading.py", line 69, in decorator
signature = dshape(signature)
File "build/lib.linux-x86_64-2.7/blaze/datashape/util.py", line 67, in dshape
ds = _dshape(o, multi)
File "build/lib.linux-x86_64-2.7/blaze/datashape/util.py", line 75, in _dshape
return parser.parse(o)
File "build/lib.linux-x86_64-2.7/blaze/datashape/parser.py", line 481, in parse
ds = _parse(pattern)
File "build/lib.linux-x86_64-2.7/blaze/datashape/parser.py", line 463, in _parse
raise RuntimeError("Parse tables not built, run install script.")
RuntimeError: Parse tables not built, run install script.
The compute context which worked with the server on top of dynd hasn't been ported to the server in blaze. We need to discuss and figure out how we want it to work, based on the compute mechanisms built in blaze.
One result is that datashapes still need commas on input, but print semicolons on output.
The problem seems to come from blaze/datashape/__init__.py, where it says
from parse import parse
I've tried changing it to
import parser
and having it call parser.parse instead. This makes the dshape with the semicolon below work, but causes many errors in the test suite.
In [1]: import blaze
In [2]: blaze.dshape('{x:int32,y:int32}')
Out[2]: dshape("{ x : int32; y : int32 }")
In [3]: blaze.dshape('{x:int32;y:int32}')
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
C:\Anaconda\lib\site-packages\IPython\core\interactiveshell.pyc in run_code(self, code_obj)
2730 self.CustomTB(etype,value,tb)
2731 except:
-> 2732 self.showtraceback()
2733 else:
2734 outflag = 0
C:\Anaconda\lib\site-packages\IPython\core\interactiveshell.pyc in showtraceback(self, exc_tuple, filename, tb_offset, exception_only)
1718 value, tb, tb_offset=tb_offset)
1719
-> 1720 self._showtraceback(etype, value, stb)
1721 if self.call_pdb:
1722 # drop into debugger
C:\Anaconda\lib\site-packages\IPython\zmq\zmqshell.pyc in _showtraceback(self, etype, evalue, stb)
537 u'traceback' : stb,
538 u'ename' : unicode(etype.__name__),
--> 539 u'evalue' : safe_unicode(evalue)
540 }
541
C:\Anaconda\lib\site-packages\IPython\zmq\zmqshell.pyc in safe_unicode(e)
443 """
444 try:
--> 445 return unicode(e)
446 except UnicodeError:
447 pass
C:\Anaconda\lib\site-packages\blaze\error.pyc in __str__(self)
54 filename = self.filename,
55 lineno = self.lineno,
---> 56 line = self.text.split()[self.lineno],
57 pointer = ' '*self.col_offset + '^',
58 msg = self.msg,
TypeError: list indices must be integers, not str
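Until the parser accepts both separators, one application-level stop-gap is to normalize semicolons to commas before calling blaze.dshape; a hypothetical helper (it ignores the possibility of semicolons inside string literals):

```python
def normalize_dshape(ds):
    # blaze currently prints record fields with ';' but only
    # parses ',', so rewrite the output form into the input form
    return ds.replace(';', ',')
```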
The shape of an ND array is converted into a 1D object, i.e. an array that is stored as:
barray: Array
datashape := 3, 4, float64
values := [CArray(ptr=4376427488)]
metadata := [manifest, arraylike]
layout := Chunked(dim=0)
it is retrieved as:
barray2: Array
datashape := 12, float64
values := [CArray(ptr=4376462352)]
metadata := [manifest, arraylike]
layout := Chunked(dim=0)
The next code snippet shows the issue:
import os.path
import shutil

import numpy as np
import blaze as blz

shape = (3, 4)
arr = np.ones(shape)
dshape = "%s, %s, float64" % (shape[0], shape[1])
path = "p.blz"
if os.path.exists(path):
    shutil.rmtree(path)
bparams = blz.params(storage=path)
barray = blz.Array(arr, dshape, params=bparams)
print "barray:", repr(barray)

barray2 = blz.open(path)
print "barray2:", repr(barray2)

assert(str(barray.datashape) == str(barray2.datashape))
If I run the code given there:
from blaze import Array, dshape
ds = dshape('2, 2, int')
a = Array([1,2,3,4], ds)
I get an object that has the datashape described, but behaves like a one-dimensional array. This code should throw an exception, in my opinion, because the data doesn't match the datashape.
The following is what the code should look like:
from blaze import Array, dshape
ds = dshape('2, 2, int')
a = Array([[1,2],[3,4]], ds)
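The validation the constructor could perform is cheap: derive the nested-list shape and compare it with the dshape's dimensions, raising on mismatch. A sketch (not blaze code; names are illustrative):

```python
def nested_shape(data):
    # walk the nesting of lists to recover the implied shape,
    # e.g. [[1, 2], [3, 4]] -> (2, 2)
    shape = []
    while isinstance(data, list):
        shape.append(len(data))
        data = data[0] if data else None
    return tuple(shape)

def check_shape(data, dims):
    # raise instead of silently producing a 1-D array
    if nested_shape(data) != tuple(dims):
        raise ValueError("data shape %r does not match dshape dims %r"
                         % (nested_shape(data), tuple(dims)))
```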
I understand that users are recommended to use Anaconda instead of attempting to build all of the dependencies. However, it would be great if the dependencies of blaze were listed somewhere. setup.py references numpy and cython, and there's a commented-out reference to llvmpy. However, there's no mention of ply, for example. Here are some possibilities:
I'm open to writing the code or docs.
Many of these were put in there to express desired capabilities. I think we should remove them, and rather have a design document describing how we want it to work.
In [1]: import blaze
In [2]: a = blaze.array([1,2,3])
In [3]: [x for x in dir(a) if not x.startswith('_')]
Out[3]: ['axes', 'capabilities', 'dshape', 'expr', 'labels', 'user', 'view']
Based on a fresh checkout (I'm currently on c287f4c):
$ python -c 'import blaze'
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/home/ehiggs/.virtualenvs/pandas/local/lib/python2.7/site-packages/blaze/__init__.py", line 5, in <module>
from lib import *
File "/home/ehiggs/.virtualenvs/pandas/local/lib/python2.7/site-packages/blaze/lib.py", line 41, in <module>
from blaze.rts.funcs import PythonFn, install, lift
File "/home/ehiggs/.virtualenvs/pandas/local/lib/python2.7/site-packages/blaze/rts/funcs.py", line 30, in <module>
from blaze.metadata import all_prop
File "/home/ehiggs/.virtualenvs/pandas/local/lib/python2.7/site-packages/blaze/metadata.py", line 2, in <module>
from blaze.expr.utils import Symbol as S
File "/home/ehiggs/.virtualenvs/pandas/local/lib/python2.7/site-packages/blaze/expr/__init__.py", line 1, in <module>
import ops
File "/home/ehiggs/.virtualenvs/pandas/local/lib/python2.7/site-packages/blaze/expr/ops.py", line 1, in <module>
from graph import Op
File "/home/ehiggs/.virtualenvs/pandas/local/lib/python2.7/site-packages/blaze/expr/graph.py", line 26, in <module>
from blaze.sources.canonical import PythonSource
File "/home/ehiggs/.virtualenvs/pandas/local/lib/python2.7/site-packages/blaze/sources/canonical.py", line 6, in <module>
from blaze.sources.descriptors.byteprovider import ByteProvider
ImportError: No module named descriptors.byteprovider
Fixing issue #14 surfaced some issues in carray when dealing with objects. Object support is not complete and needs its own set of tests to make sure all behavior is exercised and expected behavior is properly documented.
On some occasions, especially in code that is in flux for doing the conversion from numpy to dynd, you need to convert arrays and types bidirectionally between the two packages. In dynd we already have nd.to_numpy(), but we lack a way to convert types to numpy types.
I have tried this:
In [7]: dt = nd.type_of(nd.empty('2,2,int32'))
In [8]: np.dtype(str(dt.dtype))
Out[8]: dtype('int32')
but the user can get unpleasant surprises:
In [9]: np.dtype(dt.dtype)
Segmentation fault
I think it would be nice to offer something like an ndt.to_numpy() function or a .to_numpy() method for converting dynd types to numpy equivalents (when possible).
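Until such a function exists, a name-based fallback can cover the common scalar types without touching the dynd type object at all (and therefore without the segfault above). The mapping table below is a hypothetical sketch, not dynd API:

```python
# hypothetical string-level mapping from dynd scalar type names
# to numpy dtype codes; no dynd object is ever handed to np.dtype
_DYND_TO_NUMPY = {
    "bool": "?",
    "int8": "i1", "int16": "i2", "int32": "i4", "int64": "i8",
    "float32": "f4", "float64": "f8",
}

def dynd_name_to_numpy(name):
    try:
        return _DYND_TO_NUMPY[name]
    except KeyError:
        raise TypeError("no numpy equivalent for dynd type %r" % name)
```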
It seems that blaze.zeros() has undergone some significant slowdown lately, as the next script shows:
import blaze as blz
import numpy as np
from time import time
len_ = np.prod((100,100,100))
print "len for array:", len_
t0 = time()
a = np.arange(len_)
print "numpy creation time: %.3f" % (time() - t0,)
t0 = time()
b = blz.Array(a, dshape='%d, int32' % (len_,))
t1 = time() - t0
print "Final datashape:", b.datashape
print "blaze.Array creation time: %.3f" % (t1,)
t0 = time()
c = blz.zeros(dshape='%d, int32'% (len_,))
t2 = time() - t0
print "Final datashape:", c.datashape
print "blaze.zeros creation time: %.3f" % (t2,)
print "time ratio blaze.Array vs blaze.zeros: %.1fx" % (t2 / t1,)
and the output on my laptop is:
len for array: 1000000
numpy creation time: 0.008
Final datashape: 1000000, int32
blaze.Array creation time: 0.013
Final datashape: 1000000, int32
blaze.zeros creation time: 2.479
time ratio blaze.Array vs blaze.zeros: 184.3x
Perhaps it is a bit soon for this, but we should start considering some performance regression tool like FunkLoad or codespeed (or whatever).
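Short of adopting FunkLoad or codespeed, a minimal regression check could live in the test suite: time both constructors and assert the ratio stays under a threshold. A sketch of such a harness (the workloads and any threshold are placeholders):

```python
from time import perf_counter

def measure(fn):
    # wall-clock time of a single call
    t0 = perf_counter()
    fn()
    return perf_counter() - t0

def best_time(fn, repeat=3):
    # take the best of several runs to reduce scheduling noise
    return min(measure(fn) for _ in range(repeat))

def ratio(fast_fn, slow_fn, repeat=3):
    # how many times slower slow_fn is than fast_fn;
    # a regression test would assert this stays below a bound
    return best_time(slow_fn, repeat) / best_time(fast_fn, repeat)
```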
I followed the instructions here to install:
http://blaze.pydata.org/docs/install.html
using the conda install ... approach, and then looked at the quick start example here:
http://blaze.pydata.org/docs/quickstart.html
but import blaze doesn't work:
In [1]: import blaze
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-5c5ee3cb747a> in <module>()
----> 1 import blaze
ImportError: No module named blaze
I tried conda install blaze and pip install blaze, but both failed.
Incidentally, it would be nice if conda install blaze installed ply and blosc.
Please advise what I am doing wrong. I should note that I've tried this both in Wakari and on my own laptop.
Right now:
In [43]: def gen(rows):
   ....:     for i in rows:
   ....:         yield 0.1*i
   ....:
In [44]: blaze.fromiter(gen(100), 'x, { f1: int; f2: int }')
Out[44]:
Array
datashape := 0, { f1 : int32; f2 : int32 }
values := [CArray(ptr=140531395163520)]
metadata := [manifest, arraylike]
layout := Chunked(dim=0)
[]
in numpy the exception passes through:
In [45]: np.fromiter(gen(100), np.float32)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-45-4665ba131c41> in <module>()
----> 1 np.fromiter(gen(100), np.float32)
<ipython-input-43-ee234889df46> in gen(rows)
1 def gen(rows):
----> 2 for i in rows:
3 yield 0.1*i
4
TypeError: 'int' object is not iterable
It seems that numpy behavior is more sensible.
Note: if passing a dshape with a non-variable dimension:
In [50]: blaze.fromiter(gen(100), '100, float')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-50-c413bd6d80b0> in <module>()
----> 1 blaze.fromiter(gen(100), '100, float')
/Users/ovillellas/continuum/blaze-core/blaze/toplevel.pyc in fromiter(iterable, dshape, params)
189 return open(rootdir)
190 else:
--> 191 ica = carray.fromiter(iterable, dtype, count=count, cparams=cparams)
192 source = CArraySource(ica, params=params)
193 return Array(source)
/Users/ovillellas/continuum/blaze-core/blaze/carray/toplevel.pyc in fromiter(iterable, dtype, count, **kwargs)
183 blen = chunklen
184 if count != sys.maxint:
--> 185 chunk = np.fromiter(iterable, dtype=dtype, count=blen)
186 else:
187 try:
ValueError: iterator too short
It may be that it is interpreting any exception as an end-of-iteration signal.
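The fix would be to catch only StopIteration while consuming the iterator and let everything else propagate, matching numpy. A sketch of the consumption loop (names illustrative, not blaze code):

```python
def take(iterable, n):
    # consume up to n items; only StopIteration means "end of data",
    # any other exception from the generator propagates to the caller
    it = iter(iterable)
    out = []
    for _ in range(n):
        try:
            out.append(next(it))
        except StopIteration:
            break
    return out
```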
Sorry, this isn't a "code" related issue, but I thought you'd like to know that your mailing list icon on the http://blaze.pydata.org/ home page doesn't point where I'd expect it to. It takes me to GitHub instead of to the Google Groups.
While writing some more tests, I noticed that blaze.toplevel.open() can be called with no arguments. The open() function tries to instantiate a CArraySource with no arguments, which doesn't work:
Traceback (most recent call last):
File "/Users/stan/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/work/projects/blaze-core/blaze/tests/test_toplevel.py", line 4, in test_open_uri_none
toplevel.open()
File "/work/projects/blaze-core/blaze/toplevel.py", line 44, in open
source = CArraySource()
File "/work/projects/blaze-core/blaze/sources/chunked.py", line 49, in __init__
(params.get('storage'))
AttributeError: 'NoneType' object has no attribute 'get'
What is the intended meaning of calling blaze.toplevel.open() with a uri set to None?
Right now, the barray and btable objects in BLZ implement the iterator in the same class, and this can create problems in different situations:
the len() cannot be shared between the iterator and the underlying object (e.g. nd.array(b.where(a<5)) uses len(b) to fill the object).
two iterators cannot be run simultaneously (e.g. zip(b.where(a<5), b.where((a>1) & (a<6)))).
Making the iterator an independent object would solve these issues.
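A sketch of the separation: the container's __iter__ returns a fresh iterator object on each call, so two queries can be zipped and len() stays on the container (purely illustrative, not BLZ code):

```python
class Query:
    # stands in for b.where(cond): iterating it never mutates
    # the underlying container's state
    def __init__(self, data, pred):
        self.data, self.pred = data, pred

    def __len__(self):
        # len() lives on the container, independent of any iterator
        return len(self.data)

    def __iter__(self):
        # a brand-new generator per call -> independent iterators
        return (x for x in self.data if self.pred(x))

data = [0, 1, 2, 3, 4, 5]
# two simultaneous iterators over the same container
pairs = list(zip(Query(data, lambda a: a < 5),
                 Query(data, lambda a: 1 < a < 6)))
```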
Mark Wiebe reported this problem on Windows (master branch):
I'm getting the following failures in blaze master:
Cheers,
Mark
======================================================================
ERROR: blaze.tests.test_vlen.test_object_persistent_blob
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\Anaconda\lib\site-packages\nose\case.py", line 197, in runTest
self.test(*self.arg)
File "D:\Develop\blaze\blaze\tests\test_vlen.py", line 54, in test_object_persistent_blob
for i, v in enumerate(c):
File "D:\Develop\blaze\blaze\table.py", line 240, in __getitem__
return retrieve(cc, indexer)
File "D:\Develop\blaze\blaze\layouts\query.py", line 36, in retrieve
return getitem(cc, indexer)
File "D:\Develop\blaze\blaze\layouts\query.py", line 81, in getitem
datum = elt.read(elt, lc)
File "D:\Develop\blaze\blaze\sources\chunked.py", line 148, in read
return self.ca.__getitem__(key)
File "carrayExtension.pyx", line 1654, in blaze.carray.carrayExtension.carray.__getitem__ (blaze/carray\carrayExtension.c:18053)
File "carrayExtension.pyx", line 1609, in blaze.carray.carrayExtension.carray.getitem_object (blaze/carray\carrayExtension.c:17783)
File "carrayExtension.pyx", line 629, in blaze.carray.carrayExtension.chunks.__getitem__ (blaze/carray\carrayExtension.c:7465)
File "carrayExtension.pyx", line 609, in blaze.carray.carrayExtension.chunks.read_chunk (blaze/carray\carrayExtension.c:7094)
ValueError: chunkfile c:\users\mwiebe\appdata\local\temp\tmppkq0gg\c\data\__10.blp not found
======================================================================
ERROR: blaze.tests.test_vlen.test_object_persistent_blob_reopen
----------------------------------------------------------------------
Traceback (most recent call last):
File "C:\Anaconda\lib\site-packages\nose\case.py", line 197, in runTest
self.test(*self.arg)
File "D:\Develop\blaze\blaze\tests\test_vlen.py", line 69, in test_object_persistent_blob_reopen
c2 = blaze.open(tmppath)
File "D:\Develop\blaze\blaze\toplevel.py", line 69, in open
source = CArraySource(params=parms)
File "D:\Develop\blaze\blaze\sources\chunked.py", line 63, in __init__
self.ca = carray.carray(data, rootdir=rootdir, cparams=cparams)
File "carrayExtension.pyx", line 874, in blaze.carray.carrayExtension.carray.__cinit__ (blaze/carray\carrayExtension.c:10006)
File "carrayExtension.pyx", line 1120, in blaze.carray.carrayExtension.carray.read_meta (blaze/carray\carrayExtension.c:13320)
IOError: [Errno 2] No such file or directory: '\\users\\mwiebe\\appdata\\local\\temp\\tmpeskzuq\\c\\meta\\sizes'
Apparently this works on Unix.
We seem to require the network protocol for all URIs on file openings. For example:
store = blaze.Storage('csv:///tmp/test.csv')
instead of
store = blaze.Storage('/tmp/test.csv')
Not requiring the network protocol is far better, since this abuse of URIs will be misunderstood. Let's dispatch on file extension where possible.
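A sketch of extension-based dispatch with an explicit scheme kept as an optional override (assumed behavior, not the current blaze API):

```python
import os.path
from urllib.parse import urlparse

def infer_format(uri):
    # explicit scheme wins: 'csv:///tmp/test.csv' -> 'csv'
    scheme = urlparse(uri).scheme
    if scheme and scheme != "file":
        return scheme
    # otherwise fall back to the file extension: '/tmp/test.csv' -> 'csv'
    ext = os.path.splitext(uri)[1].lstrip(".")
    return ext or None
```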
Hi,
I'm trying to use & test blaze from the master branch. Should 'DyND' be in the requirements.txt or README file (and thus the related C++ libdynd library)?
doc/source/install.rst says that dynd is optional; is that true? Maybe update the install doc.
Thanks.
Damien G.
The problem comes from not handling the case when the source object is another blaze array:
a=blaze.zeros('10,10,float64')
b=blaze.array(a) # fails
Looks like the CFFI tests are failing and being skipped on travis and jenkins.
ERROR: test_1d_array (blaze.datadescriptor.tests.test_cffi_membuf_data_descriptor.TestCFFIMemBufDataDescriptor)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/mark/blaze/blaze/datadescriptor/tests/test_cffi_membuf_data_descriptor.py", line 34, in test_1d_array
self.assertEqual(dd.dshape, blaze.dshape('32, int16'))
AttributeError: 'module' object has no attribute 'dshape'
======================================================================
ERROR: test_2d_array (blaze.datadescriptor.tests.test_cffi_membuf_data_descriptor.TestCFFIMemBufDataDescriptor)
The next code shows the problem:
In []: a = blaze.array([(1, 2.1, "23")], dshape='1, { x: int32; y: float32; z: string }')
In []: a.dshape
Out[]: dshape("1, { x : int32; y : float32; z : string }")
In []: print a
exception raised in fillFormat: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
exception raised in fillFormat: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
exception raised in fillFormat: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
<snip>
/Users/faltet/software/blaze/blaze/_printing/_arrayprint.pyc in _to_numpy(ds)
36
37 def _to_numpy(ds):
---> 38 res = _internal_to_numpy(ds)
39 res = res if type(res) is tuple else ((), res)
40 return res
/Users/faltet/software/blaze/blaze/datashape/coretypes.py in to_numpy(ds)
1013
1014 # The datashape dimensions
-> 1015 for dim in ds[:-1]:
1016 if isinstance(dim, IntegerConstant):
1017 shape += (dim,)
/Users/faltet/software/blaze/blaze/datashape/coretypes.py in __getitem__(self, key)
785
786 def __getitem__(self, key):
--> 787 return self.__fdict[key]
788
789 def __eq__(self, other):
TypeError: unhashable type
However, dynd supports this:
In []: dynd.nd.array([('1.2', '1', 'sds')], dtype='{x: float32; y: int8; z: string}')
Out[]: nd.array([[1.2, 1, "sds"]], strided_dim<{x : float32; y : int8; z : string}>)
I would say that we should start using dynd instead of numpy for printing blaze arrays.
Finally, note that the code above fails even if we declare the string as a fixed length (which should be supported by numpy):
In []: print blaze.array([(1, 2.1, "23")], dshape='1, { x: int32; y: float32; z: string(10) }')
I don't think drop universally signals deletion; I think we should rename it.
[Adapted from https://github.com/Blosc/bcolz/issues/25]
With Numpy I can do something like this:
foo = np.zeros([ 2 ] * 20)
And get an ndarray with the corresponding shape. I can then:
ac = blaze.blz.barray(foo)
To get a barray object. Great. But I'm playing around with blaze.blz because I want to use array sizes that are larger than could otherwise fit in memory, and [2] * 20 is an easy shape for Numpy to handle, so for me it's a baseline of sorts.
Looking to explore the capabilities of blaze.blz, I try to create the object directly, without the intermediate Numpy step:
ac = blaze.blz.zeros([2] * 20)
But I get an error:
In [11]: blz.zeros([2]*20)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-11-32ca191bdff9> in <module>()
----> 1 blz.zeros([2]*20)
/home/faltet/software/blaze/blaze/blz/bfuncs.pyc in zeros(shape, dtype, **kwargs)
239 """
240 dtype = np.dtype(dtype)
--> 241 return fill(shape=shape, dflt=np.zeros((), dtype), dtype=dtype, **kwargs)
242
243
/home/faltet/software/blaze/blaze/blz/bfuncs.pyc in fill(shape, dflt, dtype, **kwargs)
203 # Then fill it
204 # We need an array for the defaults so as to keep the atom info
--> 205 dflt = np.array(obj.dflt, dtype=dtype)
206 # Making strides=(0,) below is a trick to create the array fast and
207 # without memory consumption
ValueError: number of dimensions must be within [0, 32]
Which leads me to wonder: is something like this possible using blaze.blz?
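The ValueError comes from numpy's cap of 32 dimensions per array. One workaround sketch, independent of that cap, is to store the data as 1-D and do the multi-index arithmetic by hand (row-major, purely illustrative):

```python
def ravel_index(idx, shape):
    # row-major flattening: (i0, i1, ...) -> flat offset into 1-D storage
    flat = 0
    for i, d in zip(idx, shape):
        assert 0 <= i < d
        flat = flat * d + i
    return flat

shape = (2,) * 20          # the [2] * 20 shape from the report
nelems = 1
for d in shape:
    nelems *= d            # 2**20 logical elements, stored as one axis
```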
I ran many of the examples in the docs and they didn't work. Has much changed in the 0.2 dev version? Please point to the matching documentation; thanks in advance!
In [43]: dname = 'persisted.blz'
In [44]: store = blaze.Storage(dname)
ipython shows:
ValueError Traceback (most recent call last)
/home/drill/blaze/samples/basics/<ipython-input-44-465fd73ab60e> in <module>()
----> 1 store = blaze.Storage(dname)
/usr/local/lib/python2.7/dist-packages/blaze/storage.pyc in __init__(self, uri, mode, permanent)
91 self._mode = mode
92 if self._format != 'blz':
---> 93 raise ValueError("BLZ `format` '%s' is not supported." % self._format)
94 if not permanent:
95 raise ValueError(
ValueError: BLZ `format` '' is not supported.
Two things: in-memory caching, and on-disk caching.
The dynd-based server demo used naive in-memory caching of all its arrays. The blaze server doesn't, pending a proper caching mechanism. This means the blaze server is really slow if the data backing it gets bigger, e.g. a few gigabytes.
For on-disk caching, we need a file format which fully supports the generality of the blaze data model. I think a format memory-mappable by dynd is the way to go, analogous to the numpy .npy format.
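For the in-memory side, even a bounded LRU over array fetches would beat the current no-cache behavior; a sketch using functools (the key scheme and entry-count sizing are placeholders — a real server would size the cache in bytes):

```python
from functools import lru_cache

calls = {"n": 0}

@lru_cache(maxsize=128)
def fetch_array(key):
    # stand-in for reading an array from the backing store;
    # repeated requests for the same key hit the cache instead
    calls["n"] += 1
    return "payload-for-%s" % key
```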
open fails to open persisted tables with an exception. It seems that tables do not create a proper meta folder, and that's causing open to fail.
During setup.py build, the ucr-dtw dir is empty.
Matt Knox emailed the mailing list:
Just wanted to point out that the blaze docs can't be read on an iPhone (at least on my 4S with the latest iOS). The nav menu hovers and blocks most of the text as you scroll.
In [23]: blaze.ones('32, float32')
Out[23]:
Array
datashape := 32, float64
values := [CArray(ptr=140628285688896)]
metadata := [manifest, arraylike]
layout := Chunked(dim=0)
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.
1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
http://blaze.pydata.org/docs/quickstart.html
from blaze import Table gives me an ImportError (cannot import Table).
Implement the variable length string as described in proposal, with optional flag for encoding.
Reference: http://blaze.pydata.org/docs/datashape.html#string-types
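One common on-disk layout for variable-length strings is a length-prefixed byte sequence, with the encoding recorded once per column. The 4-byte little-endian prefix below is an assumption for illustration, not the proposal's wire format:

```python
def encode_vstring(s, encoding="utf-8"):
    # prefix the encoded payload with its byte length
    payload = s.encode(encoding)
    return len(payload).to_bytes(4, "little") + payload

def decode_vstring(buf, offset=0, encoding="utf-8"):
    # read the length prefix, then the payload; return the string
    # and the offset of the next value so records can be scanned
    n = int.from_bytes(buf[offset:offset + 4], "little")
    start = offset + 4
    return buf[start:start + n].decode(encoding), start + n
```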
There are several warnings "warning: implicit conversion shortens 64-bit value into a 32-bit value" when building extensions. Files involved are carrayExtension (cython), blosclz and blosc.
The record datashape itself can be created as:
"""
In []: blaze.datashape.dshape("Var, {x: int32; y:bool}")
Out[]: dshape("Var, { x : int32; y : bool }")
"""
but it does not work when using it for creating arrays:
"""
In []: blaze.array([(1, True), (2, False)], dshape="Var, { x : int32; y : bool }")
Out[]: exception raised in fillFormat: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
exception raised in fillFormat: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
TypeError Traceback (most recent call last)
in ()
----> 1 blaze.array([(1, True), (2, False)], dshape="Var, { x : int32; y : bool }")
[clip]
/Users/faltet/software/blaze/blaze/datashape/coretypes.pyc in __getitem__(self, key)
785
786 def __getitem__(self, key):
--> 787 return self.__fdict[key]
788
789 def __eq__(self, other):
TypeError: unhashable type
"""
Notice that there are strange exceptions coming from fillFormat too.
Running the command dshape(s) in the following example produces different error messages when run repeatedly. It should produce the same error message each time. After a while, it repeatedly gives the 'list index out of range' error, the last one listed below.
Error Messages:
C:\Anaconda\lib\site-packages\blaze\datashape\parser.pyc in p_error(p)
273 p.lexpos,
274 '<stdin>',
--> 275 p.lexer.lexdata,
276 )
277 else:
DatashapeSyntaxError:
File <stdin>, line 32
{
^
DatashapeSyntaxError: invalid syntax
DatashapeSyntaxError:
File <stdin>, line 63
in
^
DatashapeSyntaxError: invalid syntax
DatashapeSyntaxError:
File <stdin>, line 94
type:
^
DatashapeSyntaxError: invalid syntax
DatashapeSyntaxError:
File <stdin>, line 125
#
^
DatashapeSyntaxError: invalid syntax
DatashapeSyntaxError:
File <stdin>, line 156
{
^
DatashapeSyntaxError: invalid syntax
DatashapeSyntaxError:
File <stdin>, line 187
processed_date:
^
DatashapeSyntaxError: invalid syntax
DatashapeSyntaxError:
File <stdin>, line 218
bulkEntries:
^
DatashapeSyntaxError: invalid syntax
C:\Anaconda\lib\site-packages\blaze\error.pyc in __str__(self)
54 filename = self.filename,
55 lineno = self.lineno,
---> 56 line = self.text.split()[self.lineno],
57 pointer = ' '*self.col_offset + '^',
58 msg = self.msg,
IndexError: list index out of range
Code:
from blaze import dshape
s = """5, {
id: int64;
name: string;
description: {
languages: VarDim, string(2);
texts: json # map<string(2), string>;
};
status: string; # LoanStatusType;
funded_amount: float64;
basket_amount: json; # Option(float64);
paid_amount: json; # Option(float64);
image: {
id: int64;
template_id: int64;
};
video: json; # Option({
# id: int64;
# youtube_id: string;
#});
activity: string;
sector: string;
use: string;
# For 'delinquent', saw values \"null\" and \"true\" in brief search, map null -> false on import?
delinquent: bool;
location: {
country_code: string(2);
country: string;
town: json; # Option(string);
geo: {
level: string; # GeoLevelType
pairs: string; # latlong
type: string; # GeoTypeType
}
};
partner_id: int64;
posted_date: json; # datetime<seconds>;
planned_expiration_date: json; # Option(datetime<seconds>);
loan_amount: float64;
currency_exchange_loss_amount: json; # Option(float64);
borrowers: VarDim, {
first_name: string;
last_name: string;
gender: string(1); # GenderType
pictured: bool;
};
terms: {
disbursal_date: json; # datetime<seconds>;
disbursal_currency: json; # Option(string);
disbursal_amount: float64;
loan_amount: float64;
local_payments: VarDim, {
due_date: json; # datetime<seconds>;
amount: float64;
};
scheduled_payments: VarDim, {
due_date: json; # datetime<seconds>;
amount: float64;
};
loss_liability: {
nonpayment: string; # Categorical(string, [\"lender\", \"partner\"]);
currency_exchange: string;
currency_exchange_coverage_rate: json; # Option(float64);
}
};
payments: VarDim, {
amount: float64;
local_amount: float64;
processed_date: json; # datetime<seconds>;
settlement_date: json; # datetime<seconds>;
rounded_local_amount: float64;
currency_exchange_loss_amount: float64;
payment_id: int64;
comment: json; # Option(string);
};
funded_date: json; # datetime<seconds>;
paid_date: json; # datetime<seconds>;
journal_totals: {
entries: int64;
bulkEntries: int64;
}
}
type KivaLoansFile = {
header: {
total: int64;
page: int64;
date: string;
page_size: int64;
};
loans: VarDim, KivaLoan;
}"""
dshape(s)
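The IndexError in the traceback above looks like a bug in blaze's error formatting itself: `self.text.split()` with no argument tokenizes on *all* whitespace, so indexing the result by a line number picks an arbitrary token, or runs off the end entirely. A minimal sketch of the likely fix (assuming `self.text` holds the full source string and `self.lineno` is 1-based, as in Python tracebacks):

```python
# Hypothetical reproduction of the formatting bug in blaze/error.py.
# split() with no argument splits on ALL whitespace, so indexing by a
# line number grabs the wrong token -- or raises IndexError when the
# line number exceeds the token count.
text = "5, {\n  id: int64;\n  name: string;\n}"
lineno = 2  # 1-based line number, as reported by the parser

tokens = text.split()       # ['5,', '{', 'id:', 'int64;', ...]
lines = text.splitlines()   # ['5, {', '  id: int64;', ...]

print(tokens[lineno])       # wrong: an arbitrary whitespace token
print(lines[lineno - 1])    # right: the offending source line
```

Switching the `line = ...` expression to `self.text.splitlines()[self.lineno - 1]` (or guarding the index) would make the real parse error surface instead of the secondary IndexError.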
Build breaks in some C files.
The test_dtype_compat() test is currently failing because of a confusion between the "float" dtype in blaze and the "float" dtype in numpy:
======================================================================
FAIL: test_dtype_compat (blaze.tests.test_numpy_compat.TestToNumPy)
----------------------------------------------------------------------
Traceback (most recent call last):
File "blaze/tests/test_numpy_compat.py", line 23, in test_dtype_compat
self.assertEqual(to_numpy(blaze.float_), np.float_)
AssertionError: dtype('float32') != <type 'numpy.float64'>
----------------------------------------------------------------------
The issue is that blaze.float_ is defined in blaze.datashape.coretypes as:
float_ = CType('float')
Then CType.to_dtype() has a special case:
if self.name == "float":
    return np.dtype("f")
And np.dtype("f") is a 32-bit float.
Fixing this is easy, but I think the root confusion is whether the blaze.float_ dshape is supposed to be the C "float" (32-bit) or the Python "float" (64-bit). Because of this mismatch, I would personally vote to either drop float_ in favor of the explicit blaze.float32 and blaze.float64, or define float_ = float64 to match Python and NumPy. Happy to make a PR, but this is a design question for you guys. :)
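For reference, NumPy itself treats the "f" typecode as the C single-precision float, while Python's float maps to a 64-bit double, which is exactly the mismatch the test trips over:

```python
import numpy as np

# The "f" typecode is the C float: single precision, 4 bytes.
print(np.dtype("f"))         # float32
assert np.dtype("f").itemsize == 4

# Python's float corresponds to the C double: 8 bytes.
print(np.dtype(float))       # float64
assert np.dtype(float).itemsize == 8
```

So whichever convention blaze picks, the to_dtype() special case and the name float_ should agree with each other.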
I think blaze.open should match blaze.load.
The example in: http://blaze.pydata.org/docs/quickstart.html#custom-dshapes fails with this error:
======================================================================
ERROR: blaze.tests.test_table.test_custom_dshape
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/faltet/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/Users/faltet/software/blaze-core/blaze/tests/test_table.py", line 50, in test_custom_dshape
from blaze import int32, string
ImportError: cannot import name string
Even without using strings, it fails with this error:
======================================================================
ERROR: blaze.tests.test_table.test_custom_dshape
----------------------------------------------------------------------
Traceback (most recent call last):
File "/Users/faltet/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/Users/faltet/software/blaze-core/blaze/tests/test_table.py", line 60, in test_custom_dshape
a = Table([(120, 153)], CustomStock)
File "/Users/faltet/software/blaze-core/blaze/table.py", line 453, in __init__
self._axes = self._datashape[-1].names
TypeError: 'DeclMeta' object does not support indexing
installed anaconda CE
got the blaze-core code from git
ran make build
python --version: Python 2.7.3 :: AnacondaCE 1.3.0 (x86_64)
OS: OSX Version 10.6.8
executed
"python setup.py test"
result:
When I try to run the tests in blaze/tests/test_quickstart.py with nose, I get an error that looks like a circular import:
======================================================================
ERROR: tests.test_quickstart.test_sqlite
----------------------------------------------------------------------
Traceback (most recent call last):
File "/work/projects/blaze/env/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
self.test(*self.arg)
File "/work/projects/blaze/tests/test_quickstart.py", line 52, in test_sqlite
from blaze import open
File "/work/projects/blaze/env/lib/python2.7/site-packages/blaze/__init__.py", line 5, in <module>
from lib import *
File "/work/projects/blaze/env/lib/python2.7/site-packages/blaze/lib.py", line 41, in <module>
from blaze.rts.funcs import PythonFn, install, lift
File "/work/projects/blaze/env/lib/python2.7/site-packages/blaze/rts/funcs.py", line 30, in <module>
from blaze.metadata import all_prop
File "/work/projects/blaze/env/lib/python2.7/site-packages/blaze/metadata.py", line 2, in <module>
from blaze.expr.utils import Symbol as S
File "/work/projects/blaze/env/lib/python2.7/site-packages/blaze/expr/__init__.py", line 1, in <module>
import ops
File "/work/projects/blaze/env/lib/python2.7/site-packages/blaze/expr/ops.py", line 1, in <module>
from graph import Op
File "/work/projects/blaze/env/lib/python2.7/site-packages/blaze/expr/graph.py", line 24, in <module>
from blaze.expr import nodes, catalog
ImportError: cannot import name nodes
I think the problem is that blaze.expr.__init__ ultimately performs an absolute import of blaze.expr again when trying to import blaze.expr.nodes.
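For what it's worth, the usual fix for this pattern is to make the intra-package imports explicit: `from . import ops` instead of `import ops`, and `from .graph import Op` instead of `from graph import Op`. A small self-contained sketch of that layout (pkg, ops, and graph are stand-ins, not the real blaze modules):

```python
import os
import sys
import tempfile

# Stand-in package mirroring blaze/expr: __init__ imports a submodule,
# which in turn imports a sibling. Explicit relative imports keep the
# resolution inside the package instead of re-importing it absolutely.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "pkg")
os.makedirs(pkg_dir)

with open(os.path.join(pkg_dir, "__init__.py"), "w") as f:
    f.write("from . import ops\n")          # not: import ops
with open(os.path.join(pkg_dir, "ops.py"), "w") as f:
    f.write("from .graph import Op\n")      # not: from graph import Op
with open(os.path.join(pkg_dir, "graph.py"), "w") as f:
    f.write("class Op(object):\n    pass\n")

sys.path.insert(0, root)
import pkg  # imports cleanly; no "cannot import name" cycle

print(pkg.ops.Op)
```

The same mechanical change in blaze/expr/__init__.py and blaze/expr/ops.py should break the cycle.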
Datashape related failure on CTable open.
ERROR: blaze.tests.test_toplevel.test_open_ctable
Traceback (most recent call last):
  File "/home/stephen/continuum/anaconda/lib/python2.7/site-packages/nose/case.py", line 197, in runTest
    self.test(*self.arg)
  File "blaze/tests/test_toplevel.py", line 40, in test_open_ctable
    c = toplevel.open(uri)
  File "blaze/toplevel.py", line 60, in open
    source = CTableSource(params=parms)
  File "blaze/sources/chunked.py", line 206, in __init__
    self.ca = ctable(data, rootdir=rootdir, cparams=cparams)
  File "blaze/carray/ctable.py", line 199, in __init__
    self.open_ctable()
  File "blaze/carray/ctable.py", line 291, in open_ctable
    self.cols.read_meta_and_open()
  File "blaze/carray/ctable.py", line 46, in read_meta_and_open
    self._cols[str(name)] = carray(rootdir=dir_, mode=self.mode)
  File "carrayExtension.pyx", line 892, in blaze.carray.carrayExtension.carray.__cinit__ (blaze/carray/carrayExtension.c:10069)
  File "carrayExtension.pyx", line 1170, in blaze.carray.carrayExtension.carray.read_meta (blaze/carray/carrayExtension.c:13872)
  File "/home/stephen/continuum/anaconda/lib/python2.7/site-packages/numpy/core/_internal.py", line 166, in _commastring
    (len(result)+1, astr))
ValueError: format number 1 of "[('x', '<i4'), ('y', '<i4')]" is not recognized
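The ValueError suggests NumPy's comma-format-string parser is being handed the *repr* of a structured dtype rather than a valid dtype spec, presumably because the dtype was round-tripped through metadata as a string. A minimal reproduction of that failure mode (my guess at the mechanism, not blaze code):

```python
import numpy as np

# A structured dtype is built from a list of (name, format) tuples:
dt = np.dtype([('x', '<i4'), ('y', '<i4')])
print(dt)  # [('x', '<i4'), ('y', '<i4')]

# But the string repr of that list is NOT a valid dtype spec.
# np.dtype("...") expects a comma-separated format string, so feeding
# the repr back (e.g. from serialized metadata) fails to parse:
try:
    np.dtype("[('x', '<i4'), ('y', '<i4')]")
except (ValueError, TypeError) as exc:
    print("parse failed:", exc)
```

If that is what's happening, the fix would be to serialize the dtype with something reversible (e.g. dt.descr) and rebuild it with np.dtype(list_of_tuples) instead of passing the string through.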
But yaml is an external dependency and should at least be listed among the requirements for Blaze.
Test code:
>>> from blaze import dshape
>>> dshape('{x:int32; y:int32}')
dshape("{ x : int32; y : int32 }")
>>> dshape('type P = {x:int32; y:int32}')
>>> dshape("""type P = {x:int32; y:int32}
... 3, P""")
>>>
The behavior I expected is for it to return the last datashape declared in the string.