

Hickle

Hickle is an HDF5 based clone of pickle, with a twist: instead of serializing to a pickle file, Hickle dumps to an HDF5 file (Hierarchical Data Format). It is designed to be a "drop-in" replacement for pickle (for common data objects), but is really an amalgam of h5py and pickle with extended functionality.

That is: hickle is a neat little way of dumping python variables to HDF5 files that can be read in most programming languages, not just Python. Hickle is fast, and allows for transparent compression of your data (LZF / GZIP).

Why use Hickle?

While hickle is designed to be a drop-in replacement for pickle (or something like json), it works very differently. Instead of serializing / json-izing, it instead stores the data using the excellent h5py module.

The main reasons to use hickle are:

  1. It's faster than pickle and cPickle.
  2. It stores data in HDF5.
  3. You can easily compress your data.

The main reasons not to use hickle are:

  1. You don't want to store your data in HDF5. While hickle can serialize arbitrary python objects, this functionality is provided only for convenience, and you're probably better off just using the pickle module.
  2. You want to convert your data to human-readable JSON/YAML, in which case you should do that instead.

So, if you want your data in HDF5, or if your pickling is taking too long, give hickle a try. Hickle is particularly good at storing large numpy arrays, thanks to h5py running under the hood.

Documentation

Documentation for hickle can be found at telegraphic.github.io/hickle/.

Usage example

Hickle is nice and easy to use, and should look very familiar to those of you who have pickled before.

In short, hickle provides two methods: a hickle.load method, for loading hickle files, and a hickle.dump method, for dumping data into HDF5. Here's a complete example:

import os
import hickle as hkl
import numpy as np

# Create a numpy array of data
array_obj = np.ones(32768, dtype='float32')

# Dump to file
hkl.dump(array_obj, 'test.hkl', mode='w')

# Dump data, with compression
hkl.dump(array_obj, 'test_gzip.hkl', mode='w', compression='gzip')

# Compare filesizes
print('uncompressed: %i bytes' % os.path.getsize('test.hkl'))
print('compressed:   %i bytes' % os.path.getsize('test_gzip.hkl'))

# Load data
array_hkl = hkl.load('test_gzip.hkl')

# Check that the loaded data matches the original
assert array_hkl.dtype == array_obj.dtype
assert np.allclose(array_hkl, array_obj)

HDF5 compression options

A major benefit of hickle over pickle is that it allows fancy HDF5 features to be applied, by passing keyword arguments on to h5py. So, you can do things like:

hkl.dump(array_obj, 'test_lzf.hkl', mode='w', compression='lzf',
         chunks=(1024,), shuffle=True, fletcher32=True)

A detailed explanation of these keywords is given at http://docs.h5py.org/en/latest/high/dataset.html, but we give a quick rundown below.

In HDF5, chunked datasets are indexed with a B-tree, a tree data structure that has speed benefits over a single contiguous block of data. Data are split into chunks, which is leveraged to allow dataset resizing and compression via filter pipelines. Filters such as shuffle and scaleoffset rearrange your data to improve compression ratios, and fletcher32 computes a checksum to detect corruption. These file-level options are abstracted away from the data model.
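
Since hickle files are plain HDF5, you can verify with h5py which filters actually ended up on each dataset. A minimal sketch (hickle's internal dataset paths differ between versions, so it simply walks the whole tree):

import h5py

# walk the file and print the storage layout of every dataset
with h5py.File('test_lzf.hkl', 'r') as f:
    def show_filters(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.chunks, obj.compression)
    f.visititems(show_filters)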

Dumping custom objects

Hickle provides several options to store objects of custom python classes. Objects of classes derived from built-in classes, and numpy, scipy, pandas and astropy objects, are stored using the corresponding loaders provided by hickle. Any other class is, by default, stored as a binary pickle string. Starting with version 4.x, hickle offers the possibility to define dedicated loader functions for custom classes, and starting with hickle 5.x these can be collected in module-, package- and application-specific loader modules.

class MyClass():
    def __init__(self):
        self.name = 'MyClass'
        self.value = 42

To create a loader for MyClass, a create_MyClass_dataset function and either a load_MyClass function or a MyClassContainer class have to be defined.

import h5py
from hickle.helpers import no_compression

def create_MyClass_dataset(py_obj, h_group, name, **kwargs):
    """ 
    py_obj ..... the instance of MyClass to be dumped
    h_group .... the h5py.Group py_obj should be dumped into
    name ....... the name of the h5py.Dataset or h5py.Group representing py_obj
    **kwargs ... the compression keyword arguments passed to hickle.dump
    """
    
    # If the content of MyClass can be represented as a single matrix, vector or
    # scalar value, create a dataset of appropriate size: either set its shape
    # and dtype parameters explicitly, or directly pass the data using the data
    # parameter.
    ds = h_group.create_dataset(name, data=py_obj.value, **kwargs)

    ## NOTE: if your class represents a scalar, using an empty tuple for shape,
    ##       then kwargs have to be filtered through no_compression
    # ds = h_group.create_dataset(name, data=py_obj.value, shape=(), **no_compression(kwargs))

    # set additional attributes providing further specialisation of the content
    ds.attrs['name'] = py_obj.name

    # when done return the new dataset object and an empty tuple or list
    return ds, ()

def load_MyClass(h_node, base_type, py_obj_type):
    """
    h_node ........ the h5py.Dataset object containing the data of MyClass object to restore
    base_type ..... byte string naming the loader to be used for restoring MyClass object
    py_obj_type ... MyClass class or MyClass subclass object 
    """

    # py_obj_type should point to MyClass or any of its subclasses
    new_instance = py_obj_type()
    new_instance.name = h_node.attrs['name']
    new_instance.value = h_node[()]
	
    return new_instance

To dump the content of complex objects consisting of multiple sub-items that have to be stored as individual h5py.Dataset or h5py.Group objects, define create_MyClass_dataset using the create_group method instead of create_dataset, and define a corresponding MyClassContainer class.

import h5py
from hickle.helpers import PyContainer

def create_MyClass_dataset(py_obj, h_group, name, **kwargs):
    """ 
    py_obj ..... the instance of MyClass to be dumped
    h_group .... the h5py.Group py_obj should be dumped into
    name ....... the name of the h5py.Dataset or h5py.Group representing py_obj
    **kwargs ... the compression keyword arguments passed to hickle.dump
    """
    
    ds = h_group.create_group(name)

    # set additional attributes providing further specialisation of the content
    ds.attrs['name'] = py_obj.name

    # when done return the new group object and a tuple, list or generator
    # providing for each subitem a tuple or list containing
    #  name ..... the name to be used for storing the subitem within the h5py.Group object
    #  item ..... the subitem object to be stored
    #  attrs .... dictionary included in the attrs of the created h5py.Group or h5py.Dataset
    #  kwargs ... the kwargs as passed to the create_MyClass_dataset function
    return ds, (('name', py_obj.name, {}, kwargs), ('value', py_obj.value, {'the answer': True}, kwargs))


class MyClassContainer(PyContainer):
    """
    Valid container classes must be derived from hickle.helpers.PyContainer class
    """

    def __init__(self, h5_attrs, base_type, object_type):
        """
        h5_attrs ...... the attrs dictionary attached to the group representing MyClass
        base_type ..... byte string naming the loader to be used for restoring MyClass object
        object_type ... MyClass class or MyClass subclass object
        """

        # the optional protected _content parameter of the PyContainer __init__
        # method can be used to change the data structure used to store
        # the subitems passed to the append method of the PyContainer class;
        # by default it is set to []
        super().__init__(h5_attrs, base_type, object_type, _content=dict())

    def filter(self,h_parent): # optional overload
        """
        generator member function which can be overloaded to reorganize the subitems
        of the h_parent h5py.Group before they are restored by hickle. Its default
        implementation simply yields from h_parent.items().
        """
        yield from super().filter(h_parent)

    def append(self,name,item,h5_attrs): # optional overload
        """
        in case the _content parameter was explicitly set, or subitems should be
        stored in a specific order or have to be preprocessed before the next item
        is appended, this can be done here before storing them in self._content.

        name ....... the name identifying subitem item within the parent h5py.Group
        item ....... the object representing the subitem
        h5_attrs ... attrs dictionary attached to the h5py.Dataset or h5py.Group representing item
        """
        self._content[name] = item

    def convert(self):
        """
        called by hickle when all subitems have been appended to the MyClass
        PyContainer. This method must be implemented by the MyClass PyContainer.
        """

        # self.object_type points to MyClass or any of its subclasses
        new_instance = self.object_type()
        new_instance.__dict__.update(self._content)

        return new_instance

In a last step, the loader for MyClass has to be registered with hickle. This is done by calling the hickle.lookup.LoaderManager.register_class method:

from hickle.lookup import LoaderManager

# to register loader for object mapped to h5py.Dataset use
LoaderManager.register_class(
   MyClass,                # MyClass type object this loader handles
   b'MyClass',             # byte string representing the name of the loader
   create_MyClass_dataset, # the create-dataset function defined in the first example above
   load_MyClass,           # the load function defined in the first example above
   None,                   # usually None
   True,                   # set to False to force explicit storage of MyClass instances in any case
   'custom'                # loader is only used when custom loaders are enabled on calling hickle.dump
)

# to register loader for object mapped to h5py.Group use
LoaderManager.register_class(
   MyClass,                # MyClass type object this loader handles
   b'MyClass',             # byte string representing the name of the loader
   create_MyClass_dataset, # the create-dataset function defined in the second example above
   None,                   # usually None
   MyClassContainer,       # the PyContainer to be used to restore the content of MyClass
   True,                   # set to False to force explicit storage of MyClass instances in any case
   None                    # if set to None the loader is enabled unconditionally
)

# NOTE: in case the content of MyClass instances may be mapped to either an h5py.Dataset or an
# h5py.Group depending on its actual complexity, both types of loaders can be merged into a
# single one, using one common create_MyClass_dataset function and defining both the
# load_MyClass function and the MyClassContainer class
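
A sketch of what such a merged create function could look like; the scalar test below is an assumed heuristic, use whatever criterion fits your class:

import numpy as np
from hickle.helpers import no_compression

def create_MyClass_dataset(py_obj, h_group, name, **kwargs):
    if np.isscalar(py_obj.value):
        # simple content: store as a single h5py.Dataset
        # (scalar datasets do not support filters, hence no_compression)
        ds = h_group.create_dataset(name, data=py_obj.value, **no_compression(kwargs))
        ds.attrs['name'] = py_obj.name
        return ds, ()
    # complex content: store as an h5py.Group and let hickle dump the subitems
    ds = h_group.create_group(name)
    ds.attrs['name'] = py_obj.name
    return ds, (('value', py_obj.value, {}, kwargs),)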

For complex python modules, packages and applications that define several classes to be dumped and handled by hickle, calling hickle.lookup.LoaderManager.register_class explicitly for each of them very quickly becomes tedious and confusing. Therefore, starting with hickle 5.x, all loaders for classes and objects defined by your module, package or application can be collected in dedicated loader modules and installed along with your module, package or application.

For packages and application packages, the load_MyPackage.py loader module has to be stored within the hickle_loaders directory of the package directory (the first directory that contains an __init__.py file) and should be structured as follows.

from hickle.helpers import PyContainer

## define below all create_MyClass_dataset and load_MyClass functions and
## MyClassContainer classes of the loaders serving your module, package,
## application package or application

....

## the class_register table and the exclude_register table are required
## by hickle to properly load and apply your loaders.
## each row in the class_register table corresponds to the parameters
## of LoaderManager.register_class and has to be specified in the same order
## as above

class_register = [
   [ MyClass,                # MyClass type object this loader handles
     b'MyClass',             # byte string representing the name of the loader
     create_MyClass_dataset, # the create-dataset function defined in the first example above
     load_MyClass,           # the load function defined in the first example above
     None,                   # usually None
     True,                   # set to False to force explicit storage of MyClass instances in any case
     'custom'                # loader is only used when custom loaders are enabled on calling hickle.dump
   ],
   [ MyClass,                # MyClass type object this loader handles
     b'MyClass',             # byte string representing the name of the loader
     create_MyClass_dataset, # the create-dataset function defined in the second example above
     None,                   # usually None
     MyClassContainer,       # the PyContainer to be used to restore the content of MyClass
     True,                   # set to False to force explicit storage of MyClass instances in any case
     None                    # if set to None the loader is enabled unconditionally
   ]
]

# used by hickle 4.x legacy loaders and other special loaders
# usually an empty list
exclude_register = []

For single-file modules and application scripts, the load_MyModule.py or load_MyApp.py files have to be stored within the hickle_loaders directory located in the same directory as MyModule.py or MyApp.py. For further examples of more complex loaders, and for how to store bytearrays and strings such that they can be compressed when stored, see the default loader modules in the hickle/loaders/ directory.
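
For illustration, a package that ships its own loader module might be laid out as follows (all names are hypothetical):

MyPackage/
    __init__.py
    MyModule.py
    hickle_loaders/
        load_MyPackage.py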

Note: storing complex objects in an HDF5 file

The HDF5 file format is designed to store several big matrices, images and vectors efficiently, attach some metadata, and provide convenient access to the data through a tree structure. It is not designed, like the python pickle format, to efficiently map the in-memory structure of arbitrary objects to a file. Mindlessly storing plenty of tiny objects and scalar values without combining them into a single dataset will therefore make the HDF5 file created by hickle explode in size: files of several tens of GB are possible where a pickle file would just need some 100 MB. This can be prevented by having the create_MyClass_dataset method combine sub-items into bigger numpy arrays or other structures that can be mapped to h5py.Dataset objects, and by having the load_MyClass function and/or the MyClassContainer.convert method restore the actual structure of the sub-items on load.
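
A minimal sketch of this packing approach, assuming a hypothetical Point class with three scalar coordinates: the create function packs them into one small array, and the load function unpacks them again.

import numpy as np

def create_Point_dataset(py_obj, h_group, name, **kwargs):
    # pack the scalar coordinates into one small array, so HDF5 stores a
    # single dataset instead of three tiny ones
    packed = np.array([py_obj.x, py_obj.y, py_obj.z], dtype='float64')
    ds = h_group.create_dataset(name, data=packed, **kwargs)
    return ds, ()

def load_Point(h_node, base_type, py_obj_type):
    # restore the original attribute structure from the packed array
    new_instance = py_obj_type.__new__(py_obj_type)
    new_instance.x, new_instance.y, new_instance.z = h_node[()]
    return new_instance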

Recent changes

  • December 2021: Release of version 5, support for h5py >= 3.0 and numpy >= 1.21
  • June 2020: Major refactor to version 4, and removal of support for Python 2.
  • December 2018: Accepted to Journal of Open-Source Software (JOSS).
  • June 2018: Major refactor and support for Python 3.
  • August 2016: Added support for scipy sparse matrices bsr_matrix, csr_matrix and csc_matrix.

Performance comparison

Hickle runs a lot faster than pickle with its default settings, and a little faster than pickle with protocol=2 set:

In [1]: import numpy as np

In [2]: x = np.random.random((2000, 2000))

In [3]: import pickle

In [4]: f = open('foo.pkl', 'w')

In [5]: %time pickle.dump(x, f)  # slow by default
CPU times: user 2 s, sys: 274 ms, total: 2.27 s
Wall time: 2.74 s

In [6]: f = open('foo.pkl', 'w')

In [7]: %time pickle.dump(x, f, protocol=2)  # actually very fast
CPU times: user 18.8 ms, sys: 36 ms, total: 54.8 ms
Wall time: 55.6 ms

In [8]: import hickle

In [9]: f = open('foo.hkl', 'w')

In [10]: %time hickle.dump(x, f)  # a bit faster
dumping <type 'numpy.ndarray'> to file <HDF5 file "foo.hkl" (mode r+)>
CPU times: user 764 us, sys: 35.6 ms, total: 36.4 ms
Wall time: 36.2 ms

So if you do continue to use pickle, add the protocol=2 keyword (thanks @mrocklin for pointing this out).

For storing python dictionaries of lists, hickle beats the python json encoder, but is slower than uJson. For a dictionary with 64 entries, each containing a 4096 length list of random numbers, the times are:

json took 2633.263 ms
uJson took 138.482 ms
hickle took 232.181 ms

It should be noted that these comparisons are of course not fair: storing in HDF5 will not help you convert something into JSON, nor will it help you serialize a string. But for quick storage of the contents of a python variable, it's a pretty good option.
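
A rough sketch for reproducing this comparison yourself; absolute numbers will of course vary with machine and library versions:

import json
import time

import numpy as np
import hickle

# 64 entries, each a 4096-element list of random numbers, as in the text above
data = {str(i): np.random.random(4096).tolist() for i in range(64)}

t0 = time.time()
with open('test.json', 'w') as f:
    json.dump(data, f)
print('json took %.3f ms' % ((time.time() - t0) * 1e3))

t0 = time.time()
hickle.dump(data, 'test.hkl', mode='w')
print('hickle took %.3f ms' % ((time.time() - t0) * 1e3))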

Installation guidelines

Easy method

Install with pip by running pip install hickle from the command line.

Install on Windows 32 bit

Prebuilt Python wheel packages are available on PyPI up to h5py version 2.10 and Python 3.8. Any newer versions have to be built and installed manually.

  1. Install h5py 2.10 with pip by running pip install "h5py==2.10" from the command line

  2. Install hickle with pip by running pip install hickle from the command line

Manual install

  1. You should have Python 3.5 or above installed

  2. Install hdf5 (Official page: http://www.hdfgroup.org/ftp/HDF5/current/src/unpacked/release_docs/INSTALL; Binary Downloads: https://portal.hdfgroup.org/display/support/Downloads). Note: on Windows 32 bit, install the prebuilt binary package for libhdf5 1.10.4, which is the latest version supporting 32 bit on Windows

  3. Install h5py (Official page: http://docs.h5py.org/en/latest/build.html)

  4. Download hickle: via terminal: git clone https://github.com/telegraphic/hickle.git, or via manual download: go to https://github.com/telegraphic/hickle where you will find a Download ZIP option on the right-hand side

  5. cd to your downloaded hickle directory

  6. Then run the following command in the hickle directory: python setup.py install

Optional requirements:

  • dill: needed when files generated by hickle 3 and/or hickle 4 are to be loaded with hickle >= 5, and for development and testing
  • astropy: needed for development and testing
  • pandas: needed for development and testing

Testing

Once installed from source, run python setup.py test to check it's all working.

Bugs & contributing

Contributions and bugfixes are very welcome. Please check out our contribution guidelines for more details on how to contribute to development.

Referencing hickle

If you use hickle in academic research, we would be grateful if you could reference our paper in the Journal of Open-Source Software (JOSS).

Price et al., (2018). Hickle: A HDF5-based python pickle replacement. Journal of Open Source Software, 3(32), 1115, https://doi.org/10.21105/joss.01115

hickle's People

Contributors

1313e, arctice, arfon, basnijholt, betteridiot, craffel, ebenolson, edwardbetts, eendebakpt, elliottash, femtotrader, hernot, isuruf, ldryan0, mmckerns, mr-c, r-xue, telegraphic, wangqr, xerus, zimmerrol


hickle's Issues

How to append?

import hickle
import numpy as np

a = np.ones(32000, dtype='float32')
hickle.dump(a, 'test.hkl', mode='a')

b = np.ones(32000, dtype='float32')
hickle.dump(b, 'test.hkl', mode='a')

Then I got an error: Unable to create link (Name already exists)
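
A workaround, assuming a hickle version in which dump and load accept the path keyword, is to give each object its own path inside the file so the link names don't collide; a sketch:

import hickle
import numpy as np

a = np.ones(32000, dtype='float32')
b = np.ones(32000, dtype='float32')

# each dump gets its own HDF5 path, so the second object no longer collides
hickle.dump(a, 'test.hkl', mode='w', path='/a')
hickle.dump(b, 'test.hkl', mode='a', path='/b')

a2 = hickle.load('test.hkl', path='/a')
b2 = hickle.load('test.hkl', path='/b')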

Error when using hkl.load using python3.5

I am using the dev branch, and when I use hkl.load I get the following error:

  File "C:\Users\Sanket\AppData\Local\Programs\Python\Python35\lib\site-packages\hickle-3.0.0-py3.5.egg\hickle\hickle.py", line 487, in load
    py_container = _load(py_container, h_root_group)
  File "C:\Users\Sanket\AppData\Local\Programs\Python\Python35\lib\site-packages\hickle-3.0.0-py3.5.egg\hickle\hickle.py", line 549, in _load
    py_subcontainer = _load(py_subcontainer, h_node)
  File "C:\Users\Sanket\AppData\Local\Programs\Python\Python35\lib\site-packages\hickle-3.0.0-py3.5.egg\hickle\hickle.py", line 549, in _load
    py_subcontainer = _load(py_subcontainer, h_node)
  File "C:\Users\Sanket\AppData\Local\Programs\Python\Python35\lib\site-packages\hickle-3.0.0-py3.5.egg\hickle\hickle.py", line 539, in _load
    py_subcontainer.key_type = h_group.attrs['key_type']
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (D:\Build\h5py\h5py-2.7.0\h5py\_objects.c:2853)
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (D:\Build\h5py\h5py-2.7.0\h5py\_objects.c:2811)
  File "C:\Users\Sanket\AppData\Local\Programs\Python\Python35\lib\site-packages\h5py\_hl\attrs.py", line 58, in __getitem__
    attr = h5a.open(self._id, self._e(name))
  File "h5py\_objects.pyx", line 54, in h5py._objects.with_phil.wrapper (D:\Build\h5py\h5py-2.7.0\h5py\_objects.c:2853)
  File "h5py\_objects.pyx", line 55, in h5py._objects.with_phil.wrapper (D:\Build\h5py\h5py-2.7.0\h5py\_objects.c:2811)
  File "h5py\h5a.pyx", line 77, in h5py.h5a.open (D:\Build\h5py\h5py-2.7.0\h5py\h5a.c:2350)
KeyError: "Can't open attribute (Can't locate attribute: 'key_type')"

Python 3 compatibility?

Are there any plans to make hickle compatible with Python 3?

What kind of issues would an attempted transition face?

Todo list: V3 functionality and documentation

After pretty major refactor, there's some tidying to do, and also some other functionality that could be added. Here's the new todo list:

  • Python 3 support (#8)
  • Refactor code for DRY niceness
  • Sphinx documentation / readthedocs
  • Write formal hickle specification V2, with metadata moved into HDF5 attributes
  • Look at parsing globals() and locals() to save entire state (may be a bad idea!)
  • Create 'plugin' system to allow users to add their own datatype handling, without adding loads of dependencies on hickle itself.
    • Consider handling Pandas DataFrame type (may be a bad idea, adds pandas dependency)
    • Consider handling Astropy types
  • Add versioning and extra metadata so hickle files can be matched to the Python and Hickle version that created them.
  • Add warnings if trying to open Py 2 file in Py 3 and vice-versa, and then do best-effort to open it anyway.
  • Dump objects into h5py.group (See #54)
  • Ensure user-defined object handles OK (See #39)
  • Add Collections support

Hickling doesn't preserve list order

Is this the expected behavior? Seems to be true in general, this isn't a contrived example.

d = [np.arange(n) for n in range(20)]
print d[:4]
hickle.dump(d, 'test.h5')
print hickle.load('test.h5')[:4]

yields

[array([], dtype=int64), array([0]), array([0, 1]), array([0, 1, 2])]
[array([], dtype=int64), array([0]), array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10])]

Hickle doesn't handle empty lists

Empty lists trigger an error in helpers.check_iterable_item_type. This might be a bit tricky, since it's not clear which type to return. In theory, any type should be fine?

NameError: name 'file' is not defined in Python3

Used the below code to dump 100 x 100 numpy array to a hickle file

def save_hickle(file_name, data):
    hickle.dump(data,file_name,mode='w')


#save hickle
data = np.zeros([100, 100])
hickle_name ='data_100' + '.hkl'
save_hickle(hickle_name, data)

I get the following Traceback:

NameError Traceback (most recent call last)
in ()
27 #save hickle
28 hickle_name = filename + '.hkl'
---> 29 save_hickle(hickle_name, data)
30
31 # save joblib

in save_hickle(file_name, data)
18
19 def save_hickle(file_name, data):
---> 20 hickle.dump(data,file_name,mode='w')
21
22 def save_joblib(filename, data):

/Users/machinename/anaconda/lib/python3.5/site-packages/hickle.py in dump(py_obj, file_obj, mode, track_times, path, **kwargs)
306 try:
307 # Open the file
--> 308 h5f = file_opener(file_obj, mode, track_times)
309 h5f.attrs["CLASS"] = 'hickle'
310 h5f.attrs["VERSION"] = 3

/Users/machinename/anaconda/lib/python3.5/site-packages/hickle.py in file_opener(f, mode, track_times)
146 """
147 # Were we handed a file object or just a file name string?
--> 148 if isinstance(f, file):
149 filename, mode = f.name, f.mode
150 f.close()

NameError: name 'file' is not defined

Show pickle timings with protocol=2 in README

Hey, nice project. I really like seeing more options for the dump/load interface.

Your README shows timings that place pickle as much much slower than hickle. I suspect that you're not using the protocol= keyword to dump. Using this keyword pickle dumps numpy arrays in very close to their raw form, suffering almost no overhead.

Some timings

In [1]: import numpy as np

In [2]: x = np.random.random((2000, 2000))

In [3]: import pickle

In [4]: f = open('foo.pkl', 'w')

In [5]: %time pickle.dump(x, f)  # slow by default
CPU times: user 2 s, sys: 274 ms, total: 2.27 s
Wall time: 2.74 s

In [6]: f = open('foo.pkl', 'w')

In [7]: %time pickle.dump(x, f, protocol=2)  # actually very fast
CPU times: user 18.8 ms, sys: 36 ms, total: 54.8 ms
Wall time: 55.6 ms

In [8]: import hickle

In [9]: f = open('foo.hkl', 'w')

In [10]: %time hickle.dump(x, f)  # a bit faster
dumping <type 'numpy.ndarray'> to file <HDF5 file "foo.hkl" (mode r+)>
CPU times: user 764 µs, sys: 35.6 ms, total: 36.4 ms
Wall time: 36.2 ms

Hickle fails with dictionaries with unicode keys

hickle 2.1.0 on python 2.7:

str keys ok:

In [10]: data = {'abc': 123, 'def': 456}
In [12]: hickle.dump(data, 'tst.hkl')
In [13]: data2 = hickle.load('tst.hkl')
In [14]: data2
Out[14]: {'abc': 123, 'def': 456}

unicode key fail:

In [15]: data = {'abc': 123, u'def': 456}
In [16]: hickle.dump(data, 'tst.hkl')
In [17]: data2 = hickle.load('tst.hkl')
In [18]: data2
Out[18]: ['abc', u'def']

Python3: legacy loading

Hi,

I tested the hickle code and found that the part in line 487 (dev branch):
except AssertionError: import hickle_legacy return hickle_legacy.load(fileobj, safe) ``
is not Python3 compatible, maybe it is reasonable to implement a _legacy_load() function instead.

Best regards

Jan

Dictionary keys with a slash

HDF5 uses the forward slash ('/') to delimit groups within a path, so if a slash is encountered in a key of a mappable object, it causes the creation of a sub-group instead of being treated as a literal character. Maybe you can replace the slash with some sort of escape token while packing, which you could then replace with the symbol '/' on reconstruction?
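
Until hickle handles this itself, a sketch of such an escape step, applied before dumping and reversed after loading (the token is arbitrary):

def escape_keys(d, token='__slash__'):
    # replace '/' in keys so h5py does not interpret them as group separators
    return {k.replace('/', token): v for k, v in d.items()}

def unescape_keys(d, token='__slash__'):
    return {k.replace(token, '/'): v for k, v in d.items()}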

Close file after exception in dump

Hi,

The output file is not closed if an exception occurs during dump. So if application passes the filename only, it has no way to close file by itself.

In particular, application can not remove the broken file created after the unsuccessful dump.

Thanks,
Sergey

Unable to dump complex values using a dictionary

I am trying to dump a dictionary with some complex values in a dictionary, I get a warning:

UserWarning: <type 'complex'> type not understood, data have been serialized

However, for the same data I have numpy arrays with complex values in them, and they are pickled without problem. Is this expected, or am I doing something wrong?

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import hickle as hkl
import numpy as np
data = {"A":1.5, "B":1.5 + 1j, "C":np.linspace(0,1,4) + 2j}
hkl.dump(data, "foo.hkl")
#/home/user/.local/lib/python2.7/site-packages/hickle.py:497: UserWarning: <type 'complex'> type not understood, data have been serialized
data_load = hkl.load("foo.hkl")
print data_load["B"]
# ['c__builtin__\ncomplex\np1\n(F1.5\nF1\ntRp2\n.']
print data_load["C"]
# [ 0.00000000+2.j  0.33333333+2.j  0.66666667+2.j  1.00000000+2.j]

Cheers,

global name 'scipy' is not defined

This is a really frustrating error to get after running a couple hours of processing and trying to dump a file:

Traceback (most recent call last):
  File "__init__.py", line 28, in <module>
    dof    = 6)
  File "/home/mohawkjohn/axiom_control/plot.py", line 198, in all
    cogs_6dof = cog.compute_or_load_candidates(thruster_config, R = R, F = F, dof = 6, n_on = 6)
  File "/home/mohawkjohn/axiom_control/cog.py", line 244, in compute_or_load_candidates
    hkl.dump(cogs, filename, 'w')
  File "/home/mohawkjohn/.local/lib/python2.7/site-packages/hickle.py", line 319, in dump
    _dump(py_obj, h_root_group, **kwargs)
  File "/home/mohawkjohn/.local/lib/python2.7/site-packages/hickle.py", line 259, in _dump
    elif check_is_scipy_sparse_array(py_obj):
  File "/home/mohawkjohn/.local/lib/python2.7/site-packages/hickle.py", line 237, in check_is_scipy_sparse_array
    is_sparse = type(py_obj) in (type(scipy.sparse.csr_matrix([0])), type(scipy.sparse.csc_matrix([0])))
NameError: global name 'scipy' is not defined

I'm trying to write a NumPy ndarray. I don't have SciPy installed. I don't need SciPy. SciPy is not a dependency for Hickle. I now have to install SciPy to allow it to do a simple type-check.

:P

AttributeError: 'NoneType' object has no attribute 'attrs'

import hickle as hkl
a = 'full path to file'
b = hkl.load(a,'r')

I am getting following error during another run. I don't know why it is creating NoneType type error. h5py with the h5py.File(a) is working.

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/hickle.py", line 624, in load
    py_container = _load(py_container, h_root_group)
  File "/usr/local/lib/python2.7/dist-packages/hickle.py", line 756, in _load
    subdata = load_dataset(h_group)
  File "/usr/local/lib/python2.7/dist-packages/hickle.py", line 643, in load_dataset
    py_type = h_node.attrs["type"][0]
AttributeError: 'NoneType' object has no attribute 'attrs'

I am able to read the file using h5py. This must be the hickle error.
How come h_node is producing None?

Should be able to handle recursive tuples of tuples of numpy objects

hickle is not capable of handling nested structures of lists and tuples containing various leaf nodes of numpy arrays, integers, strings, etc. at various positions.

This is because internally its handling of lists and tuples is not recursive. If dump_list() used the same dumper_lookup() logic to dump the elements of the list as the main dump() function, then hickle would be able to handle arbitrarily deep structures. Same applies to dump_tuple() and dump_dict().

Since hdf5 natively handles tree structured (or graph-structured) data, all of this can be made to work just fine.

This would also dramatically simplify dump_dict()

Storing scalars with compression

When running hickle.dump({'d': 0}, 'dummy.hkl', mode='w', compression='gzip') with hickle 2.0.4 I get a "Scalar datasets don't support chunk/filter options" error from h5py. However, the same command works with hickle 1.1.1. What changed?
Also, hickle.dump({'d': 0}, 'dummy.hkl', mode='w') works with hickle 2.0.4.

Dumping / loading multiple hickles to a single HDF5 file

A desired enhancement, as discussed in issue #20.

Example usage:

...
hkl.dump(d1, 'filename.hdf')
hkl.dump(d2, 'filename.hdf', dset='data_2')  

d1 = hkl.load('filename.hdf')
d2 = hkl.load('filename.hdf', dset='data_2')

I'm already calling it the "pickle jar"

Does hickle support sparse matrices from scipy?

The data to be dumped with hickle contains a sparse matrix.

The sparse matrix comes from scipy.sparse.csr_matrix,

but it throws the following exception when dumping:

File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 251, in _dump
    _dump(py_subobj, h_subgroup, call_id=ii, **kwargs)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 251, in _dump
    _dump(py_subobj, h_subgroup, call_id=ii, **kwargs)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 251, in _dump
    _dump(py_subobj, h_subgroup, call_id=ii, **kwargs)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 251, in _dump
    _dump(py_subobj, h_subgroup, call_id=ii, **kwargs)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 251, in _dump
    _dump(py_subobj, h_subgroup, call_id=ii, **kwargs)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 251, in _dump
    _dump(py_subobj, h_subgroup, call_id=ii, **kwargs)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 251, in _dump
    _dump(py_subobj, h_subgroup, call_id=ii, **kwargs)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 251, in _dump
    _dump(py_subobj, h_subgroup, call_id=ii, **kwargs)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 251, in _dump
    _dump(py_subobj, h_subgroup, call_id=ii, **kwargs)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 251, in _dump
    _dump(py_subobj, h_subgroup, call_id=ii, **kwargs)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 251, in _dump
    _dump(py_subobj, h_subgroup, call_id=ii, **kwargs)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 251, in _dump
    _dump(py_subobj, h_subgroup, call_id=ii, **kwargs)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 251, in _dump
    _dump(py_subobj, h_subgroup, call_id=ii, **kwargs)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 251, in _dump
    _dump(py_subobj, h_subgroup, call_id=ii, **kwargs)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 251, in _dump
    _dump(py_subobj, h_subgroup, call_id=ii, **kwargs)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 251, in _dump
    _dump(py_subobj, h_subgroup, call_id=ii, **kwargs)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 233, in _dump
    item_type = check_iterable_item_type(py_obj)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/hickle.py", line 195, in check_iterable_item_type
    first_type = type(next(iseq))
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/scipy/sparse/base.py", line 148, in __iter__
    yield self[r, :]
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/scipy/sparse/csr.py", line 292, in __getitem__
    return self._get_row_slice(row, col)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/scipy/sparse/csr.py", line 381, in _get_row_slice
    row_slice = self._get_submatrix(i, cslice)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/scipy/sparse/csr.py", line 456, in _get_submatrix
    return self.__class__((data,indices,indptr), shape=shape)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 25, in __init__
    _data_matrix.__init__(self)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/scipy/sparse/data.py", line 23, in __init__
    spmatrix.__init__(self)
  File "/home/alec/.pyenv/versions/my-faster-rcnn/lib/python2.7/site-packages/scipy/sparse/base.py", line 72, in __init__
    if self.__class__.__name__ == 'spmatrix':
RuntimeError: maximum recursion depth exceeded in cmp
Exception KeyError: KeyError(139816068399184,) in 'h5py._objects.ObjectID.__dealloc__' ignored
Exception KeyError: KeyError(139816068017600,) in 'h5py._objects.ObjectID.__dealloc__' ignored
Exception KeyError: KeyError(139816068172608,) in 'h5py._objects.ObjectID.__dealloc__' ignored
Exception KeyError: KeyError(139816068913344,) in 'h5py._objects.ObjectID.__dealloc__' ignored
Exception KeyError: KeyError(139816067814976,) in 'h5py._objects.ObjectID.__dealloc__' ignored
Exception KeyError: KeyError(139816068498368,) in 'h5py._objects.ObjectID.__dealloc__' ignored
Exception KeyError: KeyError(139816068657472,) in 'h5py._objects.ObjectID.__dealloc__' ignored
Exception KeyError: KeyError(139816067497568,) in 'h5py._objects.ObjectID.__dealloc__' ignored
Exception KeyError: KeyError(139816067652576,) in 'h5py._objects.ObjectID.__dealloc__' ignored

Seems it recursively called _dump method in hickle.

Any idea?

File Cache increasing

Is there any way to stop caching files in memory while loading them?
I'm using the hickle format to store a large number of image batches. When I run my python code on linux and check top, the cache and used memory keep increasing until full. However, I don't really need the caching because I only use each file's content once. So I wonder if this kind of feature can be disabled.

file locking

Sorry for not being too precise, but on some occasions I have found that the files are locked and inaccessible after loading or dumping. Terminating the python process releases the files.

Not able to load hickle

Dear friends,

I am having the following error when trying to import hickle:

>>> import hickle as hkl
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/dlm/theano/lib/python3.4/site-packages/hickle.py", line 621
    print h_node.name, py_type, h_node.attrs.keys()
               ^

Can you help me?

Thanks,

David

hickle is broken, at least in python3

If I run the example in the tutorial I get the following error.
I am referring to the version installed using pip

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-76-16caf8e6745e> in <module>()
      7 
      8 # Dump to file
----> 9 hkl.dump(array_obj, 'test.hkl', mode='w')
     10 
     11 # Dump data, with compression

~/anaconda3/lib/python3.6/site-packages/hickle.py in dump(py_obj, file_obj, mode, track_times, path, **kwargs)
    306     try:
    307         # Open the file
--> 308         h5f = file_opener(file_obj, mode, track_times)
    309         h5f.attrs["CLASS"] = 'hickle'
    310         h5f.attrs["VERSION"] = 3

~/anaconda3/lib/python3.6/site-packages/hickle.py in file_opener(f, mode, track_times)
    146     """
    147     # Were we handed a file object or just a file name string?
--> 148     if isinstance(f, file):
    149         filename, mode = f.name, f.mode
    150         f.close()

NameError: name 'file' is not defined

Using dump, is there any way to not output the individual dataset names?

Hello hickle creators,

This is a wonderful package and I want to say thank you for putting it together! It works like a breeze.

As a test, I followed your example in the readme file. I extended to get closer to the example I need for outputting my own dataset.

h = 22
data = {
    "single_eff_N_{0}".format(h): np.array((100)),
    "single_L_PSD_{0}".format(h): np.array((100, 1000))
}

hkl.dump(data, 'output_filename2.h5')

When I look into the file that I created, it has additional information that I would prefer not to have. It wasn't quite clear to me from looking at the hickle source code how this piece of information gets written. I am still learning the ins and outs of storing data in hdf5 files and am not sure where to look to ask it not to output that info. The datasets that I would like to use this on are pretty long and it would be great not to have the redundancy.

Thank you!

ShimWarning when importing hickle in ipython

Hello,

I'm using hickle 2.1.0, python 2.7.11, ipython 5.3.0.

import hickle

still allows me to use hickle, but the warning message below is bothering:

/Users/xxx/anaconda/lib/python2.7/site-packages/IPython/kernel/init.py:13: ShimWarning: The IPython.kernel package has been deprecated since IPython 4.0.You should import from ipykernel or jupyter_client instead.
"You should import from ipykernel or jupyter_client instead.", ShimWarning)

Thanks

Import error in hickle

I tried to load hickle but got the following error:

import hickle as hkl
Traceback (most recent call last):
File "", line 1, in
ImportError: No module named hickle

I tried to install hickle with sudo pip install hickle, but it says Could not find any downloads that satisfy the requirement hickle. Any suggestions?

saving list with string fails

The following minimal example fails on the dev branch.

import hickle

hickle.dump('hi', 'test1.hickle') # fine
hickle.dump(['hello'], 'test2.hickle')  # fails 

It is related to the h5py string handling, but since the data is so simple I think hickle should be able to handle this case.

@telegraphic

NameError: name 'pickle' is not defined

I am working with python3.6 and the dev branch.

When I try to hickle a more complex class containing numpy arrays as attributes I get the following error:

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-12-9d2713da6e4d> in <module>()
----> 1 hkl.dump(vlm, open('test4.hkl', "w"))

~/Github/hickle/hickle/hickle.py in dump(py_obj, file_obj, mode, track_times, path, **kwargs)
    278             h_root_group.attrs["type"] = [b'hickle']
    279 
--> 280         _dump(py_obj, h_root_group, **kwargs)
    281         h5f.close()
    282     except NoMatchError:

~/Github/hickle/hickle/hickle.py in _dump(py_obj, h_group, call_id, **kwargs)
    245     # item is not iterable, so create a dataset for it
    246     else:
--> 247         create_hkl_dataset(py_obj, h_group, call_id, **kwargs)
    248 
    249 

~/Github/hickle/hickle/hickle.py in create_hkl_dataset(py_obj, h_group, call_id, **kwargs)
    325 
    326     # do the creation
--> 327     create_dataset(py_obj, h_group, call_id, **kwargs)
    328 
    329 

~/Github/hickle/hickle/hickle.py in no_match(py_obj, h_group, call_id, **kwargs)
    388         call_id (int): index to identify object's relative location in the iterable.
    389     """
--> 390     pickled_obj = pickle.dumps(py_obj)
    391     d = h_group.create_dataset('data_%i' % call_id, data=[pickled_obj])
    392     d.attrs["type"] = [b'pickle']

NameError: name 'pickle' is not defined

Incorrect data type stored - ndarray instead of float

Original object:
(('6', 0.0, 279.0, ('', 'data', 'probe_td', 'eval', '00000043', 's02_2011_11_03', '00000043_s02_a02.edf')),
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, ...])

Stored object:
(('6', ndarray,ndarray, ('', 'data', 'probe_td', 'eval', '00000043', 's02_2011_11_03', '00000043_s02_a02.edf')),
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, ...])

Tuple not supported?

in _dump_dict you could add:

    elif type(dd[key]) is tuple:
        hgroup.create_dataset("%s" % key, data=dd[key], compression=compression)
        hgroup.create_dataset("_%s" % key, data=["tuple"])

Hickle dump doesn't work on simple case (Python 3.6)

Python 3.6.3 |Anaconda, Inc.| (default, Nov 20 2017, 20:41:42)
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hickle
/home/antor/miniconda3/lib/python3.6/site-packages/h5py/__init__.py:36: FutureWarning: Conversion of the second argument of issubdtype from `float` to `np.floating` is deprecated. In future, it will be treated as `np.float64 == np.dtype(float).type`.
  from ._conv import register_converters as _register_converters
>>> print(hickle.__version__)
2.1.0
>>> dict = { }
>>> dict['foo'] ='bar'
>>> hickle.dump(dict, 'test.hkl')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/antor/miniconda3/lib/python3.6/site-packages/hickle.py", line 308, in dump
    h5f = file_opener(file_obj, mode, track_times)
  File "/home/antor/miniconda3/lib/python3.6/site-packages/hickle.py", line 148, in file_opener
    if isinstance(f, file):
NameError: name 'file' is not defined

Refactor code for v2 release

While code base is still relatively small, parts of it violate DRY (don't repeat yourself) principles, and it would be nice to refactor. A V2 release is motivated, with extra documentation and more thought into backwards compatibility. A quick todo list:

  • Refactor code for DRY niceness
  • Sphinx documentation / readthedocs
  • Write formal hickle specification V2, with metadata moved into HDF5 attributes
  • Ensure backward compatibility by adding version info in root attribute
  • Look at parsing globals() and locals() to save entire state (may be a bad idea!)
  • Allow loading and dumping multiple hickles to a single HDF5 "hickle jar" (see ticket #21)

IOError: Unable to open file when opening .hkl file from Directory

I have compiled my training data into a hickle file with extension .hkl

>>> import hickle as hkl
>>> a = hkl.load('~/path/folder with file/X_train.hkl')

Above line is giving me the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/dist-packages/hickle.py", line 616, in load
    h5f = file_opener(fileobj)
  File "/usr/local/lib/python2.7/dist-packages/hickle.py", line 154, in file_opener
    h5f = h5.File(filename, mode)
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 269, in __init__
    fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/usr/local/lib/python2.7/dist-packages/h5py/_hl/files.py", line 99, in make_fid
    fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
IOError: Unable to open file (unable to open file: name = '~/path/folder with file/X_train.hkl', errno = 2, error message = 'No such file or directory', flags = 0, o_flags = 0)

I have tried using 'r' with it, but the same error is generated. What is the issue here? Is the file type not supported, or is it the way I load the file?

Please help. Thanks.
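
The likely culprit is the '~' in the path: h5py passes it to HDF5 as-is and it is never expanded to your home directory. Expanding it yourself should work:

import os
import hickle as hkl

# expand '~' before handing the path to hickle/h5py
path = os.path.expanduser('~/path/folder with file/X_train.hkl')
a = hkl.load(path)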

load hickle file with C++

Hi! I'm trying to load a hickle file with C++ and I'm not sure what the associated dataset name or group is. Any hint will be very appreciated. Thanks!

Add support for OrderedDict

OrderedDict is a commonly used object from Collections. Also, note that Py 3.6 now retains dictionary key order.

Mapping these into sensible HDF5 containers will likely require extra metadata to be added. The group name should still be the dictionary key.

Redundant data_0?

Is it possible to make hickle strip data_0 for simple structures like the one below?

import hickle
hickle.dump(dict(a = 1, b = dict(c = 2)), 'log.h5')
h5ls -r log.h5

# /                        Group
# /data_0                  Group
# /data_0/a                Group
# /data_0/a/data_0         Dataset {SCALAR}
# /data_0/b                Group
# /data_0/b/data_0         Group
# /data_0/b/data_0/c       Group
# /data_0/b/data_0/c/data_0 Dataset {SCALAR}

Is there any reason for having data_0 in the group structure? Couldn't the object be represented like below? (which is nicer for manual inspection)

# /                 Group
# /a                Dataset {SCALAR}
# /b                Group
# /b/c              Dataset {SCALAR}

Thanks for the useful package! It's quite handy for serializing PyTorch neural net models to a portable format.

Can't dump to existing file.

Can't dump more than one object to a file… not good.

>>> import hickle
>>> import h5py
>>> f = h5py.File('test.hdf', 'w')
>>> import numpy as np   
>>> a = np.arange(5)
>>> 
>>> hickle.dump(a, f, 'w')
dumping <type 'numpy.ndarray'> to file <HDF5 file "test.hdf" (mode r+)>
>>> hickle.dump(a, f, 'w')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mmckerns/lib/python2.7/site-packages/hickle.py", line 349, in dump
    h5f = file_opener(file, mode, track_times)
  File "/Users/mmckerns/lib/python2.7/site-packages/hickle.py", line 107, in file_opener
    raise FileError
hickle.FileError: cannot open file. Please pass either a filename string, a file object, or a h5py.File

Fails for w mode and a mode.

hickle complex object.

Hi, it's unclear to me whether this can be used to save complex objects, including methods and imported modules, or whether it is just for simple data formats.

Many thanks

support data type 'None'

Please add support for the data type 'None'. Maybe initially convert it into a Boolean False, but this might cause issues.

Dump objects into h5py.Group

I am trying to hickle objects into existing h5py.File objects and put each object into its own Group. Unfortunately, hickle does not allow me to do that.
Here is a minimal example:

import hickle, numpy, h5py
# open h5py file and create a group
h = h5py.File("test.hdf5", 'w')
h.create_group("Array")
# dump data into that group
d = numpy.ones((10,10))
hickle.dump(d, h["Array"])

which raises:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../lib/python2.7/site-packages/hickle.py", line 308, in dump
    h5f = file_opener(file_obj, mode, track_times)
  File ".../lib/python2.7/site-packages/hickle.py", line 163, in file_opener
    raise FileError
hickle.FileError: Cannot open file. Please pass either a filename string, a file object, or a h5py.File

I assume that it should be easy to also accept h5py.Group objects, as they have the same functionality as the h5py.File.

User-defined object handling

I'm trying to dump a user-defined object to a hickle file then later load it. However, when I try to load the object it is being deserialized as a numpy array instead of the user-defined python object. Below is an example...

import hickle

class Test:
    def __init__(self, _x, _y):
        self.x = _x
        self.y = _y

with open('bar.hkl', 'w') as f:
    obj = Test('hello', 'world')
    hickle.dump(obj, f)

with open('bar.hkl', 'r') as f:
    obj = hickle.load( f )

    print 'data type:', obj.dtype
    print obj
    print obj.x
    print obj.y

The output when this is run in python2.7.12 is...

>$ python test.py
/usr/local/lib/python2.7/dist-packages/hickle.py:498: UserWarning: <type 'instance'> type not understood, data have been serialized
  "serialized" % type(py_obj))
(u'/data_0', 'pickle', [u'type'])
data type: |S62
["(i__main__\nTest\np1\n(dp2\nS'y'\nS'world'\np3\nsS'x'\nS'hello'\np4\nsb."]
Traceback (most recent call last):
  File "test.py", line 17, in <module>
    print obj.x
AttributeError: 'numpy.ndarray' object has no attribute 'x'
