cloudpipe / cloudpickle
Extended pickling support for Python objects
License: Other
When setting up an inner class like the one below, cloudpickle
fails where dill
would succeed. Based on the rather lengthy traceback, this appears to be due to falling back to Python's standard pickling mechanism.
import cloudpickle

class A(object):
    class B(object):
        def __init__(self):
            self.c = 0

    def __init__(self):
        self.b = A.B()

a = A()
cloudpickle.dumps(a)
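Not part of the report, but a common workaround (assuming you control the class definitions) is to move the nested class to module scope, so that even the standard pickler can resolve it by qualified name. A minimal sketch:

```python
import pickle

# Hypothetical restructuring: B lives at module scope instead of inside A,
# so the standard pickler can find it by name.
class B(object):
    def __init__(self):
        self.c = 0

class A(object):
    def __init__(self):
        self.b = B()

a = pickle.loads(pickle.dumps(A()))
assert a.b.c == 0
```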
We'll also need to modify our Travis setup to bring in numpy, scipy, etc. as part of the builds. Example from patsy.
Hi,
I ran into the following strange issue: I use cloudpickle to serialize my objects to send them to Kafka, and today I saw the following bug: calling cloudpickle.dumps on an object returns a malformed result that I can't load(), but calling it again on the very same object works!
Including the output from the debug console below; the object in question is a pretty big one using internal libraries.
Any idea what might be causing that?
import cloudpickle
a = cloudpickle.dumps(self)
b = cloudpickle.dumps(self)
cloudpickle.loads(b)
Out[5]:
<aiostreams.runner.SendTo at 0x15383bbba20>
cloudpickle.loads(a)
Traceback (most recent call last):
File "C:\Users\Egor\Anaconda2\envs\py3k\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-6-c33e741d5197>", line 1, in <module>
cloudpickle.loads(a)
EOFError: Ran out of input
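For what it's worth, "EOFError: Ran out of input" is the characteristic symptom of a truncated pickle stream (one missing its STOP opcode), which suggests the first dumps call produced incomplete bytes, e.g. via a shared buffer that was not reset. A quick illustration with the standard pickler:

```python
import pickle

data = pickle.dumps([1, 2, 3], protocol=2)  # protocol 2: no framing
assert pickle.loads(data) == [1, 2, 3]      # the full stream round-trips

# Dropping the final STOP opcode reproduces the reported error:
try:
    pickle.loads(data[:-1])
    raised = None
except EOFError as exc:  # "Ran out of input"
    raised = exc
assert isinstance(raised, EOFError)
```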
I have the following problem with method descriptor objects in Python 2.7:
Python 2.7.10 |Continuum Analytics, Inc.| (default, Oct 19 2015, 18:04:42)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import cloudpickle, pickle
>>> cloudpickle.dumps(set.union)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pmd/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 602, in dumps
cp.dump(obj)
File "/home/pmd/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 111, in dump
raise pickle.PicklingError(msg)
pickle.PicklingError: Could not pickle object as excessively deep recursion required.
>>> cloudpickle.dumps(str.decode)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pmd/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 602, in dumps
cp.dump(obj)
File "/home/pmd/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 111, in dump
raise pickle.PicklingError(msg)
pickle.PicklingError: Could not pickle object as excessively deep recursion required.
>>> pickle.dumps(set.union)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pmd/anaconda3/envs/python2/lib/python2.7/pickle.py", line 1374, in dumps
Pickler(file, protocol).dump(obj)
File "/home/pmd/anaconda3/envs/python2/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/home/pmd/anaconda3/envs/python2/lib/python2.7/pickle.py", line 306, in save
rv = reduce(self.proto)
File "/home/pmd/anaconda3/envs/python2/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle method_descriptor objects
>>> def f(x):
... return set.union(x)
...
>>> cloudpickle.dumps(f)
'\x80\x02ccloudpickle.cloudpickle\n_fill_function\nq\x00(ccloudpickle.cloudpickle\n_make_skel_func\nq\x01ccloudpickle.cloudpickle\n_builtin_type\nq\x02U\x08CodeTypeq\x03\x85q\x04Rq\x05(K\x01K\x01K\x02KCU\rt\x00\x00j\x01\x00|\x00\x00\x83\x01\x00Sq\x06N\x85q\x07U\x03setq\x08U\x05unionq\t\x86q\nU\x01xq\x0b\x85q\x0cU\x07<stdin>q\rU\x01fq\x0eK\x01U\x02\x00\x01q\x0f))tq\x10Rq\x11]q\x12}q\x13\x87q\x14Rq\x15}q\x16N}q\x17tR.'
>>> def g(x):
... return str.decode(x)
...
>>> cloudpickle.dumps(g)
'\x80\x02ccloudpickle.cloudpickle\n_fill_function\nq\x00(ccloudpickle.cloudpickle\n_make_skel_func\nq\x01ccloudpickle.cloudpickle\n_builtin_type\nq\x02U\x08CodeTypeq\x03\x85q\x04Rq\x05(K\x01K\x01K\x02KCU\rt\x00\x00j\x01\x00|\x00\x00\x83\x01\x00Sq\x06N\x85q\x07U\x03strq\x08U\x06decodeq\t\x86q\nU\x01xq\x0b\x85q\x0cU\x07<stdin>q\rU\x01gq\x0eK\x01U\x02\x00\x01q\x0f))tq\x10Rq\x11]q\x12}q\x13\x87q\x14Rq\x15}q\x16N}q\x17tR.'
>>>
Is this a known issue, and is there anything I can do about it so I don't have to wrap the method descriptors with another function? Thanks
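Not an official fix, but until method descriptors pickle cleanly, wrapping them in a plain function (as the successful dumps of f and g above already show) is the usual workaround. A sketch with the standard pickler; the wrapper name is my own:

```python
import pickle

# Hypothetical wrapper: a plain module-level function that calls the
# descriptor pickles fine, while set.union itself does not.
def union(*sets):
    return set.union(*sets)

restored = pickle.loads(pickle.dumps(union))
assert restored({1, 2}, {3}) == {1, 2, 3}
```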
The Spark folks went to the trouble of getting cloudpickle licensed as BSD (from LGPL), and have been improving on it directly within pyspark. Let's bring it over and maintain it as an overall library.
Register cloudpickle on PyPI.
According to the Python documentation for the imp
package, "Deprecated since version 3.4: The imp package is pending deprecation in favor of importlib."
cloudpickle.py uses imp.new_module() (https://github.com/cloudpipe/cloudpickle/blob/master/cloudpickle/cloudpickle.py#L925) and imp.find_module() (https://github.com/cloudpipe/cloudpickle/blob/master/cloudpickle/cloudpickle.py#L1075), each of which has an equivalent function in importlib.
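A sketch of the drop-in importlib replacements (names per the importlib documentation; exactly how they slot into cloudpickle.py is left open):

```python
import importlib.util
import types

# imp.new_module(name)  ->  types.ModuleType(name)
mod = types.ModuleType("dynamically_created")
assert mod.__name__ == "dynamically_created"

# imp.find_module(name)  ->  importlib.util.find_spec(name)
spec = importlib.util.find_spec("json")
assert spec is not None  # an importable module has a spec

# find_spec returns None (rather than raising) for a missing top-level module
assert importlib.util.find_spec("no_such_module_xyz") is None
```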
The following code does not work, and may be a potential bug:
>>> def f():
...     def g(): return g
...     return g
...
>>> import cloudpickle; cloudpickle.dumps(f())
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
import cloudpickle; cloudpickle.dumps(f())
File "cloudpickle.py", line 629, in dumps
cp.dump(obj)
File "cloudpickle.py", line 111, in dump
raise pickle.PicklingError(msg)
PicklingError: Could not pickle object as excessively deep recursion required.
Is it possible to fix this, or is this a fundamental limitation of closures in Python that cannot be worked around?
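Part of the difficulty is that the inner function's closure cell refers back to the function itself, so any serializer that recurses through closure contents hits a cycle unless it memoizes the half-built function first:

```python
def f():
    def g(): return g
    return g

g = f()
# The closure of g contains exactly one cell, and it points at g itself:
assert g.__closure__[0].cell_contents is g
```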
Current behaviour:
>>> import cloudpickle; cloudpickle.__version__
'0.2.2'
>>> from scipy.sparse import dok_matrix
>>> A = dok_matrix((2,2)); A
<2x2 sparse matrix of type '<type 'numpy.float64'>'
with 0 stored elements in Dictionary Of Keys format>
>>> cloudpickle.loads(cloudpickle.dumps(A))
<2x2 sparse matrix of type '<type 'numpy.float64'>'
with 0 stored elements in Dictionary Of Keys format>
>>> A[0,0] = 1
>>> cloudpickle.loads(cloudpickle.dumps(A))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/pickle.py", line 1388, in loads
return Unpickler(file).load()
File "/usr/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/usr/lib/python2.7/pickle.py", line 1204, in load_setitem
dict[key] = value
File "/usr/lib/python2.7/dist-packages/scipy/sparse/dok.py", line 235, in __setitem__
if (isintlike(i) and isintlike(j) and 0 <= i < self.shape[0]
File "/usr/lib/python2.7/dist-packages/scipy/sparse/base.py", line 525, in __getattr__
raise AttributeError(attr + " not found")
AttributeError: shape not found
Expected behaviour: an object of type dok_matrix
properly (de)serializes regardless of content. Note that pickle
and cPickle
both work:
>>> import pickle
>>> pickle.loads(pickle.dumps(A))
<2x2 sparse matrix of type '<type 'numpy.float64'>'
with 1 stored elements in Dictionary Of Keys format>
>>> import cPickle
>>> cPickle.loads(cPickle.dumps(A))
<2x2 sparse matrix of type '<type 'numpy.float64'>'
with 1 stored elements in Dictionary Of Keys format>
Built-in pickle
doesn't work on Logger objects, but cloudpickle could try to be a bit smarter. Upstream issue at https://bugs.python.org/issue30520
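Until that lands, one way to make loggers picklable from application code is a copyreg reducer that serializes a Logger as its name and re-fetches it on load. This is a sketch of my own, not cloudpickle behavior (and newer Python versions do something equivalent natively, per the upstream issue):

```python
import copyreg
import logging
import pickle

def _reduce_logger(logger):
    # Serialize only the name; logging.getLogger returns the same
    # singleton for a given name on the other side.
    return logging.getLogger, (logger.name,)

copyreg.pickle(logging.Logger, _reduce_logger)

log = logging.getLogger("my.app")
assert pickle.loads(pickle.dumps(log)) is log
```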
I have a very simple classmethod example that fails.
import cloudpickle
import pickle

class A(object):
    @classmethod
    def test(cls):
        pass

a = A()
res = cloudpickle.dumps(a)
new_obj = pickle.loads(res)
new_obj.__class__.test()
This is on Python 3.5. It seems cloudpickle tries to support memoryviews (the traceback shows a dedicated save_memoryview method), but fails:
>>> import cloudpickle
>>> m = memoryview(b"abc")
>>> cloudpickle.dumps(m)
Traceback (most recent call last):
File "<ipython-input-3-c69575090534>", line 1, in <module>
cloudpickle.dumps(m)
File "/home/antoine/miniconda3/envs/dask35/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 706, in dumps
cp.dump(obj)
File "/home/antoine/miniconda3/envs/dask35/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 146, in dump
return Pickler.dump(self, obj)
File "/home/antoine/miniconda3/envs/dask35/lib/python3.5/pickle.py", line 408, in dump
self.save(obj)
File "/home/antoine/miniconda3/envs/dask35/lib/python3.5/pickle.py", line 475, in save
f(self, obj) # Call unbound method with explicit self
File "/home/antoine/miniconda3/envs/dask35/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 154, in save_memoryview
Pickler.save_string(self, str(obj))
AttributeError: type object '_Pickler' has no attribute 'save_string'
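Until save_memoryview is fixed, the usual workaround is to materialize the view to bytes before pickling, since a memoryview is only a window onto another object's buffer:

```python
import pickle

m = memoryview(b"abc")
# tobytes() copies the viewed data into an independent bytes object,
# which pickles without any special support.
restored = pickle.loads(pickle.dumps(m.tobytes()))
assert restored == b"abc"
```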
The __transient__ attribute does not seem to be a standard Python dunder. From what I have seen, it is used to exclude some attributes from __dict__ before serialization. Why is it used instead of __getstate__? In save_inst() you actually try __getstate__ first if it exists, and only fall back to __transient__ if it does not. However, in save_reduce() you always look directly for this attribute (if the protocol version is >= 2). Is this necessary? Couldn't __getstate__ be tried first there as well?
Cross reference: irmen/Pyro4#179
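For context, the standard mechanism the question refers to looks like this (a generic sketch, not cloudpickle code): __getstate__ lets a class drop transient state itself, with no need for a special attribute list:

```python
import pickle

class Connection:
    def __init__(self, host):
        self.host = host
        self._socket = object()  # stand-in for unpicklable runtime state

    def __getstate__(self):
        # Drop transient state before pickling.
        state = self.__dict__.copy()
        del state["_socket"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._socket = None  # re-create lazily after unpickling

conn = pickle.loads(pickle.dumps(Connection("localhost")))
assert conn.host == "localhost" and conn._socket is None
```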
Encountering issues trying to pickle lock objects. Not sure if this is something that should be permissible or not. Seems cloudpickle
just falls back to pickle
in this case. Traceback shown below.
>>> import threading
>>> import cloudpickle
>>> l = threading.Lock()
>>> cloudpickle.pickle.dumps(l)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/zopt/conda2/envs/pickle_test/lib/python2.7/pickle.py", line 1380, in dumps
Pickler(file, protocol).dump(obj)
File "/zopt/conda2/envs/pickle_test/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/zopt/conda2/envs/pickle_test/lib/python2.7/pickle.py", line 306, in save
rv = reduce(self.proto)
File "/zopt/conda2/envs/pickle_test/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle lock objects
name: pickle_test
channels: !!python/tuple
- !!python/unicode
'conda-forge'
- !!python/unicode
'defaults'
dependencies:
- conda-forge::ca-certificates=2017.1.23=0
- conda-forge::cloudpickle=0.2.2=py27_2
- conda-forge::dill=0.2.6=py27_0
- conda-forge::ncurses=5.9=10
- conda-forge::openssl=1.0.2h=3
- conda-forge::python=2.7.12=2
- conda-forge::readline=6.2=0
- conda-forge::sqlite=3.13.0=1
- conda-forge::tk=8.5.19=1
- conda-forge::zlib=1.2.11=0
prefix: /zopt/conda2/envs/pickle_test
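If pickling locks should be permitted at all, one pragmatic reading is that a lock's state is meaningless in another process, so it could deserialize as a fresh lock. A copyreg-based sketch (my own workaround, not cloudpickle or pickle behavior):

```python
import copyreg
import pickle
import threading

LockType = type(threading.Lock())  # _thread.lock is not directly importable

def _make_fresh_lock():
    return threading.Lock()

# Serialize any lock as "create a new, unlocked lock on load".
copyreg.pickle(LockType, lambda lock: (_make_fresh_lock, ()))

restored = pickle.loads(pickle.dumps(threading.Lock()))
assert isinstance(restored, LockType)
```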
This problem is when a function refers (by attribute) to a sub-module of a package. Cloudpickle appears to pickle functions not by name, but by code plus (a subset of) globals. So the parent package is injected into the pickle, but the sub-module is not.
def func():
    # import unittest.mock
    x = unittest.TestCase
    x = unittest.mock.Mock

import unittest.mock
import cloudpickle as pickle
s = pickle.dumps(func)

del unittest
import sys
del sys.modules['unittest']
del sys.modules['unittest.mock']

f = pickle.loads(s)
# import unittest.mock as anything
f()
AttributeError: module 'unittest' has no attribute 'mock'
This leads to non-intuitive bugs in applications such as cluster computing (e.g. with dask.distributed
).
Workarounds:
- bind the sub-module to a global name with import ... as ..., so the function's globals hold the sub-module itself;
- import the sub-module from the package's __init__.py, so importing the parent also initialises it.
I assume cloudpickle checks whether a global is an imported module, and if so then stores the name (rather than pickling its attributes). Is it practical to also check (via sys.modules.keys()) which sub-modules had previously been imported, and ensure every such module is subsequently initialised?
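The import ... as ... workaround can be seen with plain imports, independent of any pickler: a bare import of a sub-module only binds the top-level package in the importing namespace, while import ... as ... binds the sub-module itself, giving the function's globals something that can be stored by name:

```python
import sys
import unittest.mock  # binds only the name "unittest" here

# The sub-module is loaded, but lives in sys.modules rather than as a
# separate global; attribute access goes through the parent package.
assert "unittest.mock" in sys.modules
assert unittest.mock is sys.modules["unittest.mock"]

import unittest.mock as umock  # binds the sub-module directly

# Now "umock" is a global that refers to the sub-module itself, so a
# serializer can record it by its own module name.
assert umock is sys.modules["unittest.mock"]
```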
Appears that I cannot pickle an Ellipsis
object, but I can pickle slice
s. It would be nice to have support for pickling Ellipsis
. FWIW, this is solved by dill
.
>>> import cloudpickle
>>> cloudpickle.dumps(Ellipsis)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-cc84d30bf9cd> in <module>()
----> 1 cloudpickle.dumps(Ellipsis)
/zopt/conda/envs/nanshenv/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in dumps(obj, protocol)
600
601 cp = CloudPickler(file,protocol)
--> 602 cp.dump(obj)
603
604 return file.getvalue()
/zopt/conda/envs/nanshenv/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in dump(self, obj)
105 self.inject_addons()
106 try:
--> 107 return Pickler.dump(self, obj)
108 except RuntimeError as e:
109 if 'recursion' in e.args[0]:
/zopt/conda/envs/nanshenv/lib/python2.7/pickle.pyc in dump(self, obj)
222 if self.proto >= 2:
223 self.write(PROTO + chr(self.proto))
--> 224 self.save(obj)
225 self.write(STOP)
226
/zopt/conda/envs/nanshenv/lib/python2.7/pickle.pyc in save(self, obj)
304 reduce = getattr(obj, "__reduce_ex__", None)
305 if reduce:
--> 306 rv = reduce(self.proto)
307 else:
308 reduce = getattr(obj, "__reduce__", None)
TypeError: can't pickle ellipsis objects
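For what it's worth, the limitation appears specific to Python 2: the Python 3 pickler has a dedicated handler for the Ellipsis singleton, so it round-trips natively there:

```python
import pickle

# Python 3 pickles Ellipsis by name, at any protocol, just like slices.
assert pickle.loads(pickle.dumps(Ellipsis)) is Ellipsis
assert pickle.loads(pickle.dumps(slice(1, 10, 2))) == slice(1, 10, 2)
```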
Hello!
The current code from the master branch fails to work with classes that have the abc.ABCMeta
metaclass.
MWE:
import abc
import cloudpickle

class Q(object):
    __metaclass__ = abc.ABCMeta

q = Q()
cloudpickle.loads(cloudpickle.dumps(q))
With python 2.7.3 this yields:
object.__new__(getset_descriptor) is not safe, use getset_descriptor.__new__()
With python 2.7.13:
TypeError: can't pickle wrapper_descriptor objects
This fails:
import enum
import cloudpickle

class MyEnum(enum.Enum):
    SPAM = 'SPAM'

cloudpickle.dumps(MyEnum.SPAM)
---------------------------------------------------------------------------
RecursionError Traceback (most recent call last)
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in dump(self, obj)
145 try:
--> 146 return Pickler.dump(self, obj)
147 except RuntimeError as e:
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in dump(self, obj)
408 self.framer.start_framing()
--> 409 self.save(obj)
410 self.write(STOP)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
520 # Save the reduce() output and finally memoize the object
--> 521 self.save_reduce(obj=obj, *rv)
522
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
584 else:
--> 585 save(func)
586 save(args)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
489 if issc:
--> 490 self.save_global(obj)
491 return
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_global(self, obj, name, pack)
424
--> 425 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
426 else:
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
585 save(func)
--> 586 save(args)
587 write(pickle.REDUCE)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save_tuple(self, obj)
735 for element in obj:
--> 736 save(element)
737 # Subtle. Same as in the big comment below.
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save_dict(self, obj)
820 self.memoize(obj)
--> 821 self._batch_setitems(obj.items())
822
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in _batch_setitems(self, items)
846 save(k)
--> 847 save(v)
848 write(SETITEMS)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
520 # Save the reduce() output and finally memoize the object
--> 521 self.save_reduce(obj=obj, *rv)
522
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
600 if dictitems is not None:
--> 601 self._batch_setitems(dictitems)
602
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in _batch_setitems(self, items)
851 save(k)
--> 852 save(v)
853 write(SETITEM)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
520 # Save the reduce() output and finally memoize the object
--> 521 self.save_reduce(obj=obj, *rv)
522
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
584 else:
--> 585 save(func)
586 save(args)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
489 if issc:
--> 490 self.save_global(obj)
491 return
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_global(self, obj, name, pack)
424
--> 425 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
426 else:
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
585 save(func)
--> 586 save(args)
587 write(pickle.REDUCE)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save_tuple(self, obj)
735 for element in obj:
--> 736 save(element)
737 # Subtle. Same as in the big comment below.
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save_dict(self, obj)
820 self.memoize(obj)
--> 821 self._batch_setitems(obj.items())
822
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in _batch_setitems(self, items)
846 save(k)
--> 847 save(v)
848 write(SETITEMS)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save_dict(self, obj)
820 self.memoize(obj)
--> 821 self._batch_setitems(obj.items())
822
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in _batch_setitems(self, items)
851 save(k)
--> 852 save(v)
853 write(SETITEM)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
520 # Save the reduce() output and finally memoize the object
--> 521 self.save_reduce(obj=obj, *rv)
522
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
584 else:
--> 585 save(func)
586 save(args)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
489 if issc:
--> 490 self.save_global(obj)
491 return
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_global(self, obj, name, pack)
424
--> 425 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
426 else:
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
585 save(func)
--> 586 save(args)
587 write(pickle.REDUCE)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save_tuple(self, obj)
735 for element in obj:
--> 736 save(element)
737 # Subtle. Same as in the big comment below.
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save_dict(self, obj)
820 self.memoize(obj)
--> 821 self._batch_setitems(obj.items())
822
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in _batch_setitems(self, items)
846 save(k)
--> 847 save(v)
848 write(SETITEMS)
... last 10 frames repeated, from the frame below ...
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
520 # Save the reduce() output and finally memoize the object
--> 521 self.save_reduce(obj=obj, *rv)
522
RecursionError: maximum recursion depth exceeded
During handling of the above exception, another exception occurred:
PicklingError Traceback (most recent call last)
<ipython-input-3-d64a2267ce31> in <module>()
----> 1 cloudpickle.dumps(MyEnum.SPAM)
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in dumps(obj, protocol)
704
705 cp = CloudPickler(file,protocol)
--> 706 cp.dump(obj)
707
708 return file.getvalue()
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in dump(self, obj)
148 if 'recursion' in e.args[0]:
149 msg = """Could not pickle object as excessively deep recursion required."""
--> 150 raise pickle.PicklingError(msg)
151
152 def save_memoryview(self, obj):
PicklingError: Could not pickle object as excessively deep recursion required.
Originally reported here: dask/distributed#1178 by @AndrewPashkin
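For comparison, enum members whose class is importable pickle fine by reference with the standard pickler; the recursion appears only when cloudpickle tries to serialize the Enum class itself by value:

```python
import enum
import pickle

class MyEnum(enum.Enum):
    SPAM = 'SPAM'

# A member reduces to (its class, (value,)), so as long as the class can
# be found by name, the round-trip works.
restored = pickle.loads(pickle.dumps(MyEnum.SPAM))
assert restored is MyEnum.SPAM
```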
Can anyone explain why the following code isn't working?
from cloudpickle import pickle
namespace = {}
exec('def f(x): return x', namespace)
pickle.dumps(namespace['f'])
If this is the expected behavior, I would be very happy with a solution that uses exec('def f(x): return x', namespace) and yields a serializable function f. I would prefer not to use globals().
cloudpickle/cloudpickle/cloudpickle.py
Line 12 in 62027de
Report from the failing test ran with the PyPy3 environment configured by tox (not available on travis-ci):
__________________________________________________________________________________ CloudPickleTest.test_method_descriptors __________________________________________________________________________________
self = <tests.cloudpickle_test.CloudPickleTest testMethod=test_method_descriptors>
def test_method_descriptors(self):
> f = pickle_depickle(str.upper)
tests/cloudpickle_test.py:241:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/cloudpickle_test.py:37: in pickle_depickle
return pickle.loads(cloudpickle.dumps(obj))
cloudpickle/cloudpickle.py:605: in dumps
cp.dump(obj)
cloudpickle/cloudpickle.py:107: in dump
return Pickler.dump(self, obj)
../../opt/pypy3/lib-python/3/pickle.py:237: in dump
self.save(obj)
../../opt/pypy3/lib-python/3/pickle.py:299: in save
f(self, obj) # Call unbound method with explicit self
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <cloudpickle.cloudpickle.CloudPickler object at 0x0000000004568560>, obj = <function upper at 0x0000000001a32fc0>, name = 'upper'
    def save_function(self, obj, name=None):
        """ Registered with the dispatch to handle all function types.
        Determines what kind of function obj is (e.g. lambda, defined at
        interactive prompt, etc) and handles the pickling appropriately.
        """
        write = self.write
        if name is None:
            name = obj.__name__
        modname = pickle.whichmodule(obj, name)
        # print('which gives %s %s %s' % (modname, obj, name))
        try:
            themodule = sys.modules[modname]
        except KeyError:
            # eval'd items such as namedtuple give invalid items for their function __module__
            modname = '__main__'
        if modname == '__main__':
            themodule = None
        if themodule:
            self.modules.add(themodule)
            if getattr(themodule, name, None) is obj:
                return self.save_global(obj, name)
        # if func is lambda, def'ed at prompt, is in main, or is nested, then
        # we'll pickle the actual function object rather than simply saving a
        # reference (as is done in default pickler), via save_function_tuple.
>       if islambda(obj) or obj.__code__.co_filename == '<stdin>' or themodule is None:
E       AttributeError: 'builtin-code' object has no attribute 'co_filename'
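A defensive fix (a sketch of the idea, not the actual patch) would be to guard the attribute access, since PyPy's builtin functions carry a 'builtin-code' object without co_filename:

```python
def defined_interactively(func):
    # getattr with a default tolerates both a missing __code__ (builtins on
    # CPython) and code objects lacking co_filename (builtins on PyPy).
    code = getattr(func, "__code__", None)
    return getattr(code, "co_filename", None) == "<stdin>"

def regular_function():
    pass

assert defined_interactively(str.upper) is False
assert defined_interactively(regular_function) is False
```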
Hello!
The current upstream code can't handle namedtuples.
Here's an MWE:
import cloudpickle
from collections import namedtuple
X = namedtuple('X', ['a'])
cloudpickle.loads(cloudpickle.dumps(X))
Traceback:
Traceback (most recent call last):
File "t.py", line 6, in <module>
cloudpickle.loads(cloudpickle.dumps(X))
File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1388, in loads
return Unpickler(file).load()
File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1139, in load_reduce
value = func(*args)
File "/Users/amatanhead/Documents/cloudpickle/cloudpickle/cloudpickle.py", line 1043, in _rehydrate_skeleton_class
setattr(skeleton_class, attrname, attr)
AttributeError: attribute '__dict__' of 'type' objects is not writable
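For contrast, namedtuple classes created at module scope round-trip with the standard pickler, which stores them by name; the failure above is specific to cloudpickle's rebuild-the-class-by-value path:

```python
import pickle
from collections import namedtuple

X = namedtuple('X', ['a'])  # module-level, so pickle can find it by name

assert pickle.loads(pickle.dumps(X)) is X
assert pickle.loads(pickle.dumps(X(a=1))).a == 1
```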
I'm using dask and noticed that cloudpickle performs very slowly when pickling bigger lists or sets. Why is it so much slower, and is there a way to avoid this?
In [1]: import cloudpickle
In [2]: import pickle
In [3]: data = set(range(100000))
In [4]: %%time
...: silent = pickle.dumps(data)
...:
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 3.34 ms
In [5]: %%time
...: silent = cloudpickle.dumps(data)
...:
CPU times: user 200 ms, sys: 0 ns, total: 200 ms
Wall time: 197 ms
In [6]: %%time
...: silent = pickle.dumps(list(data))
...:
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 3.64 ms
In [7]: %%time
...: silent = cloudpickle.dumps(list(data))
...:
CPU times: user 192 ms, sys: 0 ns, total: 192 ms
Wall time: 194 ms
Here is an analysis from a colleague:
The speed-up for us seems to be coming from the fact that pickling modules takes a long time:
In [25]: %timeit cloudpickle.dumps(numpy, -1)
100 loops, best of 3: 3.03 ms per loop
It looks like _find_module()
will use imp.find_module()
which traverses sys.path
to look for things that look like numpy. In our environment, sys.path tends to be long and our filesystems tend to be slow, hence the 3.03 ms.
def save_module(self, obj):
    """
    Save a module as an import
    """
    mod_name = obj.__name__
    # If module is successfully found then it is not a dynamically created module
    try:
        _find_module(mod_name)  # EXPENSIVE!!!!!
        is_dynamic = False
    except ImportError:
        is_dynamic = True

    self.modules.add(obj)
    if is_dynamic:
        self.save_reduce(dynamic_subimport, (obj.__name__, vars(obj)), obj=obj)
    else:
        self.save_reduce(subimport, (obj.__name__,), obj=obj)

dispatch[types.ModuleType] = save_module
So it looks like cloudpickle is trying to allow for "dynamically created modules". If it didn't try to be this flexible, then the entire function should just be
self.save_reduce(subimport, (obj.__name__,), obj=obj)
So the danger is if people are using "dynamically created modules", which we don't tend to do.
Maybe an easy way out is to check if obj.__file__
exists (the attribute, not the file). If it does, then immediately assume that is_dynamic=False.
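The proposed check could look like this (a sketch of the idea, not cloudpickle's code; the builtin-module caveat in the comments is my own assumption about where a naive __file__ test would misclassify):

```python
import sys
import types

def looks_dynamic(mod):
    # A module created at runtime has no __file__; anything loaded from
    # disk does. This avoids scanning sys.path the way imp.find_module does.
    # Caveat: built-in modules like sys also lack __file__, so exclude them.
    return (getattr(mod, "__file__", None) is None
            and mod.__name__ not in sys.builtin_module_names)

dynamic = types.ModuleType("made_up_module")
import json

assert looks_dynamic(dynamic) is True
assert looks_dynamic(json) is False
assert looks_dynamic(sys) is False  # builtin, but not dynamic
```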
Fwiw, I think we're pickling numpy
because we're pickling functions that refer to numpy
. Not positive though.
I use map to execute some code.
############## Testing of IPyParallel on DEAP ###################################
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMin)

toolbox = base.Toolbox()

# Using parallel processing
import ipyparallel as ipp, time
rc = ipp.Client()
# pool = rc.load_balanced_view()
rc[:].use_cloudpickle()
pool = rc[:]
toolbox.register("map", pool.map)

toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=2)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)

def evalSymbReg(individual, points):
    func = toolbox.compile(expr=individual)  # Transform the tree expression into a callable function
    # against the real function: x**4 + x**3 + x**2 + x
    sqerrors = ((func(x) - x**4 - x**3 - x**2 - x)**2 for x in points)
    return math.fsum(sqerrors) / len(points),

toolbox.register("evaluate", evalSymbReg, points=[x/10. for x in range(-10, 10)])
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)
toolbox.decorate("mate", gp.staticLimit(key=operator.attrgetter("height"), max_value=17))
toolbox.decorate("mutate", gp.staticLimit(key=operator.attrgetter("height"), max_value=17))

def main():
    random.seed(318)
    pop = toolbox.population(n=300)
    hof = tools.HallOfFame(1)
    stats_fit = tools.Statistics(lambda ind: ind.fitness.values)
    stats_size = tools.Statistics(len)
    mstats = tools.MultiStatistics(fitness=stats_fit, size=stats_size)
    mstats.register("avg", np.mean)
    mstats.register("std", np.std)
    mstats.register("min", np.min)
    mstats.register("max", np.max)
    pop, log = algorithms.eaSimple(pop, toolbox, 0.5, 0.1, 40, stats=mstats,
                                   halloffame=hof, verbose=True)
    # print log
    return pop, log, hof
I got this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-978da9be5b87> in <module>()
1 if __name__ == "__main__":
----> 2 pop, log, hof= main()
<ipython-input-10-6ff69ab06682> in main()
15
16 pop, log = algorithms.eaSimple(pop, toolbox, 0.5, 0.1, 40, stats=mstats,
---> 17 halloffame=hof, verbose=True)
18 # print log
19 return pop, log, hof
D:\_devs\Python01\Anaconda27\lib\site-packages\deap\algorithms.pyc in eaSimple(population, toolbox, cxpb, mutpb, ngen, stats, halloffame, verbose)
145 # Evaluate the individuals with an invalid fitness
146 invalid_ind = [ind for ind in population if not ind.fitness.valid]
--> 147 fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)
148 for ind, fit in zip(invalid_ind, fitnesses):
149 ind.fitness.values = fit
<decorator-gen-141> in map(self, f, *sequences, **kwargs)
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\view.pyc in sync_results(f, self, *args, **kwargs)
48 self._in_sync_results = True
49 try:
---> 50 ret = f(self, *args, **kwargs)
51 finally:
52 self._in_sync_results = False
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\view.pyc in map(self, f, *sequences, **kwargs)
613 assert len(sequences) > 0, "must have some sequences to map onto!"
614 pf = ParallelFunction(self, f, block=block, **kwargs)
--> 615 return pf.map(*sequences)
616
617 @sync_results
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\remotefunction.pyc in map(self, *sequences)
283 and mismatched sequence lengths will be padded with None.
284 """
--> 285 return self(*sequences, __ipp_mapping=True)
286
287 __all__ = ['remote', 'parallel', 'RemoteFunction', 'ParallelFunction']
<decorator-gen-131> in __call__(self, *sequences, **kwargs)
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\remotefunction.pyc in sync_view_results(f, self, *args, **kwargs)
74 view = self.view
75 if view._in_sync_results:
---> 76 return f(self, *args, **kwargs)
77 view._in_sync_results = True
78 try:
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\remotefunction.pyc in __call__(self, *sequences, **kwargs)
257 view = self.view if balanced else client[t]
258 with view.temp_flags(block=False, **self.flags):
--> 259 ar = view.apply(f, *args)
260 ar.owner = False
261
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\view.pyc in apply(self, f, *args, **kwargs)
209 ``f(*args, **kwargs)``.
210 """
--> 211 return self._really_apply(f, args, kwargs)
212
213 def apply_async(self, f, *args, **kwargs):
<decorator-gen-140> in _really_apply(self, f, args, kwargs, targets, block, track)
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\view.pyc in sync_results(f, self, *args, **kwargs)
48 self._in_sync_results = True
49 try:
---> 50 ret = f(self, *args, **kwargs)
51 finally:
52 self._in_sync_results = False
<decorator-gen-139> in _really_apply(self, f, args, kwargs, targets, block, track)
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\view.pyc in save_ids(f, self, *args, **kwargs)
33 n_previous = len(self.client.history)
34 try:
---> 35 ret = f(self, *args, **kwargs)
36 finally:
37 nmsgs = len(self.client.history) - n_previous
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\view.pyc in _really_apply(self, f, args, kwargs, targets, block, track)
555 for ident in _idents:
556 future = self.client.send_apply_request(self._socket, f, args, kwargs, track=track,
--> 557 ident=ident)
558 futures.append(future)
559 if track:
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\client.pyc in send_apply_request(self, socket, f, args, kwargs, metadata, track, ident)
1387 bufs = serialize.pack_apply_message(f, args, kwargs,
1388 buffer_threshold=self.session.buffer_threshold,
-> 1389 item_threshold=self.session.item_threshold,
1390 )
1391
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\serialize\serialize.pyc in pack_apply_message(f, args, kwargs, buffer_threshold, item_threshold)
164
165 arg_bufs = list(chain.from_iterable(
--> 166 serialize_object(arg, buffer_threshold, item_threshold) for arg in args))
167
168 kw_keys = sorted(kwargs.keys())
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\serialize\serialize.pyc in <genexpr>((arg,))
164
165 arg_bufs = list(chain.from_iterable(
--> 166 serialize_object(arg, buffer_threshold, item_threshold) for arg in args))
167
168 kw_keys = sorted(kwargs.keys())
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\serialize\serialize.pyc in serialize_object(obj, buffer_threshold, item_threshold)
110 buffers.extend(_extract_buffers(cobj, buffer_threshold))
111
--> 112 buffers.insert(0, pickle.dumps(cobj, PICKLE_PROTOCOL))
113 return buffers
114
D:\_devs\Python01\Anaconda27\lib\site-packages\cloudpickle\cloudpickle.pyc in dumps(obj, protocol)
627
628 cp = CloudPickler(file,protocol)
--> 629 cp.dump(obj)
630
631 return file.getvalue()
D:\_devs\Python01\Anaconda27\lib\site-packages\cloudpickle\cloudpickle.pyc in dump(self, obj)
105 self.inject_addons()
106 try:
--> 107 return Pickler.dump(self, obj)
108 except RuntimeError as e:
109 if 'recursion' in e.args[0]:
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in dump(self, obj)
222 if self.proto >= 2:
223 self.write(PROTO + chr(self.proto))
--> 224 self.save(obj)
225 self.write(STOP)
226
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
329
330 # Save the reduce() output and finally memoize the object
--> 331 self.save_reduce(obj=obj, *rv)
332
333 def persistent_id(self, obj):
D:\_devs\Python01\Anaconda27\lib\site-packages\cloudpickle\cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
527 else:
528 save(func)
--> 529 save(args)
530 write(pickle.REDUCE)
531
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
D:\_devs\Python01\Anaconda27\lib\site-packages\cloudpickle\cloudpickle.pyc in save_function(self, obj, name)
203 or getattr(obj.__code__, 'co_filename', None) == '<stdin>'
204 or themodule is None):
--> 205 self.save_function_tuple(obj)
206 return
207 else:
D:\_devs\Python01\Anaconda27\lib\site-packages\cloudpickle\cloudpickle.pyc in save_function_tuple(self, func)
251
252 # save the rest of the func data needed by _fill_function
--> 253 save(f_globals)
254 save(defaults)
255 save(dct)
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in _batch_setitems(self, items)
685 for k, v in tmp:
686 save(k)
--> 687 save(v)
688 write(SETITEMS)
689 elif n:
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
329
330 # Save the reduce() output and finally memoize the object
--> 331 self.save_reduce(obj=obj, *rv)
332
333 def persistent_id(self, obj):
D:\_devs\Python01\Anaconda27\lib\site-packages\cloudpickle\cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
545
546 if state is not None:
--> 547 save(state)
548 write(pickle.BUILD)
549
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in _batch_setitems(self, items)
685 for k, v in tmp:
686 save(k)
--> 687 save(v)
688 write(SETITEMS)
689 elif n:
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
329
330 # Save the reduce() output and finally memoize the object
--> 331 self.save_reduce(obj=obj, *rv)
332
333 def persistent_id(self, obj):
D:\_devs\Python01\Anaconda27\lib\site-packages\cloudpickle\cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
545
546 if state is not None:
--> 547 save(state)
548 write(pickle.BUILD)
549
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save_tuple(self, obj)
566 write(MARK)
567 for element in obj:
--> 568 save(element)
569
570 if id(obj) in memo:
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in _batch_setitems(self, items)
685 for k, v in tmp:
686 save(k)
--> 687 save(v)
688 write(SETITEMS)
689 elif n:
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
329
330 # Save the reduce() output and finally memoize the object
--> 331 self.save_reduce(obj=obj, *rv)
332
333 def persistent_id(self, obj):
D:\_devs\Python01\Anaconda27\lib\site-packages\cloudpickle\cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
545
546 if state is not None:
--> 547 save(state)
548 write(pickle.BUILD)
549
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in _batch_setitems(self, items)
685 for k, v in tmp:
686 save(k)
--> 687 save(v)
688 write(SETITEMS)
689 elif n:
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in _batch_setitems(self, items)
684 write(MARK)
685 for k, v in tmp:
--> 686 save(k)
687 save(v)
688 write(SETITEMS)
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
304 reduce = getattr(obj, "__reduce_ex__", None)
305 if reduce:
--> 306 rv = reduce(self.proto)
307 else:
308 reduce = getattr(obj, "__reduce__", None)
TypeError: can't pickle member_descriptor objects
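The error above comes from trying to pickle a member_descriptor, the attribute type that `__slots__` creates on a class. A minimal, hedged reproduction using only the standard pickler (the class name `Slotted` is made up for illustration; Python 2.7's pickler could not serialize these descriptors, while newer interpreters may pickle them by reference):

```python
import pickle

class Slotted(object):
    __slots__ = ("x",)

desc = Slotted.x  # the member_descriptor created by __slots__
print(type(desc).__name__)  # member_descriptor

try:
    pickle.dumps(desc)
    print("this interpreter can pickle the descriptor by reference")
except TypeError as e:
    # Python 2.7's pickler (and cloudpickle at the time) cannot.
    print(e)
```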
I often need to serialize many small objects containing many Python functions.
In [1]: def inc(x):
return x + 1
...:
In [2]: d = {i: (inc, i) for i in range(10000)}
Sometimes I do this all at once; this works great.
In [3]: from cloudpickle import dumps, loads
In [4]: %time len(dumps(d))
CPU times: user 118 ms, sys: 0 ns, total: 118 ms
Wall time: 117 ms
But sometimes I do this in several small batches, which is much slower.
In [5]: %time len([dumps(item) for item in d.items()])
CPU times: user 2.7 s, sys: 3.93 ms, total: 2.7 s
Wall time: 2.71 s
A quick profile shows that the majority of time is spent in save_function
In [7]: %prun -s cumtime len([dumps(item) for item in d.items()])
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 4.782 4.782 {built-in method exec}
1 0.001 0.001 4.782 4.782 <string>:1(<module>)
1 0.038 0.038 4.782 4.782 <string>:1(<listcomp>)
10000 0.025 0.000 4.744 0.000 cloudpickle.py:598(dumps)
10000 0.011 0.000 4.658 0.000 cloudpickle.py:104(dump)
10000 0.030 0.000 4.646 0.000 pickle.py:401(dump)
450000/10000 0.894 0.000 4.597 0.000 pickle.py:460(save)
120000/10000 0.287 0.000 4.568 0.000 pickle.py:716(save_tuple)
50000/10000 0.115 0.000 4.296 0.000 cloudpickle.py:162(save_function)
10000 0.053 0.000 4.254 0.000 cloudpickle.py:214(save_function_tuple)
10000 0.020 0.000 2.834 0.000 cloudpickle.py:142(save_codeobject)
40000/10000 0.120 0.000 2.814 0.000 cloudpickle.py:470(save_reduce)
50000/40000 0.117 0.000 1.285 0.000 cloudpickle.py:318(save_global)
20000 0.039 0.000 1.058 0.000 pickle.py:680(save_bytes)
290000 0.519 0.000 1.044 0.000 pickle.py:416(memoize)
40000 0.257 0.000 0.716 0.000 pickle.py:898(save_global)
70000 0.147 0.000 0.479 0.000 pickle.py:698(save_str)
800000 0.270 0.000 0.392 0.000 pickle.py:212(write)
And so I'm tempted to memoize save_function
between dumps calls, presumably with some sort of LRU mechanism keyed by object identity. This is unsafe if functions mutate in any way; I've never run into such a situation, but I'm unsure whether this kind of caching is done elsewhere.
On looking into cloudpickle more deeply, it appears that Pickler
has a caching mechanism within it. Does anyone have experience with these memo
objects? I would need to clear out non-function elements from the cache between calls.
I'm happy to do the work here if we are able to agree on a good solution.
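A hedged sketch of the identity-keyed cache described above, using the stdlib pickler as a stand-in for cloudpickle's function-saving machinery (the names `cached_dumps`, `_function_cache`, and `_serialize_function` are made up for illustration; a real implementation would also need LRU eviction and invalidation):

```python
import pickle

# Hypothetical identity-keyed cache of serialized function payloads.
# Unsafe if a function's globals or defaults mutate between dumps calls.
_function_cache = {}

def _serialize_function(func):
    # Stand-in for cloudpickle's save_function_tuple output.
    return pickle.dumps((func.__name__, func.__defaults__))

def cached_dumps(func):
    key = id(func)
    payload = _function_cache.get(key)
    if payload is None:
        payload = _function_cache[key] = _serialize_function(func)
    return payload

def inc(x):
    return x + 1

first = cached_dumps(inc)
second = cached_dumps(inc)
print(first is second)  # True: the second call returns the cached bytes
```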
Hey cloudpipe team!
I'm doing an exploratory analysis for the gensim library to see whether it could use cloudpickle (here's the discussion), and noticed that 'regular' cloudpickle is consistently ~8x slower than Python's pickle module for pretty much all the data structures I threw at it.
Is this the expected behavior, or am I doing something wrong in my tests? I'm using Python 2.7/3.4 on Windows, without C compilers (so not using the optimized versions, if there are any).
Would you have any ideas on whether we could selectively modify the module for certain tasks to improve performance on the most-used features?
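For a fair comparison, both picklers can be timed on the same payload with `timeit`; the sketch below falls back to the stdlib pickler when cloudpickle is not installed, so the numbers are illustrative only:

```python
import pickle
import timeit

try:
    import cloudpickle as contender  # may not be installed
except ImportError:
    contender = pickle  # fallback so the sketch still runs

# Illustrative payload: plain builtins, the case where cloudpickle's
# pure-Python dispatch overhead shows up most clearly.
payload = {"key_%d" % i: list(range(10)) for i in range(100)}

t_stdlib = timeit.timeit(lambda: pickle.dumps(payload), number=200)
t_contender = timeit.timeit(lambda: contender.dumps(payload), number=200)
print("stdlib: %.4fs, contender: %.4fs" % (t_stdlib, t_contender))
```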
Hi. I am trying to package cloudpickle for openSUSE, with unit tests to make sure everything is working properly. The unit tests work fine for Python 2.x (2.6 and 2.7), but fail for all versions of Python 3.x (3.3, 3.4 and 3.5). The packages are identical apart from being Python 2.x or Python 3.x. All stated dependencies are included, and as near as I can tell my test invocation shouldn't have any substantial differences from how your Travis tests are invoked. Here are the failures (this is for Python 3.4, but identical failures occur in 3.3 and 3.5):
======================================================================
ERROR: test_pickling_special_file_handles (tests.cloudpickle_file_test.CloudPickleFileTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/abuild/rpmbuild/BUILD/cloudpickle-0.1.1/tests/cloudpickle_file_test.py", line 102, in test_pickling_special_file_handles
self.assertEquals(out, pickle.loads(cloudpickle.dumps(out)))
File "/home/abuild/rpmbuild/BUILD/cloudpickle-0.1.1/cloudpickle/cloudpickle.py", line 602, in dumps
cp.dump(obj)
File "/home/abuild/rpmbuild/BUILD/cloudpickle-0.1.1/cloudpickle/cloudpickle.py", line 107, in dump
return Pickler.dump(self, obj)
File "/usr/lib64/python3.4/pickle.py", line 412, in dump
self.save(obj)
File "/usr/lib64/python3.4/pickle.py", line 479, in save
f(self, obj) # Call unbound method with explicit self
File "/home/abuild/rpmbuild/BUILD/cloudpickle-0.1.1/cloudpickle/cloudpickle.py", line 548, in save_file
raise pickle.PicklingError("Cannot pickle files that do not map to an actual file")
_pickle.PicklingError: Cannot pickle files that do not map to an actual file
======================================================================
ERROR: test_temp_file (tests.cloudpickle_file_test.CloudPickleFileTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/abuild/rpmbuild/BUILD/cloudpickle-0.1.1/tests/cloudpickle_file_test.py", line 96, in test_temp_file
newfile = pickle.loads(cloudpickle.dumps(f))
File "/home/abuild/rpmbuild/BUILD/cloudpickle-0.1.1/cloudpickle/cloudpickle.py", line 602, in dumps
cp.dump(obj)
File "/home/abuild/rpmbuild/BUILD/cloudpickle-0.1.1/cloudpickle/cloudpickle.py", line 107, in dump
return Pickler.dump(self, obj)
File "/usr/lib64/python3.4/pickle.py", line 412, in dump
self.save(obj)
File "/usr/lib64/python3.4/pickle.py", line 499, in save
rv = reduce(self.proto)
TypeError: cannot serialize '_io.BufferedRandom' object
Hey @ogrisel, @mrocklin, @pitrou!
We should make a release. I've made the last few, and I'd love to have someone else take the reins on shipping. Since I'm no longer using this package directly, I don't have much vested interest in getting this shipped (other than as a user of dask and pyspark).
@pitrou - what is your username on PyPI?
> python --version
Python 2.7.11 :: Anaconda 4.0.0 (x86_64)
Also
>>> cloudpickle.__version__
'0.1.1'
Hi, I'm trying to pickle some stuff in the typing
module. I'm curious if there are fundamental limitations here or if this is out of scope for cloudpickle. Thanks for your help!
from typing import List, Callable
from cloudpickle import loads, dumps
This works.
>>> List
typing.List<~T>
>>> loads(dumps(List))
typing.List<~T>
This seems to lose some information.
>>> Callable[[int, str], float]
typing.Callable[[int, str], float]
>>> loads(dumps(Callable[[int, str], float]))
typing.Callable
This doesn't work:
>>> dumps(List[int])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-15-f02da844db1c> in <module>()
----> 1 dumps(List[int])
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in dumps(obj, protocol)
600
601 cp = CloudPickler(file,protocol)
--> 602 cp.dump(obj)
603
604 return file.getvalue()
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in dump(self, obj)
105 self.inject_addons()
106 try:
--> 107 return Pickler.dump(self, obj)
108 except RuntimeError as e:
109 if 'recursion' in e.args[0]:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in dump(self, obj)
222 if self.proto >= 2:
223 self.write(PROTO + chr(self.proto))
--> 224 self.save(obj)
225 self.write(STOP)
226
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
298 issc = 0
299 if issc:
--> 300 self.save_global(obj)
301 return
302
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_global(self, obj, name, pack)
351 d['__new__'] = obj.__new__
352
--> 353 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
354 else:
355 raise pickle.PicklingError("Can't pickle %r" % obj)
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
509 else:
510 save(func)
--> 511 save(args)
512 write(pickle.REDUCE)
513
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
298 issc = 0
299 if issc:
--> 300 self.save_global(obj)
301 return
302
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_global(self, obj, name, pack)
351 d['__new__'] = obj.__new__
352
--> 353 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
354 else:
355 raise pickle.PicklingError("Can't pickle %r" % obj)
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
509 else:
510 save(func)
--> 511 save(args)
512 write(pickle.REDUCE)
513
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
298 issc = 0
299 if issc:
--> 300 self.save_global(obj)
301 return
302
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_global(self, obj, name, pack)
351 d['__new__'] = obj.__new__
352
--> 353 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
354 else:
355 raise pickle.PicklingError("Can't pickle %r" % obj)
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
509 else:
510 save(func)
--> 511 save(args)
512 write(pickle.REDUCE)
513
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
566 write(MARK)
567 for element in obj:
--> 568 save(element)
569
570 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
298 issc = 0
299 if issc:
--> 300 self.save_global(obj)
301 return
302
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_global(self, obj, name, pack)
351 d['__new__'] = obj.__new__
352
--> 353 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
354 else:
355 raise pickle.PicklingError("Can't pickle %r" % obj)
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
509 else:
510 save(func)
--> 511 save(args)
512 write(pickle.REDUCE)
513
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
298 issc = 0
299 if issc:
--> 300 self.save_global(obj)
301 return
302
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_global(self, obj, name, pack)
351 d['__new__'] = obj.__new__
352
--> 353 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
354 else:
355 raise pickle.PicklingError("Can't pickle %r" % obj)
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
509 else:
510 save(func)
--> 511 save(args)
512 write(pickle.REDUCE)
513
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in _batch_setitems(self, items)
685 for k, v in tmp:
686 save(k)
--> 687 save(v)
688 write(SETITEMS)
689 elif n:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
329
330 # Save the reduce() output and finally memoize the object
--> 331 self.save_reduce(obj=obj, *rv)
332
333 def persistent_id(self, obj):
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
527
528 if state is not None:
--> 529 save(state)
530 write(pickle.BUILD)
531
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in _batch_setitems(self, items)
685 for k, v in tmp:
686 save(k)
--> 687 save(v)
688 write(SETITEMS)
689 elif n:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_function(self, obj, name)
197 klass = getattr(themodule, name, None)
198 if klass is None or klass is not obj:
--> 199 self.save_function_tuple(obj)
200 return
201
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_function_tuple(self, func)
240 # save the rest of the func data needed by _fill_function
241 save(f_globals)
--> 242 save(defaults)
243 save(dct)
244 write(pickle.TUPLE)
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
329
330 # Save the reduce() output and finally memoize the object
--> 331 self.save_reduce(obj=obj, *rv)
332
333 def persistent_id(self, obj):
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
494 "args[0] from __newobj__ args has the wrong class")
495 args = args[1:]
--> 496 save(cls)
497
498 #Don't pickle transient entries
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_global(self, obj, name, pack)
351 d['__new__'] = obj.__new__
352
--> 353 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
354 else:
355 raise pickle.PicklingError("Can't pickle %r" % obj)
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
509 else:
510 save(func)
--> 511 save(args)
512 write(pickle.REDUCE)
513
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in _batch_setitems(self, items)
685 for k, v in tmp:
686 save(k)
--> 687 save(v)
688 write(SETITEMS)
689 elif n:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
304 reduce = getattr(obj, "__reduce_ex__", None)
305 if reduce:
--> 306 rv = reduce(self.proto)
307 else:
308 reduce = getattr(obj, "__reduce__", None)
TypeError: can't pickle wrapper_descriptor objects
This is on git master:
>>> import cloudpickle
>>> def f():
...: s = {1,2}
...: def g():
...: return len(s)
...: return g
...:
>>> g = f()
>>> g
<function __main__.f.<locals>.g>
>>> cloudpickle.dumps(g)
Traceback (most recent call last):
File "<ipython-input-5-3faa44bc74aa>", line 1, in <module>
cloudpickle.dumps(g)
File "/home/antoine/cloudpickle/cloudpickle/cloudpickle.py", line 734, in dumps
cp.dump(obj)
File "/home/antoine/cloudpickle/cloudpickle/cloudpickle.py", line 146, in dump
return Pickler.dump(self, obj)
File "/home/antoine/miniconda3/envs/dask36/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/home/antoine/miniconda3/envs/dask36/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/home/antoine/cloudpickle/cloudpickle/cloudpickle.py", line 267, in save_function
self.save_function_tuple(obj)
File "/home/antoine/cloudpickle/cloudpickle/cloudpickle.py", line 336, in save_function_tuple
self._save_subimports(code, set(f_globals.values()) | set(closure))
TypeError: unhashable type: 'set'
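The failure is easy to reproduce in isolation: `_save_subimports` builds a set from the function's globals and closure values, and a set (like the `s = {1, 2}` captured in the closure above) is itself unhashable:

```python
# Minimal reproduction of the underlying TypeError: building a set
# whose elements include another set requires hashability.
closure_values = [len, {1, 2}]  # stand-in for a closure containing a set
try:
    set(closure_values)
    failed = False
except TypeError as e:
    failed = True
    print(e)  # unhashable type: 'set'
```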
I'm getting the following error within the dask/distributed test suite on newer versions of cloudpickle. Sorry for the lack of a clean reproducible test case. This only appears to happen in odd situations. Hopefully the error message is somewhat informative.
> if cell_count >= 0 else
None
)
E TypeError: '>=' not supported between instances of 'list' and 'int'
cc @llllllllll
Hi,
I had serialized a pipeline using cloudpickle. When I try to load it inside a Docker container, I get the error below:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/pickle.py", line 1378, in load
return Unpickler(file).load()
File "/usr/lib/python2.7/pickle.py", line 858, in load
dispatch[key](self)
File "/usr/lib/python2.7/pickle.py", line 1246, in load_build
for k, v in slotstate.items():
AttributeError: 'mtrand.RandomState' object has no attribute 'items'
Loading seems to work on the same environment/host, but it fails inside a Docker container running Python 2.7.
I am using Python 2.7 with cloudpickle==0.2.2.
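For context, the BUILD opcode that fails here accepts state either as a plain dict or as a (dict_state, slot_state) pair; the `slotstate.items()` call in the traceback suggests the container's environment received something other than the expected slot-state mapping, most likely because library versions (here numpy's `mtrand`) differ between host and container. A hedged sketch of the two-part state protocol, with a made-up class `P`:

```python
import pickle

class P(object):
    __slots__ = ("x",)

    def __reduce__(self):
        # State given as a (dict_state, slot_state) pair; load_build
        # iterates slot_state.items() and setattr()s each entry -- the
        # exact call that fails above when the environments disagree.
        return (P, (), (None, {"x": 1}))

obj = pickle.loads(pickle.dumps(P()))
print(obj.x)  # 1
```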
In [1]: import cloudpickle
In [2]: cloudpickle.loads(cloudpickle.loads(str.format))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-610f6140b8f8> in <module>()
----> 1 cloudpickle.loads(cloudpickle.loads(str.format))
TypeError: 'method_descriptor' does not support the buffer interface
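Note that `loads` expects a bytes-like buffer, so passing the raw `str.format` object straight to `loads` (rather than the output of `dumps`) produces exactly this buffer-interface error. Demonstrated with the stdlib pickler:

```python
import pickle

# Round-tripping requires dumps first; loads only accepts bytes-like input.
data = pickle.dumps([1, 2, 3])
restored = pickle.loads(data)
print(restored)  # [1, 2, 3]

try:
    pickle.loads(str.format)  # not a bytes-like object
    raised = False
except TypeError:
    raised = True
```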
The following gives me an error. It's not a case I actually want to support, but it did take a while to figure out that this was making the serialization fail.
def test_empty_nonlocal(self):
if False:
bar = 100
def foo():
return 1 + bar or 0
data = cloudpickle.dumps(foo)
Traceback (most recent call last):
File "/home/jlewis/workspace/cloudpickle/tests/cloudpickle_test.py", line 371, in test_empty_nonlocal
data = cloudpickle.dumps(foo)
File "/home/jlewis/workspace/cloudpickle/cloudpickle/cloudpickle.py", line 706, in dumps
cp.dump(obj)
File "/home/jlewis/workspace/cloudpickle/cloudpickle/cloudpickle.py", line 146, in dump
return Pickler.dump(self, obj)
File "/Users/jlewis/anaconda/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/Users/jlewis/anaconda/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/home/jlewis/workspace/cloudpickle/cloudpickle/cloudpickle.py", line 270, in save_function
self.save_function_tuple(obj)
File "/home/jlewis/workspace/cloudpickle/cloudpickle/cloudpickle.py", line 305, in save_function_tuple
code, f_globals, defaults, closure, dct, base_globals = self.extract_func_data(func)
File "/home/jlewis/workspace/cloudpickle/cloudpickle/cloudpickle.py", line 374, in extract_func_data
closure = [c.cell_contents for c in func.closure] if func.closure else []
ValueError: Cell is empty
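The "Cell is empty" condition can be reproduced without cloudpickle at all. A minimal sketch (the make_foo wrapper is my own, added only to force a closure cell):

```python
def make_foo():
    if False:
        bar = 100  # never executed, so the closure cell is never filled

    def foo():
        return 1 + bar or 0

    return foo

foo = make_foo()
cell = foo.__closure__[0]
try:
    # reading an empty cell raises ValueError, which is exactly what
    # cloudpickle's `c.cell_contents` access trips over above
    cell.cell_contents
except ValueError as e:
    print(e)
```

The compiler creates the cell because foo references bar, but since the assignment is dead code the cell stays unset at dump time.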
The current LICENSE
does not mention the copyright of the original authors of the module. This should be fixed.
I understand that you do not include an unpickler:
It does not include an unpickler, as standard python unpickling suffices.
However, it would be convenient to inject the pickle load() and loads() functions into your namespace, so that cloudpickle can be used as a drop-in replacement. You already have import pickle; surely it's almost as simple as:
load = pickle.load
loads = pickle.loads
If I am honest, I find it a little inconvenient to have to write:
try:
    import cPickle as pickle
except ImportError:
    import pickle
import cloudpickle
when I need to read and write pickled files/objects etc., and have to remember to use cloudpickle.dump[s] and pickle.load[s].
Just having those functions in your namespace, redirecting the work to pickle itself, would make code cleaner, and cloudpickle could then be used as a drop-in replacement:
import cloudpickle as pickle
This is common practice with an alternative pickler, dill:
import dill as pickle
dill is great, but I have found that cloudpickle can pickle some things I need that dill cannot.
I wouldn't mind doing it myself if need be; however, rather than doing the work (which may not be as simple as stated above), creating a pull request, and having it rejected because you do not want this feature, I thought I would ask first.
Is this something that you would be interested in doing yourselves, or accepting a pull request for? Or am I barking up the wrong tree?
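A minimal sketch of the aliasing suggested above (shown here with plain pickle only; cloudpickle would supply its own extended dump/dumps):

```python
import pickle

# hypothetical module-level shim: if cloudpickle re-exported these,
# then `import cloudpickle as pickle` would work as a drop-in
load = pickle.load
loads = pickle.loads
dump = pickle.dump    # in cloudpickle these two would instead be
dumps = pickle.dumps  # its own extended implementations

# the combined interface round-trips ordinary objects unchanged
data = dumps({"a": 1})
assert loads(data) == {"a": 1}
```

Since standard unpickling suffices for cloudpickle output, the load/loads aliases are pure delegation and carry no extra maintenance burden.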
cloudpickle currently tries to support Python 2.6 and 3.3. Does anyone still need support for those older Python versions, or can we drop them?
The current implementation treats memoryview objects as strings. This fails in Python 3 because the Pickler.save_string method does not exist (see Pickler.save_bytes instead), and also because memoryviews are significantly more complex than their Python 2 buffer cousins.
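Until memoryview gets first-class support, a common workaround (a sketch of user-side handling, not cloudpickle's actual behavior) is to materialize the view into bytes before pickling:

```python
import pickle

view = memoryview(b"hello world")

# the standard Python 3 pickler rejects memoryview objects outright
try:
    pickle.dumps(view)
except TypeError as e:
    print(e)

# workaround: copy the buffer into an immutable bytes object first
data = pickle.dumps(view.tobytes())
assert pickle.loads(data) == b"hello world"
```

This loses the view's shape/strides metadata, which is part of why memoryviews are harder to support faithfully than flat buffers.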
Seems like this was started in depth upstream within pyspark, but we should bring it over.
As I mentioned in #80, a change introduced after 0.2.2 breaks a previously working serialized function call that relied on importing pandas. My suspicion is that serialization of functions that use pandas may be broken in recent versions of cloudpickle.
Full backtrace from Dask/prep.py:
Traceback (most recent call last):
File "prep.py", line 64, in <module>
dask.compute(values)
File "/Users/aron/anaconda3/envs/parallel/lib/python3.5/site-packages/dask/base.py", line 204, in compute
results = get(dsk, keys, **kwargs)
File "/Users/aron/anaconda3/envs/parallel/lib/python3.5/site-packages/dask/multiprocessing.py", line 177, in get
raise_exception=reraise, **kwargs)
File "/Users/aron/anaconda3/envs/parallel/lib/python3.5/site-packages/dask/local.py", line 521, in get_async
raise_exception(exc, tb)
File "/Users/aron/anaconda3/envs/parallel/lib/python3.5/site-packages/dask/compatibility.py", line 59, in reraise
raise exc.with_traceback(tb)
File "/Users/aron/anaconda3/envs/parallel/lib/python3.5/site-packages/dask/local.py", line 289, in execute_task
task, data = loads(task_info)
File "/Users/aron/anaconda3/envs/parallel/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 840, in subimport
__import__(name)
ImportError: No module named '_pandasujson'
Using 0.3.1 the following code
import cloudpickle

class Base:
    def __init__(self, field1):
        self.field1 = field1

class Child(Base):
    def __init__(self, field2, field1):
        super().__init__(field1)
        self.field2 = field2

def test_function():
    _ = Child('field-2-value', 'field-1-value')

_ = cloudpickle.dumps(test_function)
results in stacktrace:
Traceback (most recent call last):
File "<removed path>/experiment_cloudpickle.py", line 18, in <module>
_ = cloudpickle.dumps(test_function)
File "<removed path>\Continuum\Miniconda3\lib\site-packages\cloudpickle\cloudpickle.py", line 829, in dumps
cp.dump(obj)
File "<removed path>\Continuum\Miniconda3\lib\site-packages\cloudpickle\cloudpickle.py", line 233, in dump
return Pickler.dump(self, obj)
File <removed path>\Continuum\Miniconda3\lib\pickle.py", line 408, in dump
self.save(obj)
File "<removed path>\Continuum\Miniconda3\lib\pickle.py", line 475, in save
f(self, obj) # Call unbound method with explicit self
File "<removed path>\Continuum\Miniconda3\lib\site-packages\cloudpickle\cloudpickle.py", line 354, in save_function
self.save_function_tuple(obj)
File "<removed path>\Continuum\Miniconda3\lib\site-packages\cloudpickle\cloudpickle.py", line 436, in save_function_tuple
save(f_globals)
File "<removed path>\Continuum\Miniconda3\lib\pickle.py", line 475, in save
f(self, obj) # Call unbound method with explicit self
File "<removed path>\Continuum\Miniconda3\lib\pickle.py", line 814, in save_dict
self._batch_setitems(obj.items())
File "<removed path>\Continuum\Miniconda3\lib\pickle.py", line 845, in _batch_setitems
save(v)
File "<removed path>\Continuum\Miniconda3\lib\pickle.py", line 475, in save
f(self, obj) # Call unbound method with explicit self
File "<removed path>\Continuum\Miniconda3\lib\site-packages\cloudpickle\cloudpickle.py", line 548, in save_global
self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
File "<removed path>\Continuum\Miniconda3\lib\site-packages\cloudpickle\cloudpickle.py", line 713, in save_reduce
self.memoize(obj)
File "<removed path>\Continuum\Miniconda3\lib\pickle.py", line 429, in memoize
assert id(obj) not in self.memo
AssertionError
Trying to serialize a larger object graph, I stumbled upon this problem; here is a (nonsensical, albeit error-showing) minimal working example to reproduce it. Note that Python 3.5's built-in pickle works fine:
class SomeClass(object):
    def test(self):
        return SomeClass()
import pickle
import cloudpickle
test = SomeClass()
print(pickle.loads(pickle.dumps(test, pickle.HIGHEST_PROTOCOL)))
print(cloudpickle.loads(cloudpickle.dumps(test)))
Results in:
<__main__.SomeClass object at 0x7f85ab74e320>
Traceback (most recent call last):
File "test.py", line 13, in <module>
print(cloudpickle.loads(cloudpickle.dumps(test)))
File "/home/sachs/.local/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 629, in dumps
cp.dump(obj)
File "/home/sachs/.local/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 107, in dump
return Pickler.dump(self, obj)
File "/usr/lib/python3.5/pickle.py", line 408, in dump
self.save(obj)
File "/usr/lib/python3.5/pickle.py", line 520, in save
self.save_reduce(obj=obj, *rv)
File "/home/sachs/.local/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 514, in save_reduce
save(cls)
File "/usr/lib/python3.5/pickle.py", line 475, in save
f(self, obj) # Call unbound method with explicit self
File "/home/sachs/.local/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 368, in save_global
self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
File "/home/sachs/.local/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 533, in save_reduce
self.memoize(obj)
File "/usr/lib/python3.5/pickle.py", line 429, in memoize
assert id(obj) not in self.memo
AssertionError
Tested with cloudpickle 0.2.1 as well as the github version. The problem might be "a different way" to trigger the same underlying problem as in #40 #53 #65.
Using cloudpickle as cloned from the repo just now, Python 2.7.11:
Python 2.7.11 (default, Mar 4 2016, 11:10:11)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cloudpickle
>>> import itertools
>>> cloudpickle.dumps(itertools.chain.from_iterable)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "cloudpickle/cloudpickle.py", line 629, in dumps
cp.dump(obj)
File "cloudpickle/cloudpickle.py", line 107, in dump
return Pickler.dump(self, obj)
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "cloudpickle/cloudpickle.py", line 330, in save_builtin_function
return self.save_function(obj)
File "cloudpickle/cloudpickle.py", line 203, in save_function
or getattr(obj.__code__, 'co_filename', None) == '<stdin>'
AttributeError: 'builtin_function_or_method' object has no attribute '__code__'
I get the same result on Python 3.5.1.
I would love to see a release of cloudpickle next week. What is the procedure for this?
This package was based on the version that was in pyspark, with some bug fixes contributed by folks as well as more comprehensive tests. I'm opening this issue to track the upstream work in PySpark.
I would expect pickle and cloudpickle to behave pretty much identically here. Sadly cloudpickle serializes much more slowly.
In [1]: import numpy as np
In [2]: data = np.random.randint(0, 255, dtype='u1', size=100000000)
In [3]: import cloudpickle, pickle
In [4]: %time len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
CPU times: user 50.9 ms, sys: 135 ms, total: 186 ms
Wall time: 185 ms
Out[4]: 100000161
In [5]: %time len(cloudpickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
CPU times: user 125 ms, sys: 280 ms, total: 404 ms
Wall time: 405 ms
Out[5]: 100000161
Following up on this issue on Stackoverflow:
In a nutshell, with Python 3.5:
Server A imports cloudpickle; this causes types.ClassType to become defined.
>>> import types
>>> dir(types)
['BuiltinFunctionType',
'BuiltinMethodType',
'ClassType',
'CodeType',
...
]
Server B does not import cloudpickle, so types.ClassType is left undefined.
>>> import types
>>> dir(types)
['BuiltinFunctionType',
'BuiltinMethodType',
'CodeType',
...
]
Objects which are serialized on server A also seem to serialize a reference to ClassType. Then, when they are deserialized on server B, we encounter the following error:
Traceback (most recent call last):
File "/home/streamsadmin/git/streamsx.topology/test/python/topology/deleteme2.py", line 40, in <module>
a = dill.loads(base64.b64decode(a.encode()))
File "/home/streamsadmin/anaconda3/lib/python3.5/site-packages/dill/dill.py", line 277, in loads
return load(file)
File "/home/streamsadmin/anaconda3/lib/python3.5/site-packages/dill/dill.py", line 266, in load
obj = pik.load()
File "/home/streamsadmin/anaconda3/lib/python3.5/site-packages/dill/dill.py", line 524, in _load_type
return _reverse_typemap[name]
KeyError: 'ClassType'
I've found a workaround, which you can see on Stackoverflow.
Here's my question: types.ClassType was removed in Python 3, yet cloudpickle re-adds it. Is this strictly necessary? It seems to be having side effects.
sympy.UndefinedFunction creates a dynamic class via a metaclass. See https://github.com/sympy/sympy/blob/20872c3b27726825869876b2dbe38e2fcd3bef2a/sympy/core/function.py#L775
In Python 3, running cloudpickle from a file, I get
from sympy import Function, symbols
import cloudpickle
f = Function('f')
x = symbols('x')
d = cloudpickle.dumps(f(x))
print(cloudpickle.loads(d))
print(cloudpickle.loads(d) is f(x))
Traceback (most recent call last):
File "test.py", line 10, in <module>
d = cloudpickle.dumps(f(x))
_pickle.PicklingError: Can't pickle test: attribute lookup test on __main__ failed
If I replace cloudpickle with pickle, it works.
In Python 2, I get a longer traceback:
Traceback (most recent call last):
File "test.py", line 8, in <module>
d = cloudpickle.dumps(f(x))
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 629, in dumps
cp.dump(obj)
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 107, in dump
return Pickler.dump(self, obj)
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 528, in save_reduce
save(func)
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/pickle.py", line 300, in save
self.save_global(obj)
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 368, in save_global
self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 533, in save_reduce
self.memoize(obj)
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/pickle.py", line 244, in memoize
assert id(obj) not in self.memo
AssertionError
A first quick test with dask shows the bytecode analysis in cloudpickle isn't 3.6-compatible:
@staticmethod
def extract_code_globals(co):
    """
    Find all globals names read or written to by codeblock co
    """
    code = getattr(co, 'co_code', None)
    if code is None:
        return set()
    if not PY3:
        code = [ord(c) for c in code]
    names = co.co_names
    out_names = set()
    n = len(code)
    i = 0
    extended_arg = 0
    while i < n:
        op = code[i]
        i += 1
        if op >= HAVE_ARGUMENT:
            oparg = code[i] + code[i+1] * 256 + extended_arg
            extended_arg = 0
            i += 2
            if op == EXTENDED_ARG:
                extended_arg = oparg*65536
            if op in GLOBAL_OPS:
>               out_names.add(names[oparg])
E               IndexError: tuple index out of range
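Python 3.6 switched to fixed two-byte wordcode, so manually stepping past HAVE_ARGUMENT misreads operands there. A version-independent sketch using the stdlib dis module (my own rewrite for illustration, not cloudpickle's actual fix):

```python
import dis

# opcodes that read or write module-level globals
GLOBAL_OPS = {"STORE_GLOBAL", "DELETE_GLOBAL", "LOAD_GLOBAL"}

def extract_code_globals(co):
    """Find all global names read or written to by code object co.
    dis.get_instructions decodes EXTENDED_ARG and wordcode for us,
    so no manual offset arithmetic is needed."""
    return {
        instr.argval
        for instr in dis.get_instructions(co)
        if instr.opname in GLOBAL_OPS
    }

def sample():
    return len(some_global)  # noqa: F821 - deliberately undefined

assert extract_code_globals(sample.__code__) == {"len", "some_global"}
```

Delegating decoding to dis also insulates the analysis from later bytecode format changes, at the cost of requiring CPython-compatible code objects.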