cloudpipe / cloudpickle
Extended pickling support for Python objects
License: Other
When setting up an inner class like the one below, cloudpickle
fails where dill
would succeed. Based on the rather lengthy traceback, this appears to be due to falling back to Python's standard pickling mechanism.
import cloudpickle

class A(object):
    class B(object):
        def __init__(self):
            self.c = 0

    def __init__(self):
        self.b = A.B()

a = A()
cloudpickle.dumps(a)
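Not part of the report, but a common workaround (assuming you control the class definitions) is to move the nested class to module scope, so that even the standard pickler can resolve it by qualified name. A minimal sketch:

```python
import pickle

# Hypothetical restructuring: B lives at module scope instead of inside A,
# so the standard pickler can find it by name.
class B(object):
    def __init__(self):
        self.c = 0

class A(object):
    def __init__(self):
        self.b = B()

a = pickle.loads(pickle.dumps(A()))
assert a.b.c == 0
```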
We'll also need to modify our Travis setup to bring in numpy, scipy, etc. as part of the builds. Example from patsy.
Hi,
I ran into the following strange issue: I use cloudpickle to serialize my objects to send them to Kafka, and today I saw the following bug: calling cloudpickle.dumps on an object returns a malformed result that I can't load(), but calling it again on the very same object works!
Including the output from the debug console below; the object in question is a pretty big one using internal libraries.
Any idea what might be causing that?
import cloudpickle
a = cloudpickle.dumps(self)
b = cloudpickle.dumps(self)
cloudpickle.loads(b)
Out[5]:
<aiostreams.runner.SendTo at 0x15383bbba20>
cloudpickle.loads(a)
Traceback (most recent call last):
File "C:\Users\Egor\Anaconda2\envs\py3k\lib\site-packages\IPython\core\interactiveshell.py", line 2881, in run_code
exec(code_obj, self.user_global_ns, self.user_ns)
File "<ipython-input-6-c33e741d5197>", line 1, in <module>
cloudpickle.loads(a)
EOFError: Ran out of input
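For what it's worth, "EOFError: Ran out of input" is the characteristic symptom of a truncated pickle stream (one missing its STOP opcode), which suggests the first dumps call produced incomplete bytes, e.g. via a shared buffer that was not reset. A quick illustration with the standard pickler:

```python
import pickle

data = pickle.dumps([1, 2, 3], protocol=2)  # protocol 2: no framing
assert pickle.loads(data) == [1, 2, 3]      # the full stream round-trips

# Dropping the final STOP opcode reproduces the reported error:
try:
    pickle.loads(data[:-1])
    raised = None
except EOFError as exc:  # "Ran out of input"
    raised = exc
assert isinstance(raised, EOFError)
```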
I have the following problem with method descriptor objects in Python 2.7:
Python 2.7.10 |Continuum Analytics, Inc.| (default, Oct 19 2015, 18:04:42)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import cloudpickle, pickle
>>> cloudpickle.dumps(set.union)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pmd/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 602, in dumps
cp.dump(obj)
File "/home/pmd/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 111, in dump
raise pickle.PicklingError(msg)
pickle.PicklingError: Could not pickle object as excessively deep recursion required.
>>> cloudpickle.dumps(str.decode)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pmd/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 602, in dumps
cp.dump(obj)
File "/home/pmd/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 111, in dump
raise pickle.PicklingError(msg)
pickle.PicklingError: Could not pickle object as excessively deep recursion required.
>>> pickle.dumps(set.union)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/pmd/anaconda3/envs/python2/lib/python2.7/pickle.py", line 1374, in dumps
Pickler(file, protocol).dump(obj)
File "/home/pmd/anaconda3/envs/python2/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/home/pmd/anaconda3/envs/python2/lib/python2.7/pickle.py", line 306, in save
rv = reduce(self.proto)
File "/home/pmd/anaconda3/envs/python2/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle method_descriptor objects
>>> def f(x):
... return set.union(x)
...
>>> cloudpickle.dumps(f)
'\x80\x02ccloudpickle.cloudpickle\n_fill_function\nq\x00(ccloudpickle.cloudpickle\n_make_skel_func\nq\x01ccloudpickle.cloudpickle\n_builtin_type\nq\x02U\x08CodeTypeq\x03\x85q\x04Rq\x05(K\x01K\x01K\x02KCU\rt\x00\x00j\x01\x00|\x00\x00\x83\x01\x00Sq\x06N\x85q\x07U\x03setq\x08U\x05unionq\t\x86q\nU\x01xq\x0b\x85q\x0cU\x07<stdin>q\rU\x01fq\x0eK\x01U\x02\x00\x01q\x0f))tq\x10Rq\x11]q\x12}q\x13\x87q\x14Rq\x15}q\x16N}q\x17tR.'
>>> def g(x):
... return str.decode(x)
...
>>> cloudpickle.dumps(g)
'\x80\x02ccloudpickle.cloudpickle\n_fill_function\nq\x00(ccloudpickle.cloudpickle\n_make_skel_func\nq\x01ccloudpickle.cloudpickle\n_builtin_type\nq\x02U\x08CodeTypeq\x03\x85q\x04Rq\x05(K\x01K\x01K\x02KCU\rt\x00\x00j\x01\x00|\x00\x00\x83\x01\x00Sq\x06N\x85q\x07U\x03strq\x08U\x06decodeq\t\x86q\nU\x01xq\x0b\x85q\x0cU\x07<stdin>q\rU\x01gq\x0eK\x01U\x02\x00\x01q\x0f))tq\x10Rq\x11]q\x12}q\x13\x87q\x14Rq\x15}q\x16N}q\x17tR.'
>>>
Is this a known issue, and is there anything I can do about it so I don't have to wrap the method descriptors with another function? Thanks
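Not an official fix, but until method descriptors pickle cleanly, wrapping them in a plain function (as the successful dumps of f and g above already show) is the usual workaround. A sketch with the standard pickler; the wrapper name is my own:

```python
import pickle

# Hypothetical wrapper: a plain module-level function that calls the
# descriptor pickles fine, while set.union itself does not.
def union(*sets):
    return set.union(*sets)

restored = pickle.loads(pickle.dumps(union))
assert restored({1, 2}, {3}) == {1, 2, 3}
```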
The Spark folks went to the trouble of getting cloudpickle licensed as BSD (from LGPL), and have been improving on it directly within pyspark. Let's bring it over and maintain it as an overall library.
Register cloudpickle on PyPI.
According to the Python documentation for the imp
package, "Deprecated since version 3.4: The imp package is pending deprecation in favor of importlib."
cloudpickle.py uses imp.new_module() (https://github.com/cloudpipe/cloudpickle/blob/master/cloudpickle/cloudpickle.py#L925) and imp.find_module() (https://github.com/cloudpipe/cloudpickle/blob/master/cloudpickle/cloudpickle.py#L1075), each of which has an equivalent function in importlib.
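A sketch of the drop-in importlib replacements (names per the importlib documentation; exactly how they slot into cloudpickle.py is left open):

```python
import importlib.util
import types

# imp.new_module(name)  ->  types.ModuleType(name)
mod = types.ModuleType("dynamically_created")
assert mod.__name__ == "dynamically_created"

# imp.find_module(name)  ->  importlib.util.find_spec(name)
spec = importlib.util.find_spec("json")
assert spec is not None  # an importable module has a spec

# find_spec returns None (rather than raising) for a missing top-level module
assert importlib.util.find_spec("no_such_module_xyz") is None
```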
The following code does not work, and may be a potential bug:
>>> def f():
...     def g(): return g
...     return g
...
>>> import cloudpickle; cloudpickle.dumps(f())
Traceback (most recent call last):
File "<pyshell#2>", line 1, in <module>
import cloudpickle; cloudpickle.dumps(f())
File "cloudpickle.py", line 629, in dumps
cp.dump(obj)
File "cloudpickle.py", line 111, in dump
raise pickle.PicklingError(msg)
PicklingError: Could not pickle object as excessively deep recursion required.
Is it possible to fix this, or is this a fundamental limitation of closures in Python that cannot be worked around?
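Part of the difficulty is that the inner function's closure cell refers back to the function itself, so any serializer that recurses through closure contents hits a cycle unless it memoizes the half-built function first:

```python
def f():
    def g(): return g
    return g

g = f()
# The closure of g contains exactly one cell, and it points at g itself:
assert g.__closure__[0].cell_contents is g
```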
Current behaviour:
>>> import cloudpickle; cloudpickle.__version__
'0.2.2'
>>> from scipy.sparse import dok_matrix
>>> A = dok_matrix((2,2)); A
<2x2 sparse matrix of type '<type 'numpy.float64'>'
with 0 stored elements in Dictionary Of Keys format>
>>> cloudpickle.loads(cloudpickle.dumps(A))
<2x2 sparse matrix of type '<type 'numpy.float64'>'
with 0 stored elements in Dictionary Of Keys format>
>>> A[0,0] = 1
>>> cloudpickle.loads(cloudpickle.dumps(A))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/pickle.py", line 1388, in loads
return Unpickler(file).load()
File "/usr/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/usr/lib/python2.7/pickle.py", line 1204, in load_setitem
dict[key] = value
File "/usr/lib/python2.7/dist-packages/scipy/sparse/dok.py", line 235, in __setitem__
if (isintlike(i) and isintlike(j) and 0 <= i < self.shape[0]
File "/usr/lib/python2.7/dist-packages/scipy/sparse/base.py", line 525, in __getattr__
raise AttributeError(attr + " not found")
AttributeError: shape not found
Expected behaviour: an object of type dok_matrix
properly (de)serializes regardless of content. Note that pickle
and cPickle
both work:
>>> import pickle
>>> pickle.loads(pickle.dumps(A))
<2x2 sparse matrix of type '<type 'numpy.float64'>'
with 1 stored elements in Dictionary Of Keys format>
>>> import cPickle
>>> cPickle.loads(cPickle.dumps(A))
<2x2 sparse matrix of type '<type 'numpy.float64'>'
with 1 stored elements in Dictionary Of Keys format>
Built-in pickle
doesn't work on Logger objects, but cloudpickle could try to be a bit smarter. Upstream issue at https://bugs.python.org/issue30520
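Until that lands, one way to make loggers picklable from application code is a copyreg reducer that serializes a Logger as its name and re-fetches it on load. This is a sketch of my own, not cloudpickle behavior (and newer Python versions do something equivalent natively, per the upstream issue):

```python
import copyreg
import logging
import pickle

def _reduce_logger(logger):
    # Serialize only the name; logging.getLogger returns the same
    # singleton for a given name on the other side.
    return logging.getLogger, (logger.name,)

copyreg.pickle(logging.Logger, _reduce_logger)

log = logging.getLogger("my.app")
assert pickle.loads(pickle.dumps(log)) is log
```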
I have a very simple classmethod example that fails.
import cloudpickle
import pickle

class A(object):
    @classmethod
    def test(cls):
        pass

a = A()
res = cloudpickle.dumps(a)
new_obj = pickle.loads(res)
new_obj.__class__.test()
This is on Python 3.5. It seems cloudpickle tries to support memoryviews (the traceback shows a dedicated save_memoryview method), but fails:
>>> import cloudpickle
>>> m = memoryview(b"abc")
>>> cloudpickle.dumps(m)
Traceback (most recent call last):
File "<ipython-input-3-c69575090534>", line 1, in <module>
cloudpickle.dumps(m)
File "/home/antoine/miniconda3/envs/dask35/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 706, in dumps
cp.dump(obj)
File "/home/antoine/miniconda3/envs/dask35/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 146, in dump
return Pickler.dump(self, obj)
File "/home/antoine/miniconda3/envs/dask35/lib/python3.5/pickle.py", line 408, in dump
self.save(obj)
File "/home/antoine/miniconda3/envs/dask35/lib/python3.5/pickle.py", line 475, in save
f(self, obj) # Call unbound method with explicit self
File "/home/antoine/miniconda3/envs/dask35/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 154, in save_memoryview
Pickler.save_string(self, str(obj))
AttributeError: type object '_Pickler' has no attribute 'save_string'
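Until save_memoryview is fixed, the usual workaround is to materialize the view to bytes before pickling, since a memoryview is only a window onto another object's buffer:

```python
import pickle

m = memoryview(b"abc")
# tobytes() copies the viewed data into an independent bytes object,
# which pickles without any special support.
restored = pickle.loads(pickle.dumps(m.tobytes()))
assert restored == b"abc"
```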
The __transient__ attribute does not seem to be a standard Python dunder. From what I have seen, it is used to exclude some attributes from __dict__ before serialization. Why is it used instead of __getstate__? In save_inst() you actually try __getstate__ first if it exists, and only fall back to __transient__ if it does not. However, in save_reduce() you always look directly for this attribute (if the protocol version is >= 2). Is this necessary? Couldn't __getstate__ be tried first there as well?
Cross reference: irmen/Pyro4#179
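For context, the standard mechanism the question refers to looks like this (a generic sketch, not cloudpickle code): __getstate__ lets a class drop transient state itself, with no need for a special attribute list:

```python
import pickle

class Connection:
    def __init__(self, host):
        self.host = host
        self._socket = object()  # stand-in for unpicklable runtime state

    def __getstate__(self):
        # Drop transient state before pickling.
        state = self.__dict__.copy()
        del state["_socket"]
        return state

    def __setstate__(self, state):
        self.__dict__.update(state)
        self._socket = None  # re-create lazily after unpickling

conn = pickle.loads(pickle.dumps(Connection("localhost")))
assert conn.host == "localhost" and conn._socket is None
```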
Encountering issues trying to pickle lock objects. Not sure if this is something that should be permissible or not. Seems cloudpickle
just falls back to pickle
in this case. Traceback shown below.
>>> import threading
>>> import cloudpickle
>>> l = threading.Lock()
>>> cloudpickle.pickle.dumps(l)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/zopt/conda2/envs/pickle_test/lib/python2.7/pickle.py", line 1380, in dumps
Pickler(file, protocol).dump(obj)
File "/zopt/conda2/envs/pickle_test/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/zopt/conda2/envs/pickle_test/lib/python2.7/pickle.py", line 306, in save
rv = reduce(self.proto)
File "/zopt/conda2/envs/pickle_test/lib/python2.7/copy_reg.py", line 70, in _reduce_ex
raise TypeError, "can't pickle %s objects" % base.__name__
TypeError: can't pickle lock objects
name: pickle_test
channels: !!python/tuple
- !!python/unicode
'conda-forge'
- !!python/unicode
'defaults'
dependencies:
- conda-forge::ca-certificates=2017.1.23=0
- conda-forge::cloudpickle=0.2.2=py27_2
- conda-forge::dill=0.2.6=py27_0
- conda-forge::ncurses=5.9=10
- conda-forge::openssl=1.0.2h=3
- conda-forge::python=2.7.12=2
- conda-forge::readline=6.2=0
- conda-forge::sqlite=3.13.0=1
- conda-forge::tk=8.5.19=1
- conda-forge::zlib=1.2.11=0
prefix: /zopt/conda2/envs/pickle_test
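If pickling locks should be permitted at all, one pragmatic reading is that a lock's state is meaningless in another process, so it could deserialize as a fresh lock. A copyreg-based sketch (my own workaround, not cloudpickle or pickle behavior):

```python
import copyreg
import pickle
import threading

LockType = type(threading.Lock())  # _thread.lock is not directly importable

def _make_fresh_lock():
    return threading.Lock()

# Serialize any lock as "create a new, unlocked lock on load".
copyreg.pickle(LockType, lambda lock: (_make_fresh_lock, ()))

restored = pickle.loads(pickle.dumps(threading.Lock()))
assert isinstance(restored, LockType)
```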
This problem is when a function refers (by attribute) to a sub-module of a package. Cloudpickle appears to pickle functions not by name, but by code plus (a subset of) globals. So the parent package is injected into the pickle, but the sub-module is not.
def func():
    # import unittest.mock
    x = unittest.TestCase
    x = unittest.mock.Mock

import unittest.mock
import cloudpickle as pickle
s = pickle.dumps(func)

del unittest
import sys
del sys.modules['unittest']
del sys.modules['unittest.mock']

f = pickle.loads(s)
# import unittest.mock as anything
f()
AttributeError: module 'unittest' has no attribute 'mock'
This leads to non-intuitive bugs in applications such as cluster computing (e.g. with dask.distributed
).
Workarounds:
- bind the sub-module to a global name with import ... as ..., so the function's globals hold the sub-module itself;
- import the sub-module from the package's __init__.py, so importing the parent also initialises it.
I assume cloudpickle checks whether a global is an imported module, and if so then stores the name (rather than pickling its attributes). Is it practical to also check (via sys.modules.keys()) which sub-modules had previously been imported, and ensure every such module is subsequently initialised?
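The import ... as ... workaround can be seen with plain imports, independent of any pickler: a bare import of a sub-module only binds the top-level package in the importing namespace, while import ... as ... binds the sub-module itself, giving the function's globals something that can be stored by name:

```python
import sys
import unittest.mock  # binds only the name "unittest" here

# The sub-module is loaded, but lives in sys.modules rather than as a
# separate global; attribute access goes through the parent package.
assert "unittest.mock" in sys.modules
assert unittest.mock is sys.modules["unittest.mock"]

import unittest.mock as umock  # binds the sub-module directly

# Now "umock" is a global that refers to the sub-module itself, so a
# serializer can record it by its own module name.
assert umock is sys.modules["unittest.mock"]
```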
Appears that I cannot pickle an Ellipsis
object, but I can pickle slice
s. It would be nice to have support for pickling Ellipsis
. FWIW, this is solved by dill
.
>>> import cloudpickle
>>> cloudpickle.dumps(Ellipsis)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-cc84d30bf9cd> in <module>()
----> 1 cloudpickle.dumps(Ellipsis)
/zopt/conda/envs/nanshenv/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in dumps(obj, protocol)
600
601 cp = CloudPickler(file,protocol)
--> 602 cp.dump(obj)
603
604 return file.getvalue()
/zopt/conda/envs/nanshenv/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in dump(self, obj)
105 self.inject_addons()
106 try:
--> 107 return Pickler.dump(self, obj)
108 except RuntimeError as e:
109 if 'recursion' in e.args[0]:
/zopt/conda/envs/nanshenv/lib/python2.7/pickle.pyc in dump(self, obj)
222 if self.proto >= 2:
223 self.write(PROTO + chr(self.proto))
--> 224 self.save(obj)
225 self.write(STOP)
226
/zopt/conda/envs/nanshenv/lib/python2.7/pickle.pyc in save(self, obj)
304 reduce = getattr(obj, "__reduce_ex__", None)
305 if reduce:
--> 306 rv = reduce(self.proto)
307 else:
308 reduce = getattr(obj, "__reduce__", None)
TypeError: can't pickle ellipsis objects
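For what it's worth, the limitation appears specific to Python 2: the Python 3 pickler has a dedicated handler for the Ellipsis singleton, so it round-trips natively there:

```python
import pickle

# Python 3 pickles Ellipsis by name, at any protocol, just like slices.
assert pickle.loads(pickle.dumps(Ellipsis)) is Ellipsis
assert pickle.loads(pickle.dumps(slice(1, 10, 2))) == slice(1, 10, 2)
```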
Hello!
The current code from the master branch fails to work with classes that have the abc.ABCMeta
metaclass.
MWE:
import abc
import cloudpickle

class Q(object):
    __metaclass__ = abc.ABCMeta

q = Q()
cloudpickle.loads(cloudpickle.dumps(q))
With python 2.7.3 this yields:
object.__new__(getset_descriptor) is not safe, use getset_descriptor.__new__()
With python 2.7.13:
TypeError: can't pickle wrapper_descriptor objects
This fails:
import enum
import cloudpickle

class MyEnum(enum.Enum):
    SPAM = 'SPAM'

cloudpickle.dumps(MyEnum.SPAM)
---------------------------------------------------------------------------
RecursionError Traceback (most recent call last)
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in dump(self, obj)
145 try:
--> 146 return Pickler.dump(self, obj)
147 except RuntimeError as e:
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in dump(self, obj)
408 self.framer.start_framing()
--> 409 self.save(obj)
410 self.write(STOP)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
520 # Save the reduce() output and finally memoize the object
--> 521 self.save_reduce(obj=obj, *rv)
522
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
584 else:
--> 585 save(func)
586 save(args)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
489 if issc:
--> 490 self.save_global(obj)
491 return
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_global(self, obj, name, pack)
424
--> 425 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
426 else:
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
585 save(func)
--> 586 save(args)
587 write(pickle.REDUCE)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save_tuple(self, obj)
735 for element in obj:
--> 736 save(element)
737 # Subtle. Same as in the big comment below.
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save_dict(self, obj)
820 self.memoize(obj)
--> 821 self._batch_setitems(obj.items())
822
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in _batch_setitems(self, items)
846 save(k)
--> 847 save(v)
848 write(SETITEMS)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
520 # Save the reduce() output and finally memoize the object
--> 521 self.save_reduce(obj=obj, *rv)
522
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
600 if dictitems is not None:
--> 601 self._batch_setitems(dictitems)
602
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in _batch_setitems(self, items)
851 save(k)
--> 852 save(v)
853 write(SETITEM)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
520 # Save the reduce() output and finally memoize the object
--> 521 self.save_reduce(obj=obj, *rv)
522
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
584 else:
--> 585 save(func)
586 save(args)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
489 if issc:
--> 490 self.save_global(obj)
491 return
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_global(self, obj, name, pack)
424
--> 425 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
426 else:
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
585 save(func)
--> 586 save(args)
587 write(pickle.REDUCE)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save_tuple(self, obj)
735 for element in obj:
--> 736 save(element)
737 # Subtle. Same as in the big comment below.
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save_dict(self, obj)
820 self.memoize(obj)
--> 821 self._batch_setitems(obj.items())
822
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in _batch_setitems(self, items)
846 save(k)
--> 847 save(v)
848 write(SETITEMS)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save_dict(self, obj)
820 self.memoize(obj)
--> 821 self._batch_setitems(obj.items())
822
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in _batch_setitems(self, items)
851 save(k)
--> 852 save(v)
853 write(SETITEM)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
520 # Save the reduce() output and finally memoize the object
--> 521 self.save_reduce(obj=obj, *rv)
522
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
584 else:
--> 585 save(func)
586 save(args)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
489 if issc:
--> 490 self.save_global(obj)
491 return
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_global(self, obj, name, pack)
424
--> 425 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
426 else:
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in save_reduce(self, func, args, state, listitems, dictitems, obj)
585 save(func)
--> 586 save(args)
587 write(pickle.REDUCE)
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save_tuple(self, obj)
735 for element in obj:
--> 736 save(element)
737 # Subtle. Same as in the big comment below.
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
475 if f is not None:
--> 476 f(self, obj) # Call unbound method with explicit self
477 return
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save_dict(self, obj)
820 self.memoize(obj)
--> 821 self._batch_setitems(obj.items())
822
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in _batch_setitems(self, items)
846 save(k)
--> 847 save(v)
848 write(SETITEMS)
... last 10 frames repeated, from the frame below ...
/home/mrocklin/Software/anaconda/lib/python3.6/pickle.py in save(self, obj, save_persistent_id)
520 # Save the reduce() output and finally memoize the object
--> 521 self.save_reduce(obj=obj, *rv)
522
RecursionError: maximum recursion depth exceeded
During handling of the above exception, another exception occurred:
PicklingError Traceback (most recent call last)
<ipython-input-3-d64a2267ce31> in <module>()
----> 1 cloudpickle.dumps(MyEnum.SPAM)
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in dumps(obj, protocol)
704
705 cp = CloudPickler(file,protocol)
--> 706 cp.dump(obj)
707
708 return file.getvalue()
/home/mrocklin/Software/anaconda/lib/python3.6/site-packages/cloudpickle/cloudpickle.py in dump(self, obj)
148 if 'recursion' in e.args[0]:
149 msg = """Could not pickle object as excessively deep recursion required."""
--> 150 raise pickle.PicklingError(msg)
151
152 def save_memoryview(self, obj):
PicklingError: Could not pickle object as excessively deep recursion required.
Originally reported here: dask/distributed#1178 by @AndrewPashkin
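For comparison, enum members whose class is importable pickle fine by reference with the standard pickler; the recursion appears only when cloudpickle tries to serialize the Enum class itself by value:

```python
import enum
import pickle

class MyEnum(enum.Enum):
    SPAM = 'SPAM'

# A member reduces to (its class, (value,)), so as long as the class can
# be found by name, the round-trip works.
restored = pickle.loads(pickle.dumps(MyEnum.SPAM))
assert restored is MyEnum.SPAM
```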
Can anyone explain why the following code isn't working?
from cloudpickle import pickle
namespace = {}
exec('def f(x): return x', namespace)
pickle.dumps(namespace['f'])
If this is the expected behavior, I would be very happy with a solution that uses exec('def f(x): return x', namespace) and yields a serializable function f. I would prefer not to use globals().
cloudpickle/cloudpickle/cloudpickle.py
Line 12 in 62027de
Report from the failing test ran with the PyPy3 environment configured by tox (not available on travis-ci):
__________________________________________________________________________________ CloudPickleTest.test_method_descriptors __________________________________________________________________________________
self = <tests.cloudpickle_test.CloudPickleTest testMethod=test_method_descriptors>
def test_method_descriptors(self):
> f = pickle_depickle(str.upper)
tests/cloudpickle_test.py:241:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
tests/cloudpickle_test.py:37: in pickle_depickle
return pickle.loads(cloudpickle.dumps(obj))
cloudpickle/cloudpickle.py:605: in dumps
cp.dump(obj)
cloudpickle/cloudpickle.py:107: in dump
return Pickler.dump(self, obj)
../../opt/pypy3/lib-python/3/pickle.py:237: in dump
self.save(obj)
../../opt/pypy3/lib-python/3/pickle.py:299: in save
f(self, obj) # Call unbound method with explicit self
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <cloudpickle.cloudpickle.CloudPickler object at 0x0000000004568560>, obj = <function upper at 0x0000000001a32fc0>, name = 'upper'
    def save_function(self, obj, name=None):
        """ Registered with the dispatch to handle all function types.
        Determines what kind of function obj is (e.g. lambda, defined at
        interactive prompt, etc) and handles the pickling appropriately.
        """
        write = self.write
        if name is None:
            name = obj.__name__
        modname = pickle.whichmodule(obj, name)
        # print('which gives %s %s %s' % (modname, obj, name))
        try:
            themodule = sys.modules[modname]
        except KeyError:
            # eval'd items such as namedtuple give invalid items for their function __module__
            modname = '__main__'
        if modname == '__main__':
            themodule = None
        if themodule:
            self.modules.add(themodule)
            if getattr(themodule, name, None) is obj:
                return self.save_global(obj, name)
        # if func is lambda, def'ed at prompt, is in main, or is nested, then
        # we'll pickle the actual function object rather than simply saving a
        # reference (as is done in default pickler), via save_function_tuple.
>       if islambda(obj) or obj.__code__.co_filename == '<stdin>' or themodule is None:
E       AttributeError: 'builtin-code' object has no attribute 'co_filename'
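A defensive fix (a sketch of the idea, not the actual patch) would be to guard the attribute access, since PyPy's builtin functions carry a 'builtin-code' object without co_filename:

```python
def defined_interactively(func):
    # getattr with a default tolerates both a missing __code__ (builtins on
    # CPython) and code objects lacking co_filename (builtins on PyPy).
    code = getattr(func, "__code__", None)
    return getattr(code, "co_filename", None) == "<stdin>"

def regular_function():
    pass

assert defined_interactively(str.upper) is False
assert defined_interactively(regular_function) is False
```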
Hello!
The current upstream code can't handle namedtuples.
Here's an MWE:
import cloudpickle
from collections import namedtuple
X = namedtuple('X', ['a'])
cloudpickle.loads(cloudpickle.dumps(X))
Traceback:
Traceback (most recent call last):
File "t.py", line 6, in <module>
cloudpickle.loads(cloudpickle.dumps(X))
File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1388, in loads
return Unpickler(file).load()
File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 864, in load
dispatch[key](self)
File "/usr/local/Cellar/python/2.7.13/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 1139, in load_reduce
value = func(*args)
File "/Users/amatanhead/Documents/cloudpickle/cloudpickle/cloudpickle.py", line 1043, in _rehydrate_skeleton_class
setattr(skeleton_class, attrname, attr)
AttributeError: attribute '__dict__' of 'type' objects is not writable
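For contrast, namedtuple classes created at module scope round-trip with the standard pickler, which stores them by name; the failure above is specific to cloudpickle's rebuild-the-class-by-value path:

```python
import pickle
from collections import namedtuple

X = namedtuple('X', ['a'])  # module-level, so pickle can find it by name

assert pickle.loads(pickle.dumps(X)) is X
assert pickle.loads(pickle.dumps(X(a=1))).a == 1
```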
I'm using dask and noticed that cloudpickle performs very slowly when pickling bigger lists or sets. Why is it so much slower, and is there a way to avoid this?
In [1]: import cloudpickle
In [2]: import pickle
In [3]: data = set(range(100000))
In [4]: %%time
...: silent = pickle.dumps(data)
...:
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 3.34 ms
In [5]: %%time
...: silent = cloudpickle.dumps(data)
...:
CPU times: user 200 ms, sys: 0 ns, total: 200 ms
Wall time: 197 ms
In [6]: %%time
...: silent = pickle.dumps(list(data))
...:
CPU times: user 4 ms, sys: 0 ns, total: 4 ms
Wall time: 3.64 ms
In [7]: %%time
...: silent = cloudpickle.dumps(list(data))
...:
CPU times: user 192 ms, sys: 0 ns, total: 192 ms
Wall time: 194 ms
Here is an analysis from a colleague:
The speed-up for us seems to be coming from the fact that pickling modules takes a long time:
In [25]: %timeit cloudpickle.dumps(numpy, -1)
100 loops, best of 3: 3.03 ms per loop
It looks like _find_module()
will use imp.find_module()
which traverses sys.path
to look for things that look like numpy. In our environment, sys.path tends to be long and our filesystems tend to be slow, hence the 3.03 ms.
def save_module(self, obj):
    """
    Save a module as an import
    """
    mod_name = obj.__name__
    # If module is successfully found then it is not a dynamically created module
    try:
        _find_module(mod_name)  # EXPENSIVE!!!!!
        is_dynamic = False
    except ImportError:
        is_dynamic = True

    self.modules.add(obj)
    if is_dynamic:
        self.save_reduce(dynamic_subimport, (obj.__name__, vars(obj)), obj=obj)
    else:
        self.save_reduce(subimport, (obj.__name__,), obj=obj)

dispatch[types.ModuleType] = save_module
So it looks like cloudpickle is trying to allow for "dynamically created modules". If it didn't try to be this flexible, then the entire function should just be
self.save_reduce(subimport, (obj.__name__,), obj=obj)
So the danger is if people are using "dynamically created modules", which we don't tend to do.
Maybe an easy way out is to check if obj.__file__
exists (the attribute, not the file). If it does, then immediately assume that is_dynamic=False.
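The proposed check could look like this (a sketch of the idea, not cloudpickle's code; the builtin-module caveat in the comments is my own assumption about where a naive __file__ test would misclassify):

```python
import sys
import types

def looks_dynamic(mod):
    # A module created at runtime has no __file__; anything loaded from
    # disk does. This avoids scanning sys.path the way imp.find_module does.
    # Caveat: built-in modules like sys also lack __file__, so exclude them.
    return (getattr(mod, "__file__", None) is None
            and mod.__name__ not in sys.builtin_module_names)

dynamic = types.ModuleType("made_up_module")
import json

assert looks_dynamic(dynamic) is True
assert looks_dynamic(json) is False
assert looks_dynamic(sys) is False  # builtin, but not dynamic
```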
Fwiw, I think we're pickling numpy
because we're pickling functions that refer to numpy
. Not positive though.
I use map to execute some code.
############## Testing of IPyParallel on DEAP ###################################
creator.create("FitnessMin", base.Fitness, weights=(-1.0,))
creator.create("Individual", gp.PrimitiveTree, fitness=creator.FitnessMin)

toolbox = base.Toolbox()

# Using parallel processing
import ipyparallel as ipp, time
rc = ipp.Client()
# pool = rc.load_balanced_view()
rc[:].use_cloudpickle()
pool = rc[:]
toolbox.register("map", pool.map)

toolbox.register("expr", gp.genHalfAndHalf, pset=pset, min_=1, max_=2)
toolbox.register("individual", tools.initIterate, creator.Individual, toolbox.expr)
toolbox.register("population", tools.initRepeat, list, toolbox.individual)
toolbox.register("compile", gp.compile, pset=pset)

def evalSymbReg(individual, points):
    func = toolbox.compile(expr=individual)  # Transform the tree expression into a callable function
    # against the real function: x**4 + x**3 + x**2 + x
    sqerrors = ((func(x) - x**4 - x**3 - x**2 - x)**2 for x in points)
    return math.fsum(sqerrors) / len(points),

toolbox.register("evaluate", evalSymbReg, points=[x/10. for x in range(-10, 10)])
toolbox.register("select", tools.selTournament, tournsize=3)
toolbox.register("mate", gp.cxOnePoint)
toolbox.register("expr_mut", gp.genFull, min_=0, max_=2)
toolbox.register("mutate", gp.mutUniform, expr=toolbox.expr_mut, pset=pset)
toolbox.decorate("mate", gp.staticLimit(key=operator.attrgetter("height"), max_value=17))
toolbox.decorate("mutate", gp.staticLimit(key=operator.attrgetter("height"), max_value=17))

def main():
    random.seed(318)
    pop = toolbox.population(n=300)
    hof = tools.HallOfFame(1)
    stats_fit = tools.Statistics(lambda ind: ind.fitness.values)
    stats_size = tools.Statistics(len)
    mstats = tools.MultiStatistics(fitness=stats_fit, size=stats_size)
    mstats.register("avg", np.mean)
    mstats.register("std", np.std)
    mstats.register("min", np.min)
    mstats.register("max", np.max)
    pop, log = algorithms.eaSimple(pop, toolbox, 0.5, 0.1, 40, stats=mstats,
                                   halloffame=hof, verbose=True)
    # print log
    return pop, log, hof
I got this error:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-978da9be5b87> in <module>()
1 if __name__ == "__main__":
----> 2 pop, log, hof= main()
<ipython-input-10-6ff69ab06682> in main()
15
16 pop, log = algorithms.eaSimple(pop, toolbox, 0.5, 0.1, 40, stats=mstats,
---> 17 halloffame=hof, verbose=True)
18 # print log
19 return pop, log, hof
D:\_devs\Python01\Anaconda27\lib\site-packages\deap\algorithms.pyc in eaSimple(population, toolbox, cxpb, mutpb, ngen, stats, halloffame, verbose)
145 # Evaluate the individuals with an invalid fitness
146 invalid_ind = [ind for ind in population if not ind.fitness.valid]
--> 147 fitnesses = toolbox.map(toolbox.evaluate, invalid_ind)
148 for ind, fit in zip(invalid_ind, fitnesses):
149 ind.fitness.values = fit
<decorator-gen-141> in map(self, f, *sequences, **kwargs)
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\view.pyc in sync_results(f, self, *args, **kwargs)
48 self._in_sync_results = True
49 try:
---> 50 ret = f(self, *args, **kwargs)
51 finally:
52 self._in_sync_results = False
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\view.pyc in map(self, f, *sequences, **kwargs)
613 assert len(sequences) > 0, "must have some sequences to map onto!"
614 pf = ParallelFunction(self, f, block=block, **kwargs)
--> 615 return pf.map(*sequences)
616
617 @sync_results
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\remotefunction.pyc in map(self, *sequences)
283 and mismatched sequence lengths will be padded with None.
284 """
--> 285 return self(*sequences, __ipp_mapping=True)
286
287 __all__ = ['remote', 'parallel', 'RemoteFunction', 'ParallelFunction']
<decorator-gen-131> in __call__(self, *sequences, **kwargs)
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\remotefunction.pyc in sync_view_results(f, self, *args, **kwargs)
74 view = self.view
75 if view._in_sync_results:
---> 76 return f(self, *args, **kwargs)
77 view._in_sync_results = True
78 try:
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\remotefunction.pyc in __call__(self, *sequences, **kwargs)
257 view = self.view if balanced else client[t]
258 with view.temp_flags(block=False, **self.flags):
--> 259 ar = view.apply(f, *args)
260 ar.owner = False
261
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\view.pyc in apply(self, f, *args, **kwargs)
209 ``f(*args, **kwargs)``.
210 """
--> 211 return self._really_apply(f, args, kwargs)
212
213 def apply_async(self, f, *args, **kwargs):
<decorator-gen-140> in _really_apply(self, f, args, kwargs, targets, block, track)
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\view.pyc in sync_results(f, self, *args, **kwargs)
48 self._in_sync_results = True
49 try:
---> 50 ret = f(self, *args, **kwargs)
51 finally:
52 self._in_sync_results = False
<decorator-gen-139> in _really_apply(self, f, args, kwargs, targets, block, track)
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\view.pyc in save_ids(f, self, *args, **kwargs)
33 n_previous = len(self.client.history)
34 try:
---> 35 ret = f(self, *args, **kwargs)
36 finally:
37 nmsgs = len(self.client.history) - n_previous
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\view.pyc in _really_apply(self, f, args, kwargs, targets, block, track)
555 for ident in _idents:
556 future = self.client.send_apply_request(self._socket, f, args, kwargs, track=track,
--> 557 ident=ident)
558 futures.append(future)
559 if track:
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\client\client.pyc in send_apply_request(self, socket, f, args, kwargs, metadata, track, ident)
1387 bufs = serialize.pack_apply_message(f, args, kwargs,
1388 buffer_threshold=self.session.buffer_threshold,
-> 1389 item_threshold=self.session.item_threshold,
1390 )
1391
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\serialize\serialize.pyc in pack_apply_message(f, args, kwargs, buffer_threshold, item_threshold)
164
165 arg_bufs = list(chain.from_iterable(
--> 166 serialize_object(arg, buffer_threshold, item_threshold) for arg in args))
167
168 kw_keys = sorted(kwargs.keys())
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\serialize\serialize.pyc in <genexpr>((arg,))
164
165 arg_bufs = list(chain.from_iterable(
--> 166 serialize_object(arg, buffer_threshold, item_threshold) for arg in args))
167
168 kw_keys = sorted(kwargs.keys())
D:\_devs\Python01\Anaconda27\lib\site-packages\ipyparallel\serialize\serialize.pyc in serialize_object(obj, buffer_threshold, item_threshold)
110 buffers.extend(_extract_buffers(cobj, buffer_threshold))
111
--> 112 buffers.insert(0, pickle.dumps(cobj, PICKLE_PROTOCOL))
113 return buffers
114
D:\_devs\Python01\Anaconda27\lib\site-packages\cloudpickle\cloudpickle.pyc in dumps(obj, protocol)
627
628 cp = CloudPickler(file,protocol)
--> 629 cp.dump(obj)
630
631 return file.getvalue()
D:\_devs\Python01\Anaconda27\lib\site-packages\cloudpickle\cloudpickle.pyc in dump(self, obj)
105 self.inject_addons()
106 try:
--> 107 return Pickler.dump(self, obj)
108 except RuntimeError as e:
109 if 'recursion' in e.args[0]:
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in dump(self, obj)
222 if self.proto >= 2:
223 self.write(PROTO + chr(self.proto))
--> 224 self.save(obj)
225 self.write(STOP)
226
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
329
330 # Save the reduce() output and finally memoize the object
--> 331 self.save_reduce(obj=obj, *rv)
332
333 def persistent_id(self, obj):
D:\_devs\Python01\Anaconda27\lib\site-packages\cloudpickle\cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
527 else:
528 save(func)
--> 529 save(args)
530 write(pickle.REDUCE)
531
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
D:\_devs\Python01\Anaconda27\lib\site-packages\cloudpickle\cloudpickle.pyc in save_function(self, obj, name)
203 or getattr(obj.__code__, 'co_filename', None) == '<stdin>'
204 or themodule is None):
--> 205 self.save_function_tuple(obj)
206 return
207 else:
D:\_devs\Python01\Anaconda27\lib\site-packages\cloudpickle\cloudpickle.pyc in save_function_tuple(self, func)
251
252 # save the rest of the func data needed by _fill_function
--> 253 save(f_globals)
254 save(defaults)
255 save(dct)
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in _batch_setitems(self, items)
685 for k, v in tmp:
686 save(k)
--> 687 save(v)
688 write(SETITEMS)
689 elif n:
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
329
330 # Save the reduce() output and finally memoize the object
--> 331 self.save_reduce(obj=obj, *rv)
332
333 def persistent_id(self, obj):
D:\_devs\Python01\Anaconda27\lib\site-packages\cloudpickle\cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
545
546 if state is not None:
--> 547 save(state)
548 write(pickle.BUILD)
549
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in _batch_setitems(self, items)
685 for k, v in tmp:
686 save(k)
--> 687 save(v)
688 write(SETITEMS)
689 elif n:
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
329
330 # Save the reduce() output and finally memoize the object
--> 331 self.save_reduce(obj=obj, *rv)
332
333 def persistent_id(self, obj):
D:\_devs\Python01\Anaconda27\lib\site-packages\cloudpickle\cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
545
546 if state is not None:
--> 547 save(state)
548 write(pickle.BUILD)
549
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save_tuple(self, obj)
566 write(MARK)
567 for element in obj:
--> 568 save(element)
569
570 if id(obj) in memo:
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in _batch_setitems(self, items)
685 for k, v in tmp:
686 save(k)
--> 687 save(v)
688 write(SETITEMS)
689 elif n:
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
329
330 # Save the reduce() output and finally memoize the object
--> 331 self.save_reduce(obj=obj, *rv)
332
333 def persistent_id(self, obj):
D:\_devs\Python01\Anaconda27\lib\site-packages\cloudpickle\cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
545
546 if state is not None:
--> 547 save(state)
548 write(pickle.BUILD)
549
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in _batch_setitems(self, items)
685 for k, v in tmp:
686 save(k)
--> 687 save(v)
688 write(SETITEMS)
689 elif n:
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in _batch_setitems(self, items)
684 write(MARK)
685 for k, v in tmp:
--> 686 save(k)
687 save(v)
688 write(SETITEMS)
D:\_devs\Python01\Anaconda27\lib\pickle.pyc in save(self, obj)
304 reduce = getattr(obj, "__reduce_ex__", None)
305 if reduce:
--> 306 rv = reduce(self.proto)
307 else:
308 reduce = getattr(obj, "__reduce__", None)
TypeError: can't pickle member_descriptor objects
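The error above comes from trying to pickle a member_descriptor, the attribute type that `__slots__` creates on a class. A minimal, hedged reproduction using only the standard pickler (the class name `Slotted` is made up for illustration; Python 2.7's pickler could not serialize these descriptors, while newer interpreters may pickle them by reference):

```python
import pickle

class Slotted(object):
    __slots__ = ("x",)

desc = Slotted.x  # the member_descriptor created by __slots__
print(type(desc).__name__)  # member_descriptor

try:
    pickle.dumps(desc)
    print("this interpreter can pickle the descriptor by reference")
except TypeError as e:
    # Python 2.7's pickler (and cloudpickle at the time) cannot.
    print(e)
```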
I often need to serialize many small objects containing many Python functions.
In [1]: def inc(x):
return x + 1
...:
In [2]: d = {i: (inc, i) for i in range(10000)}
Sometimes I do this all at once; this works great.
In [3]: from cloudpickle import dumps, loads
In [4]: %time len(dumps(d))
CPU times: user 118 ms, sys: 0 ns, total: 118 ms
Wall time: 117 ms
But sometimes I do this in several small batches, which is much slower.
In [5]: %time len([dumps(item) for item in d.items()])
CPU times: user 2.7 s, sys: 3.93 ms, total: 2.7 s
Wall time: 2.71 s
A quick profile shows that the majority of time is spent in save_function
In [7]: %prun -s cumtime len([dumps(item) for item in d.items()])
ncalls tottime percall cumtime percall filename:lineno(function)
1 0.000 0.000 4.782 4.782 {built-in method exec}
1 0.001 0.001 4.782 4.782 <string>:1(<module>)
1 0.038 0.038 4.782 4.782 <string>:1(<listcomp>)
10000 0.025 0.000 4.744 0.000 cloudpickle.py:598(dumps)
10000 0.011 0.000 4.658 0.000 cloudpickle.py:104(dump)
10000 0.030 0.000 4.646 0.000 pickle.py:401(dump)
450000/10000 0.894 0.000 4.597 0.000 pickle.py:460(save)
120000/10000 0.287 0.000 4.568 0.000 pickle.py:716(save_tuple)
50000/10000 0.115 0.000 4.296 0.000 cloudpickle.py:162(save_function)
10000 0.053 0.000 4.254 0.000 cloudpickle.py:214(save_function_tuple)
10000 0.020 0.000 2.834 0.000 cloudpickle.py:142(save_codeobject)
40000/10000 0.120 0.000 2.814 0.000 cloudpickle.py:470(save_reduce)
50000/40000 0.117 0.000 1.285 0.000 cloudpickle.py:318(save_global)
20000 0.039 0.000 1.058 0.000 pickle.py:680(save_bytes)
290000 0.519 0.000 1.044 0.000 pickle.py:416(memoize)
40000 0.257 0.000 0.716 0.000 pickle.py:898(save_global)
70000 0.147 0.000 0.479 0.000 pickle.py:698(save_str)
800000 0.270 0.000 0.392 0.000 pickle.py:212(write)
And so I'm tempted to memoize save_function
between dumps calls, presumably with some sort of LRU mechanism keyed by object identity. This is unsafe if functions mutate in any way; I've never run into such a situation, but I'm unsure whether this kind of caching is done elsewhere.
On looking into cloudpickle more deeply, it appears that Pickler
has a caching mechanism within it. Does anyone have experience with these memo
objects? I would need to clear out non-function elements from the cache between calls.
I'm happy to do the work here if we are able to agree on a good solution.
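A hedged sketch of the identity-keyed cache described above, using the stdlib pickler as a stand-in for cloudpickle's function-saving machinery (the names `cached_dumps`, `_function_cache`, and `_serialize_function` are made up for illustration; a real implementation would also need LRU eviction and invalidation):

```python
import pickle

# Hypothetical identity-keyed cache of serialized function payloads.
# Unsafe if a function's globals or defaults mutate between dumps calls.
_function_cache = {}

def _serialize_function(func):
    # Stand-in for cloudpickle's save_function_tuple output.
    return pickle.dumps((func.__name__, func.__defaults__))

def cached_dumps(func):
    key = id(func)
    payload = _function_cache.get(key)
    if payload is None:
        payload = _function_cache[key] = _serialize_function(func)
    return payload

def inc(x):
    return x + 1

first = cached_dumps(inc)
second = cached_dumps(inc)
print(first is second)  # True: the second call returns the cached bytes
```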
Hey cloudpipe team!
I'm doing an exploratory analysis for the gensim library to see whether it could use cloudpickle (here's the discussion), and noticed that 'regular' cloudpickle is consistently ~8x slower than Python's pickle module for pretty much all the data structures I threw at it.
Is this the expected behavior, or am I doing something wrong in my tests? I'm using Python 2.7/3.4 on Windows, without C compilers (so not using the optimized versions, if there are any).
Would you have any ideas on whether we could selectively modify the module for certain tasks to improve performance on the most-used features?
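For a fair comparison, both picklers can be timed on the same payload with `timeit`; the sketch below falls back to the stdlib pickler when cloudpickle is not installed, so the numbers are illustrative only:

```python
import pickle
import timeit

try:
    import cloudpickle as contender  # may not be installed
except ImportError:
    contender = pickle  # fallback so the sketch still runs

# Illustrative payload: plain builtins, the case where cloudpickle's
# pure-Python dispatch overhead shows up most clearly.
payload = {"key_%d" % i: list(range(10)) for i in range(100)}

t_stdlib = timeit.timeit(lambda: pickle.dumps(payload), number=200)
t_contender = timeit.timeit(lambda: contender.dumps(payload), number=200)
print("stdlib: %.4fs, contender: %.4fs" % (t_stdlib, t_contender))
```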
Hi. I am trying to package cloudpickle for openSUSE, with unit tests to make sure everything is working properly. The unit tests work fine for Python 2.x (2.6 and 2.7), but fail for all versions of Python 3.x (3.3, 3.4 and 3.5). The packages are identical apart from being Python 2.x or Python 3.x. All stated dependencies are included, and as near as I can tell my test invocation shouldn't have any substantial differences from how your Travis tests are invoked. Here are the failures (this is for Python 3.4, but identical failures occur in 3.3 and 3.5):
======================================================================
ERROR: test_pickling_special_file_handles (tests.cloudpickle_file_test.CloudPickleFileTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/abuild/rpmbuild/BUILD/cloudpickle-0.1.1/tests/cloudpickle_file_test.py", line 102, in test_pickling_special_file_handles
self.assertEquals(out, pickle.loads(cloudpickle.dumps(out)))
File "/home/abuild/rpmbuild/BUILD/cloudpickle-0.1.1/cloudpickle/cloudpickle.py", line 602, in dumps
cp.dump(obj)
File "/home/abuild/rpmbuild/BUILD/cloudpickle-0.1.1/cloudpickle/cloudpickle.py", line 107, in dump
return Pickler.dump(self, obj)
File "/usr/lib64/python3.4/pickle.py", line 412, in dump
self.save(obj)
File "/usr/lib64/python3.4/pickle.py", line 479, in save
f(self, obj) # Call unbound method with explicit self
File "/home/abuild/rpmbuild/BUILD/cloudpickle-0.1.1/cloudpickle/cloudpickle.py", line 548, in save_file
raise pickle.PicklingError("Cannot pickle files that do not map to an actual file")
_pickle.PicklingError: Cannot pickle files that do not map to an actual file
======================================================================
ERROR: test_temp_file (tests.cloudpickle_file_test.CloudPickleFileTests)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/abuild/rpmbuild/BUILD/cloudpickle-0.1.1/tests/cloudpickle_file_test.py", line 96, in test_temp_file
newfile = pickle.loads(cloudpickle.dumps(f))
File "/home/abuild/rpmbuild/BUILD/cloudpickle-0.1.1/cloudpickle/cloudpickle.py", line 602, in dumps
cp.dump(obj)
File "/home/abuild/rpmbuild/BUILD/cloudpickle-0.1.1/cloudpickle/cloudpickle.py", line 107, in dump
return Pickler.dump(self, obj)
File "/usr/lib64/python3.4/pickle.py", line 412, in dump
self.save(obj)
File "/usr/lib64/python3.4/pickle.py", line 499, in save
rv = reduce(self.proto)
TypeError: cannot serialize '_io.BufferedRandom' object
Hey @ogrisel, @mrocklin, @pitrou!
We should make a release. I've made the last few, and I'd love to have someone else take the reins on shipping. Since I'm no longer using this package directly, I don't have much vested interest in getting this shipped (other than as a user of dask and pyspark).
@pitrou - what is your username on PyPI?
> python --version
Python 2.7.11 :: Anaconda 4.0.0 (x86_64)
Also
>>> cloudpickle.__version__
'0.1.1'
Hi, I'm trying to pickle some stuff in the typing
module. I'm curious if there are fundamental limitations here or if this is out of scope for cloudpickle. Thanks for your help!
from typing import List, Callable
from cloudpickle import loads, dumps
This works.
>>> List
typing.List<~T>
>>> loads(dumps(List))
typing.List<~T>
This seems to lose some information.
>>> Callable[[int, str], float]
typing.Callable[[int, str], float]
>>> loads(dumps(Callable[[int, str], float]))
typing.Callable
This doesn't work:
>>> dumps(List[int])
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-15-f02da844db1c> in <module>()
----> 1 dumps(List[int])
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in dumps(obj, protocol)
600
601 cp = CloudPickler(file,protocol)
--> 602 cp.dump(obj)
603
604 return file.getvalue()
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in dump(self, obj)
105 self.inject_addons()
106 try:
--> 107 return Pickler.dump(self, obj)
108 except RuntimeError as e:
109 if 'recursion' in e.args[0]:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in dump(self, obj)
222 if self.proto >= 2:
223 self.write(PROTO + chr(self.proto))
--> 224 self.save(obj)
225 self.write(STOP)
226
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
298 issc = 0
299 if issc:
--> 300 self.save_global(obj)
301 return
302
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_global(self, obj, name, pack)
351 d['__new__'] = obj.__new__
352
--> 353 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
354 else:
355 raise pickle.PicklingError("Can't pickle %r" % obj)
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
509 else:
510 save(func)
--> 511 save(args)
512 write(pickle.REDUCE)
513
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
298 issc = 0
299 if issc:
--> 300 self.save_global(obj)
301 return
302
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_global(self, obj, name, pack)
351 d['__new__'] = obj.__new__
352
--> 353 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
354 else:
355 raise pickle.PicklingError("Can't pickle %r" % obj)
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
509 else:
510 save(func)
--> 511 save(args)
512 write(pickle.REDUCE)
513
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
298 issc = 0
299 if issc:
--> 300 self.save_global(obj)
301 return
302
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_global(self, obj, name, pack)
351 d['__new__'] = obj.__new__
352
--> 353 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
354 else:
355 raise pickle.PicklingError("Can't pickle %r" % obj)
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
509 else:
510 save(func)
--> 511 save(args)
512 write(pickle.REDUCE)
513
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
566 write(MARK)
567 for element in obj:
--> 568 save(element)
569
570 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
298 issc = 0
299 if issc:
--> 300 self.save_global(obj)
301 return
302
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_global(self, obj, name, pack)
351 d['__new__'] = obj.__new__
352
--> 353 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
354 else:
355 raise pickle.PicklingError("Can't pickle %r" % obj)
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
509 else:
510 save(func)
--> 511 save(args)
512 write(pickle.REDUCE)
513
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
298 issc = 0
299 if issc:
--> 300 self.save_global(obj)
301 return
302
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_global(self, obj, name, pack)
351 d['__new__'] = obj.__new__
352
--> 353 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
354 else:
355 raise pickle.PicklingError("Can't pickle %r" % obj)
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
509 else:
510 save(func)
--> 511 save(args)
512 write(pickle.REDUCE)
513
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in _batch_setitems(self, items)
685 for k, v in tmp:
686 save(k)
--> 687 save(v)
688 write(SETITEMS)
689 elif n:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
329
330 # Save the reduce() output and finally memoize the object
--> 331 self.save_reduce(obj=obj, *rv)
332
333 def persistent_id(self, obj):
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
527
528 if state is not None:
--> 529 save(state)
530 write(pickle.BUILD)
531
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in _batch_setitems(self, items)
685 for k, v in tmp:
686 save(k)
--> 687 save(v)
688 write(SETITEMS)
689 elif n:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_function(self, obj, name)
197 klass = getattr(themodule, name, None)
198 if klass is None or klass is not obj:
--> 199 self.save_function_tuple(obj)
200 return
201
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_function_tuple(self, func)
240 # save the rest of the func data needed by _fill_function
241 save(f_globals)
--> 242 save(defaults)
243 save(dct)
244 write(pickle.TUPLE)
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
329
330 # Save the reduce() output and finally memoize the object
--> 331 self.save_reduce(obj=obj, *rv)
332
333 def persistent_id(self, obj):
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
494 "args[0] from __newobj__ args has the wrong class")
495 args = args[1:]
--> 496 save(cls)
497
498 #Don't pickle transient entries
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_global(self, obj, name, pack)
351 d['__new__'] = obj.__new__
352
--> 353 self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
354 else:
355 raise pickle.PicklingError("Can't pickle %r" % obj)
/Users/rkn/anaconda/lib/python2.7/site-packages/cloudpickle/cloudpickle.pyc in save_reduce(self, func, args, state, listitems, dictitems, obj)
509 else:
510 save(func)
--> 511 save(args)
512 write(pickle.REDUCE)
513
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_tuple(self, obj)
552 if n <= 3 and proto >= 2:
553 for element in obj:
--> 554 save(element)
555 # Subtle. Same as in the big comment below.
556 if id(obj) in memo:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
284 f = self.dispatch.get(t)
285 if f:
--> 286 f(self, obj) # Call unbound method with explicit self
287 return
288
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save_dict(self, obj)
653
654 self.memoize(obj)
--> 655 self._batch_setitems(obj.iteritems())
656
657 dispatch[DictionaryType] = save_dict
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in _batch_setitems(self, items)
685 for k, v in tmp:
686 save(k)
--> 687 save(v)
688 write(SETITEMS)
689 elif n:
/Users/rkn/anaconda/lib/python2.7/pickle.pyc in save(self, obj)
304 reduce = getattr(obj, "__reduce_ex__", None)
305 if reduce:
--> 306 rv = reduce(self.proto)
307 else:
308 reduce = getattr(obj, "__reduce__", None)
TypeError: can't pickle wrapper_descriptor objects
This is on git master:
>>> import cloudpickle
>>> def f():
...: s = {1,2}
...: def g():
...: return len(s)
...: return g
...:
>>> g = f()
>>> g
<function __main__.f.<locals>.g>
>>> cloudpickle.dumps(g)
Traceback (most recent call last):
File "<ipython-input-5-3faa44bc74aa>", line 1, in <module>
cloudpickle.dumps(g)
File "/home/antoine/cloudpickle/cloudpickle/cloudpickle.py", line 734, in dumps
cp.dump(obj)
File "/home/antoine/cloudpickle/cloudpickle/cloudpickle.py", line 146, in dump
return Pickler.dump(self, obj)
File "/home/antoine/miniconda3/envs/dask36/lib/python3.6/pickle.py", line 409, in dump
self.save(obj)
File "/home/antoine/miniconda3/envs/dask36/lib/python3.6/pickle.py", line 476, in save
f(self, obj) # Call unbound method with explicit self
File "/home/antoine/cloudpickle/cloudpickle/cloudpickle.py", line 267, in save_function
self.save_function_tuple(obj)
File "/home/antoine/cloudpickle/cloudpickle/cloudpickle.py", line 336, in save_function_tuple
self._save_subimports(code, set(f_globals.values()) | set(closure))
TypeError: unhashable type: 'set'
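The failure is easy to reproduce in isolation: `_save_subimports` builds a set from the function's globals and closure values, and a set (like the `s = {1, 2}` captured in the closure above) is itself unhashable:

```python
# Minimal reproduction of the underlying TypeError: building a set
# whose elements include another set requires hashability.
closure_values = [len, {1, 2}]  # stand-in for a closure containing a set
try:
    set(closure_values)
    failed = False
except TypeError as e:
    failed = True
    print(e)  # unhashable type: 'set'
```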
I'm getting the following error within the dask/distributed test suite on newer versions of cloudpickle. Sorry for the lack of a clean reproducible test case. This only appears to happen in odd situations. Hopefully the error message is somewhat informative.
> if cell_count >= 0 else
None
)
E TypeError: '>=' not supported between instances of 'list' and 'int'
cc @llllllllll
Hi,
I had serialized a pipeline using cloudpickle. When I try to load it inside a Docker container, I get the error below:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/pickle.py", line 1378, in load
return Unpickler(file).load()
File "/usr/lib/python2.7/pickle.py", line 858, in load
dispatch[key](self)
File "/usr/lib/python2.7/pickle.py", line 1246, in load_build
for k, v in slotstate.items():
AttributeError: 'mtrand.RandomState' object has no attribute 'items'
Loading seems to work on the same environment/host, but it fails inside a Docker container running Python 2.7.
I am using Python 2.7 with cloudpickle==0.2.2.
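For context, the BUILD opcode that fails here accepts state either as a plain dict or as a (dict_state, slot_state) pair; the `slotstate.items()` call in the traceback suggests the container's environment received something other than the expected slot-state mapping, most likely because library versions (here numpy's `mtrand`) differ between host and container. A hedged sketch of the two-part state protocol, with a made-up class `P`:

```python
import pickle

class P(object):
    __slots__ = ("x",)

    def __reduce__(self):
        # State given as a (dict_state, slot_state) pair; load_build
        # iterates slot_state.items() and setattr()s each entry -- the
        # exact call that fails above when the environments disagree.
        return (P, (), (None, {"x": 1}))

obj = pickle.loads(pickle.dumps(P()))
print(obj.x)  # 1
```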
In [1]: import cloudpickle
In [2]: cloudpickle.loads(cloudpickle.loads(str.format))
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-2-610f6140b8f8> in <module>()
----> 1 cloudpickle.loads(cloudpickle.loads(str.format))
TypeError: 'method_descriptor' does not support the buffer interface
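Note that `loads` expects a bytes-like buffer, so passing the raw `str.format` object straight to `loads` (rather than the output of `dumps`) produces exactly this buffer-interface error. Demonstrated with the stdlib pickler:

```python
import pickle

# Round-tripping requires dumps first; loads only accepts bytes-like input.
data = pickle.dumps([1, 2, 3])
restored = pickle.loads(data)
print(restored)  # [1, 2, 3]

try:
    pickle.loads(str.format)  # not a bytes-like object
    raised = False
except TypeError:
    raised = True
```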
The following gives me an error. It's not a case I actually want to support, but it did take a while to figure out that this was making the serialization fail.
def test_empty_nonlocal(self):
if False:
bar = 100
def foo():
return 1 + bar or 0
data = cloudpickle.dumps(foo)
Traceback (most recent call last):
File "/home/jlewis/workspace/cloudpickle/tests/cloudpickle_test.py", line 371, in test_empty_nonlocal
data = cloudpickle.dumps(foo)
File "/home/jlewis/workspace/cloudpickle/cloudpickle/cloudpickle.py", line 706, in dumps
cp.dump(obj)
File "/home/jlewis/workspace/cloudpickle/cloudpickle/cloudpickle.py", line 146, in dump
return Pickler.dump(self, obj)
File "/Users/jlewis/anaconda/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/Users/jlewis/anaconda/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "/home/jlewis/workspace/cloudpickle/cloudpickle/cloudpickle.py", line 270, in save_function
self.save_function_tuple(obj)
File "/home/jlewis/workspace/cloudpickle/cloudpickle/cloudpickle.py", line 305, in save_function_tuple
code, f_globals, defaults, closure, dct, base_globals = self.extract_func_data(func)
File "/home/jlewis/workspace/cloudpickle/cloudpickle/cloudpickle.py", line 374, in extract_func_data
closure = [c.cell_contents for c in func.closure] if func.closure else []
ValueError: Cell is empty
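The "Cell is empty" condition can be reproduced without cloudpickle at all. A minimal sketch (the make_foo wrapper is my own, added only to force a closure cell):

```python
def make_foo():
    if False:
        bar = 100  # never executed, so the closure cell is never filled

    def foo():
        return 1 + bar or 0

    return foo

foo = make_foo()
cell = foo.__closure__[0]
try:
    # reading an empty cell raises ValueError, which is exactly what
    # cloudpickle's `c.cell_contents` access trips over above
    cell.cell_contents
except ValueError as e:
    print(e)
```

The compiler creates the cell because foo references bar, but since the assignment is dead code the cell stays unset at dump time.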
The current LICENSE
does not mention the copyright of the original authors of the module. This should be fixed.
I understand that you do not include an unpickler:
It does not include an unpickler, as standard python unpickling suffices.
However, it would be convenient to inject the pickle load() and loads() functions into your namespace, so that cloudpickle can be used as a drop-in replacement. You already have import pickle; surely it's almost as simple as:
load = pickle.load
loads = pickle.loads
If I am honest, I find it a little inconvenient to have to write:
try:
    import cPickle as pickle
except ImportError:
    import pickle
import cloudpickle
when I need to read and write pickled files/objects etc., and have to remember to use cloudpickle.dump[s] and pickle.load[s].
Just having those functions in your namespace, redirecting the work to pickle itself, would make code cleaner, and cloudpickle could then be used as a drop-in replacement:
import cloudpickle as pickle
This is common practice with an alternative pickler, dill:
import dill as pickle
dill is great, but I have found that cloudpickle can pickle some things I need that dill cannot.
I wouldn't mind doing it myself if need be; however, rather than doing the work (which may not be as simple as stated above), creating a pull request, and having it rejected because you do not want this feature, I thought I would ask first.
Is this something that you would be interested in doing yourselves, or accepting a pull request for? Or am I barking up the wrong tree?
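A minimal sketch of the aliasing suggested above (shown here with plain pickle only; cloudpickle would supply its own extended dump/dumps):

```python
import pickle

# hypothetical module-level shim: if cloudpickle re-exported these,
# then `import cloudpickle as pickle` would work as a drop-in
load = pickle.load
loads = pickle.loads
dump = pickle.dump    # in cloudpickle these two would instead be
dumps = pickle.dumps  # its own extended implementations

# the combined interface round-trips ordinary objects unchanged
data = dumps({"a": 1})
assert loads(data) == {"a": 1}
```

Since standard unpickling suffices for cloudpickle output, the load/loads aliases are pure delegation and carry no extra maintenance burden.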
cloudpickle currently tries to support Python 2.6 and 3.3. Does anyone still need support for those older Python versions, or can we drop them?
The current implementation treats memoryview objects as strings. This fails in Python 3 because the Pickler.save_string method does not exist (see Pickler.save_bytes instead), and also because memoryviews are significantly more complex than their Python 2 buffer cousins.
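Until memoryview gets first-class support, a common workaround (a sketch of user-side handling, not cloudpickle's actual behavior) is to materialize the view into bytes before pickling:

```python
import pickle

view = memoryview(b"hello world")

# the standard Python 3 pickler rejects memoryview objects outright
try:
    pickle.dumps(view)
except TypeError as e:
    print(e)

# workaround: copy the buffer into an immutable bytes object first
data = pickle.dumps(view.tobytes())
assert pickle.loads(data) == b"hello world"
```

This loses the view's shape/strides metadata, which is part of why memoryviews are harder to support faithfully than flat buffers.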
Seems like this was started in depth upstream within pyspark, but we should bring it over.
As I mentioned in #80, a change introduced after 0.2.2 breaks a previously working serialized function call that relied on importing pandas. My suspicion is that serialization of functions that use pandas may be broken in recent versions of cloudpickle.
Full backtrace from Dask/prep.py:
Traceback (most recent call last):
File "prep.py", line 64, in <module>
dask.compute(values)
File "/Users/aron/anaconda3/envs/parallel/lib/python3.5/site-packages/dask/base.py", line 204, in compute
results = get(dsk, keys, **kwargs)
File "/Users/aron/anaconda3/envs/parallel/lib/python3.5/site-packages/dask/multiprocessing.py", line 177, in get
raise_exception=reraise, **kwargs)
File "/Users/aron/anaconda3/envs/parallel/lib/python3.5/site-packages/dask/local.py", line 521, in get_async
raise_exception(exc, tb)
File "/Users/aron/anaconda3/envs/parallel/lib/python3.5/site-packages/dask/compatibility.py", line 59, in reraise
raise exc.with_traceback(tb)
File "/Users/aron/anaconda3/envs/parallel/lib/python3.5/site-packages/dask/local.py", line 289, in execute_task
task, data = loads(task_info)
File "/Users/aron/anaconda3/envs/parallel/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 840, in subimport
__import__(name)
ImportError: No module named '_pandasujson'
Using 0.3.1 the following code
import cloudpickle

class Base:
    def __init__(self, field1):
        self.field1 = field1

class Child(Base):
    def __init__(self, field2, field1):
        super().__init__(field1)
        self.field2 = field2

def test_function():
    _ = Child('field-2-value', 'field-1-value')

_ = cloudpickle.dumps(test_function)
results in stacktrace:
Traceback (most recent call last):
File "<removed path>/experiment_cloudpickle.py", line 18, in <module>
_ = cloudpickle.dumps(test_function)
File "<removed path>\Continuum\Miniconda3\lib\site-packages\cloudpickle\cloudpickle.py", line 829, in dumps
cp.dump(obj)
File "<removed path>\Continuum\Miniconda3\lib\site-packages\cloudpickle\cloudpickle.py", line 233, in dump
return Pickler.dump(self, obj)
File <removed path>\Continuum\Miniconda3\lib\pickle.py", line 408, in dump
self.save(obj)
File "<removed path>\Continuum\Miniconda3\lib\pickle.py", line 475, in save
f(self, obj) # Call unbound method with explicit self
File "<removed path>\Continuum\Miniconda3\lib\site-packages\cloudpickle\cloudpickle.py", line 354, in save_function
self.save_function_tuple(obj)
File "<removed path>\Continuum\Miniconda3\lib\site-packages\cloudpickle\cloudpickle.py", line 436, in save_function_tuple
save(f_globals)
File "<removed path>\Continuum\Miniconda3\lib\pickle.py", line 475, in save
f(self, obj) # Call unbound method with explicit self
File "<removed path>\Continuum\Miniconda3\lib\pickle.py", line 814, in save_dict
self._batch_setitems(obj.items())
File "<removed path>\Continuum\Miniconda3\lib\pickle.py", line 845, in _batch_setitems
save(v)
File "<removed path>\Continuum\Miniconda3\lib\pickle.py", line 475, in save
f(self, obj) # Call unbound method with explicit self
File "<removed path>\Continuum\Miniconda3\lib\site-packages\cloudpickle\cloudpickle.py", line 548, in save_global
self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
File "<removed path>\Continuum\Miniconda3\lib\site-packages\cloudpickle\cloudpickle.py", line 713, in save_reduce
self.memoize(obj)
File "<removed path>\Continuum\Miniconda3\lib\pickle.py", line 429, in memoize
assert id(obj) not in self.memo
AssertionError
Trying to serialize a larger object graph, I stumbled upon this problem; here is a (nonsensical, albeit error-showing) minimal working example to reproduce it. Note that Python 3.5's built-in pickle works fine:
class SomeClass(object):
    def test(self):
        return SomeClass()
import pickle
import cloudpickle
test = SomeClass()
print(pickle.loads(pickle.dumps(test, pickle.HIGHEST_PROTOCOL)))
print(cloudpickle.loads(cloudpickle.dumps(test)))
Results in:
<__main__.SomeClass object at 0x7f85ab74e320>
Traceback (most recent call last):
File "test.py", line 13, in <module>
print(cloudpickle.loads(cloudpickle.dumps(test)))
File "/home/sachs/.local/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 629, in dumps
cp.dump(obj)
File "/home/sachs/.local/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 107, in dump
return Pickler.dump(self, obj)
File "/usr/lib/python3.5/pickle.py", line 408, in dump
self.save(obj)
File "/usr/lib/python3.5/pickle.py", line 520, in save
self.save_reduce(obj=obj, *rv)
File "/home/sachs/.local/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 514, in save_reduce
save(cls)
File "/usr/lib/python3.5/pickle.py", line 475, in save
f(self, obj) # Call unbound method with explicit self
File "/home/sachs/.local/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 368, in save_global
self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
File "/home/sachs/.local/lib/python3.5/site-packages/cloudpickle/cloudpickle.py", line 533, in save_reduce
self.memoize(obj)
File "/usr/lib/python3.5/pickle.py", line 429, in memoize
assert id(obj) not in self.memo
AssertionError
Tested with cloudpickle 0.2.1 as well as the github version. The problem might be "a different way" to trigger the same underlying problem as in #40 #53 #65.
Using cloudpickle as cloned from the repo just now, Python 2.7.11:
Python 2.7.11 (default, Mar 4 2016, 11:10:11)
[GCC 4.2.1 Compatible Apple LLVM 7.0.2 (clang-700.1.81)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import cloudpickle
>>> import itertools
>>> cloudpickle.dumps(itertools.chain.from_iterable)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "cloudpickle/cloudpickle.py", line 629, in dumps
cp.dump(obj)
File "cloudpickle/cloudpickle.py", line 107, in dump
return Pickler.dump(self, obj)
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/usr/local/Cellar/python/2.7.11/Frameworks/Python.framework/Versions/2.7/lib/python2.7/pickle.py", line 286, in save
f(self, obj) # Call unbound method with explicit self
File "cloudpickle/cloudpickle.py", line 330, in save_builtin_function
return self.save_function(obj)
File "cloudpickle/cloudpickle.py", line 203, in save_function
or getattr(obj.__code__, 'co_filename', None) == '<stdin>'
AttributeError: 'builtin_function_or_method' object has no attribute '__code__'
I get the same result on Python 3.5.1.
I would love to see a release of cloudpickle next week. What is the procedure for this?
This package was based on the version that was in pyspark, with some bug fixes contributed by folks as well as more comprehensive tests. I'm opening this issue to track the upstream work in PySpark.
I would expect pickle and cloudpickle to behave pretty much identically here. Sadly cloudpickle serializes much more slowly.
In [1]: import numpy as np
In [2]: data = np.random.randint(0, 255, dtype='u1', size=100000000)
In [3]: import cloudpickle, pickle
In [4]: %time len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
CPU times: user 50.9 ms, sys: 135 ms, total: 186 ms
Wall time: 185 ms
Out[4]: 100000161
In [5]: %time len(cloudpickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
CPU times: user 125 ms, sys: 280 ms, total: 404 ms
Wall time: 405 ms
Out[5]: 100000161
Following up on this issue on Stackoverflow:
In a nutshell, with Python 3.5:
Server A imports cloudpickle; this causes types.ClassType to become defined.
>>> import types
>>> dir(types)
['BuiltinFunctionType',
'BuiltinMethodType',
'ClassType',
'CodeType',
...
]
Server B does not import cloudpickle, so types.ClassType is left undefined.
>>> import types
>>> dir(types)
['BuiltinFunctionType',
'BuiltinMethodType',
'CodeType',
...
]
Objects which are serialized on server A also seem to serialize a reference to ClassType. Then, when they are deserialized on server B, we encounter the following error:
Traceback (most recent call last):
File "/home/streamsadmin/git/streamsx.topology/test/python/topology/deleteme2.py", line 40, in <module>
a = dill.loads(base64.b64decode(a.encode()))
File "/home/streamsadmin/anaconda3/lib/python3.5/site-packages/dill/dill.py", line 277, in loads
return load(file)
File "/home/streamsadmin/anaconda3/lib/python3.5/site-packages/dill/dill.py", line 266, in load
obj = pik.load()
File "/home/streamsadmin/anaconda3/lib/python3.5/site-packages/dill/dill.py", line 524, in _load_type
return _reverse_typemap[name]
KeyError: 'ClassType'
I've found a workaround, which you can see on Stackoverflow.
Here's my question: types.ClassType was removed in Python 3, yet cloudpickle re-adds it. Is this strictly necessary? It seems to be having side effects.
sympy.UndefinedFunction creates a dynamic class via a metaclass. See https://github.com/sympy/sympy/blob/20872c3b27726825869876b2dbe38e2fcd3bef2a/sympy/core/function.py#L775
In Python 3, running cloudpickle from a file, I get
from sympy import Function, symbols
import cloudpickle
f = Function('f')
x = symbols('x')
d = cloudpickle.dumps(f(x))
print(cloudpickle.loads(d))
print(cloudpickle.loads(d) is f(x))
Traceback (most recent call last):
File "test.py", line 10, in <module>
d = cloudpickle.dumps(f(x))
_pickle.PicklingError: Can't pickle test: attribute lookup test on __main__ failed
If I replace cloudpickle with pickle, it works.
In Python 2, I get a longer traceback:
Traceback (most recent call last):
File "test.py", line 8, in <module>
d = cloudpickle.dumps(f(x))
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 629, in dumps
cp.dump(obj)
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 107, in dump
return Pickler.dump(self, obj)
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/pickle.py", line 224, in dump
self.save(obj)
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/pickle.py", line 331, in save
self.save_reduce(obj=obj, *rv)
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 528, in save_reduce
save(func)
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/pickle.py", line 300, in save
self.save_global(obj)
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 368, in save_global
self.save_reduce(typ, (obj.__name__, obj.__bases__, d), obj=obj)
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/site-packages/cloudpickle/cloudpickle.py", line 533, in save_reduce
self.memoize(obj)
File "/Users/aaronmeurer/anaconda3/envs/python2/lib/python2.7/pickle.py", line 244, in memoize
assert id(obj) not in self.memo
AssertionError
A first quick test with dask shows the bytecode analysis in cloudpickle isn't 3.6-compatible:
@staticmethod
def extract_code_globals(co):
    """
    Find all globals names read or written to by codeblock co
    """
    code = getattr(co, 'co_code', None)
    if code is None:
        return set()
    if not PY3:
        code = [ord(c) for c in code]
    names = co.co_names
    out_names = set()
    n = len(code)
    i = 0
    extended_arg = 0
    while i < n:
        op = code[i]
        i += 1
        if op >= HAVE_ARGUMENT:
            oparg = code[i] + code[i+1] * 256 + extended_arg
            extended_arg = 0
            i += 2
            if op == EXTENDED_ARG:
                extended_arg = oparg*65536
            if op in GLOBAL_OPS:
>               out_names.add(names[oparg])
E               IndexError: tuple index out of range
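Python 3.6 switched to fixed two-byte wordcode, so manually stepping past HAVE_ARGUMENT misreads operands there. A version-independent sketch using the stdlib dis module (my own rewrite for illustration, not cloudpickle's actual fix):

```python
import dis

# opcodes that read or write module-level globals
GLOBAL_OPS = {"STORE_GLOBAL", "DELETE_GLOBAL", "LOAD_GLOBAL"}

def extract_code_globals(co):
    """Find all global names read or written to by code object co.
    dis.get_instructions decodes EXTENDED_ARG and wordcode for us,
    so no manual offset arithmetic is needed."""
    return {
        instr.argval
        for instr in dis.get_instructions(co)
        if instr.opname in GLOBAL_OPS
    }

def sample():
    return len(some_global)  # noqa: F821 - deliberately undefined

assert extract_code_globals(sample.__code__) == {"len", "some_global"}
```

Delegating decoding to dis also insulates the analysis from later bytecode format changes, at the cost of requiring CPython-compatible code objects.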