sPickle

The module sPickle is an enhanced version of the standard module pickle. It provides an improved Pickler class and a utility class SPickleTools.

The sPickle package tries to push the limits for pickling. The implementation tries to create correct pickles, but it does not try to be efficient, portable or nice to read. Consider it a proof of concept that demonstrates what could be done.

Warning

Although the author is using the sPickle package in production, it is more or less untested outside the specific environment it was written for.

Note

The sPickle package currently requires Python 2.7.

sPickle.MODULE_TO_BE_PICKLED_FLAG_NAME = '__module_must_be_pickled__'

If the global (module-level) variable __module_must_be_pickled__ is true, the module gets pickled by value.

class sPickle.Pickler(file, protocol=2, serializeableModules=None, mangleModuleName=None, logger=None, object_dispatch=None)

Bases: pickle.Pickler

The sPickle Pickler.

This Pickler is a subclass of pickle.Pickler that adds the ability to pickle modules, most classes and program state. It is intended to be API-compatible with pickle.Pickler, so you can use it as a drop-in replacement. However, its constructor has more optional arguments.

Parameters:
  • file – The file argument must either be an instance of collections.MutableSequence or have a write(str) method that accepts a single string argument. It can thus be an open file object, a StringIO object, a list, or any other custom object that meets one of these interfaces.
  • protocol (int) – The optional protocol argument tells the pickler to use the given protocol. For this implementation, the only supported protocol is 2 or pickle.HIGHEST_PROTOCOL. Specifying a negative protocol version selects the highest protocol version supported. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.
  • serializeableModules

    The optional argument serializeableModules must be an iterable collection of modules and strings. If the pickler needs to serialize a module, it checks this collection to decide whether the module needs to be pickled by value or by name (that is, by reference). The module gets pickled by value if at least one of the following conditions is true. Otherwise it gets pickled by reference:

    • The module has a global variable named __module_must_be_pickled__ and the value of this variable is true.
    • The module object is contained in serializeableModules.
    • The name of the module starts with a string contained in serializeableModules.
    • The module has the attribute __file__ and serializeableModules contains a string that is a substring of __file__ after applying path and case normalization as appropriate for the current system.
  • logger (logging.Logger) – The optional argument logger must be an instance of class logging.Logger. If given, it is used instead of the default logger.
  • mangleModuleName

    Experimental feature: the optional argument mangleModuleName must be a callable with three arguments. The first argument is this pickler, the second is the name of the module and the third is None or, if the caller is going to pickle a module reference, the module object. The callable must return a pickleable object that unpickles as a string. You can use this callable to rename modules in the pickle. For instance, you may want to replace “posixpath” by “os.path”.

    Note

    In order to be able to unpickle a module pickled by name, the module must be importable. If this is not the case or if the content of the module might change, you should tell the pickler to pickle the module by value.

  • object_dispatch – Experimental feature: the optional argument object_dispatch must be either an ObjectDispatchBuilder or a MutableMapping from numeric object ids (as returned by id()) to callables that take two arguments: first the pickler and then the object to be pickled. It is used to initialize the attributes object_dispatch_builder and object_dispatch. If no value is given, the Pickler updates the global default ObjectDispatchBuilder (as returned by get_default_instance()) and then sets object_dispatch to a shallow copy of the global ObjectDispatchBuilder.
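
The by-value rules listed for serializeableModules can be sketched as a small predicate. This is a minimal sketch, not the sPickle implementation: the helper names must_pickle_by_value and _normalize are hypothetical, and the path normalization is assumed to be os.path.normpath followed by os.path.normcase.

```python
import json
import os.path
import types

FLAG_NAME = '__module_must_be_pickled__'


def _normalize(path):
    # Assumed normalization: collapse separators, then fold case where
    # the platform's file system is case-insensitive.
    return os.path.normcase(os.path.normpath(path))


def must_pickle_by_value(module, serializeable_modules=()):
    """Return True if one of the four documented conditions holds."""
    # 1. The module carries a true flag variable.
    if getattr(module, FLAG_NAME, False):
        return True
    filename = getattr(module, '__file__', None)
    for entry in serializeable_modules:
        if isinstance(entry, types.ModuleType):
            # 2. The module object itself is listed.
            if entry is module:
                return True
        elif isinstance(entry, str):
            # 3. The module name starts with a listed string.
            if module.__name__.startswith(entry):
                return True
            # 4. A listed string is a substring of the normalized __file__.
            if filename and _normalize(entry) in _normalize(filename):
                return True
    return False


# Two synthetic modules for demonstration purposes only.
flagged = types.ModuleType('scratch')
setattr(flagged, FLAG_NAME, True)
plugin = types.ModuleType('myapp.plugins.report')
plugin.__file__ = os.path.join('srv', 'myapp', 'plugins', 'report.py')
```

With these definitions, must_pickle_by_value(json) is false, while must_pickle_by_value(json, [json]) and must_pickle_by_value(plugin, ['myapp.']) are true.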

Attributes and Methods

dispatch

A per instance version of the global dispatch table pickle.Pickler.dispatch. Using a per instance dispatch table keeps the global table unchanged.

object_dispatch

Certain “global” objects require special treatment, because the values of their attributes __module__ and/or __name__ are missing, wrong or otherwise not useful. Examples include platform-dependent implementations of standard functions like os.getcwd(), which reports itself as nt.getcwd, or various types from the module types. The attribute object_dispatch is a mapping from a numeric object id (as returned by id()) to a callable that takes two arguments: first the pickler and then the object to be pickled. If the pickler finds the id of an object to be pickled in object_dispatch, it dispatches pickling to the callable.

If the constructor argument object_dispatch was a MutableMapping, the pickler sets this attribute to object_dispatch. Otherwise the pickler sets object_dispatch to object_dispatch.object_dispatch. Finally the pickler adds a few additional entries to the mapping for special cases.
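
The lookup by object id can be illustrated without a real pickler. In this minimal sketch the "pickler" is just a list that records what would be written, and the names save_as_global and save are hypothetical, not part of sPickle.

```python
import os


def save_as_global(module_name, attr_name):
    """Build a handler that records a fixed, portable global name."""
    def handler(pickler, obj):
        # Stand-in for writing a global reference to the pickle stream.
        pickler.append((module_name, attr_name))
    return handler


# Keys are id() values, so the objects must stay alive as long as the
# mapping is in use.
object_dispatch = {id(os.getcwd): save_as_global('os', 'getcwd')}


def save(pickler, obj):
    """Dispatch to a registered handler; return False if there is none."""
    handler = object_dispatch.get(id(obj))
    if handler is None:
        return False
    handler(pickler, obj)
    return True


recorded = []
handled = save(recorded, os.getcwd)
```

The handler records the name os.getcwd regardless of what the function's own __module__ attribute reports on the current platform.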

object_dispatch_builder

If the constructor argument object_dispatch was not a MutableMapping, this attribute is the value of object_dispatch or, if object_dispatch was None, a copy of the default ObjectDispatchBuilder as returned by get_default_instance(). Otherwise object_dispatch_builder is None.

classmethod analysePicklerStack(traceback_or_frame, stopObjectId=None)

Analyse the stack of a Pickler.

This method creates a list of dictionaries, one for each object currently being serialised. (That is, objects already serialised or objects not yet started are not in this list.) The first list item represents the object whose processing started last, the last entry represents the object whose processing started first. The pickler reorders the sequence of objects to be pickled if required. Therefore it is not guaranteed that the last list item represents the object that was initially given to the pickler.

Possible entries of the dictionaries in the returned list are

Key ANALYSE_OBJECT_KEY
The object to be pickled. This item is always present.
Key ANALYSE_DICT_OF_KEY
This item is present if the object to be pickled is the __dict__ attribute of another object. The value is the object that has the __dict__ attribute.
Key ANALYSE_MEMO_KEY
If the object to be pickled has already been added to the memo, the value of this item is the memo key.
Parameters:
  • traceback_or_frame (types.TracebackType or types.FrameType) – a traceback object or a frame object. In case of a traceback object, the method follows the chain of traceback objects and extracts the innermost frame object.
  • stopObjectId (int) – the id of the topmost object the caller is interested in. If this method encounters an object with the given id, it stops building the result list.
Returns:

a list of dictionaries

Return type:

list
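
The handling of the traceback_or_frame argument, following the traceback chain to its innermost frame, can be sketched with the standard library alone. The function name innermost_frame is hypothetical; the sketch covers only this first step, not the stack analysis itself.

```python
import sys
import types


def innermost_frame(traceback_or_frame):
    """Return the innermost frame for a traceback, or the frame itself."""
    if isinstance(traceback_or_frame, types.TracebackType):
        tb = traceback_or_frame
        # tb_next points towards the frame where the exception was raised.
        while tb.tb_next is not None:
            tb = tb.tb_next
        return tb.tb_frame
    return traceback_or_frame


def outer():
    inner()


def inner():
    raise ValueError('boom')


try:
    outer()
except ValueError:
    frame = innermost_frame(sys.exc_info()[2])

print(frame.f_code.co_name)  # prints "inner"
```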

class sPickle.ObjectDispatchBuilder(modules=None)

Bases: object

A builder for the object_dispatch table of a Pickler

An ObjectDispatchBuilder has a list of names of API modules. It inspects these modules and looks for public objects that won’t be pickled by value (strings, numbers, lists, dicts, sets, modules) and have unusable values of their attributes __module__ and/or __name__. The builder adds an item to the object_dispatch mapping for each problematic object.

Parameters:
  • modules – see method extend_pending_analysis_queue()

ALL_PORTABLE_STDLIB_MODULES

A constant list of portable modules in the Python 2.7.9 standard library.

DEFAULT_PRIO1

The default priority for module entries, which were enumerated in the module variable __all__. Because the author of the module explicitly declared those names, they get a higher priority.

DEFAULT_PRIO2

The default priority for public module entries from a module without an __all__ variable.

object_dispatch

The attribute object_dispatch is a mapping from a numeric object id - as returned by id() - to a callable. The callable takes two arguments: first the pickler and then the object to be pickled. The callable must use the pickler to pickle the object.

The callable has a numerical priority. The priority is the value of the callable's attribute priority or, if the callable lacks the attribute, 1000. The priority indicates how “good” an API module is. This is a heuristic approach to the problem that many API modules without an __all__ variable accidentally export objects imported from other modules. If an object is exported by more than one API module, the ObjectDispatchBuilder uses the module with the highest priority.
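
The priority rule can be sketched as follows. This is only an illustration: the helper names are hypothetical, and it assumes that a numerically larger value means a higher priority, which the text above does not state explicitly.

```python
FALLBACK_PRIORITY = 1000  # used when a callable has no priority attribute


def priority_of(handler):
    return getattr(handler, 'priority', FALLBACK_PRIORITY)


def register(object_dispatch, obj, handler):
    """Keep the handler with the higher priority for the given object."""
    key = id(obj)
    current = object_dispatch.get(key)
    # Assumption: larger numbers win.
    if current is None or priority_of(handler) > priority_of(current):
        object_dispatch[key] = handler
    return object_dispatch[key]


def declared_export(pickler, obj):
    """Handler from a module that lists the object in __all__."""


declared_export.priority = 2000


def accidental_export(pickler, obj):
    """Handler without a priority attribute; falls back to 1000."""


target = object()
table = {}
register(table, target, accidental_export)
winner = register(table, target, declared_export)
```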

acceptable_module_names

Intentionally undocumented.

append_pending_analysis_queue(name, prio1=None, prio2=None)

Append a single module to the list of API modules.

Parameters:
  • name (str) – the module name
  • prio1 (int) – the priority of the module, if the module has the variable __all__. Default is DEFAULT_PRIO1.
  • prio2 (int) – the priority of the module, if the module lacks the variable __all__. Default is the value of prio1.
build()

Update the content of object_dispatch.

extend_pending_analysis_queue(modules)

Extend the list of API modules by modules.

Parameters: modules (an iterable collection) –

the iterable collection of module specifications to be considered. A module specification is one of

  • name (str) of a module
  • a sequence (name, priority) of a module
  • a sequence (name, prio1, prio2) of a module

For the meaning of the priority values see method append_pending_analysis_queue().
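
The three forms of a module specification can be normalized to a (name, prio1, prio2) triple as sketched here. The function normalize_spec is hypothetical, and the value used for DEFAULT_PRIO1 is a placeholder, because the real value is not given in this documentation.

```python
DEFAULT_PRIO1 = 100  # placeholder value, an assumption of this sketch


def normalize_spec(spec, default_prio1=DEFAULT_PRIO1):
    """Turn a module specification into (name, prio1, prio2)."""
    if isinstance(spec, str):
        name, prios = spec, ()
    else:
        name, prios = spec[0], tuple(spec[1:])
    if len(prios) > 2:
        raise ValueError('expected at most two priorities: %r' % (spec,))
    prio1 = prios[0] if len(prios) >= 1 else default_prio1
    # prio2 defaults to prio1, as for append_pending_analysis_queue().
    prio2 = prios[1] if len(prios) == 2 else prio1
    return name, prio1, prio2


specs = ['os', ('shutil', 500), ('types', 700, 300)]
normalized = [normalize_spec(spec) for spec in specs]
```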

classmethod get_default_instance()

Return a global default instance of class ObjectDispatchBuilder.

Returns: the default object dispatch builder
Return type: ObjectDispatchBuilder

class sPickle.SPickleTools(serializeableModules=None, pickler_class=None)

Bases: object

A collection of simple utility methods.

Warning

This class is still under development. Don’t rely on its methods. If you need a stable API use the class Pickler directly or copy the code.

The optional argument serializeableModules is passed on to the class Pickler.

The optional argument pickler_class can be used to set a different pickler class. It must accept the same arguments as class Pickler.

classmethod dis(str_, out=None, memo=None, indentlevel=4)

Disassemble an optionally compressed pickle.

See function pickletools.dis() for details.

dumps(obj, persistent_id=None, persistent_id_method=None, doCompress=True, mangleModuleName=None, object_dispatch=None)

Pickle an object and return the pickle

This method works similarly to the regular dumps function, but also optimizes and optionally compresses the pickle.

Parameters:
  • obj (object) – object to be pickled
  • persistent_id – the persistent_id function (or another callable) for the pickler. The function is called with a single positional argument, and must return None or the persistent id for its argument. See the section “Pickling and unpickling external objects” of the documentation of module pickle.
  • persistent_id_method – a variant of the persistent_id function, that takes the pickler object as its first argument and an object as its second argument.
  • doCompress – If doCompress is true in a boolean context, the pickle is compressed, provided that the compression actually reduces the size of the pickle. The compression method depends on the exact value of doCompress. If doCompress is callable, it is called to perform the compression; it must be a function (or method) that takes a single string argument and returns a compressed version. Otherwise the function bz2.compress() is used.
  • mangleModuleName

    Unless mangleModuleName is None, it must be a callable with 3 arguments: the first receives the pickler, the second the module name of the object to be pickled. If the caller is going to save a module reference, the third argument is the module. The callable must return an object to be pickled instead of the module name. This can be a different string or an object that gets unpickled as a string.

    Example:

    import os.path

    from sPickle import SPickleTools

    def mangleOsPath(pickler, name, module):
        '''use 'os.path' instead of the platform specific module name'''
        if module is os.path:
            return "os.path"
        return name

    spt = SPickleTools()
    # object_to_be_pickled stands for any object you want to pickle
    p = spt.dumps(object_to_be_pickled, mangleModuleName=mangleOsPath)
    
  • object_dispatch – the optional argument object_dispatch must be either a MutableMapping or an ObjectDispatchBuilder. It is passed on to the constructor of the pickler. See Pickler for details.
Returns:

the pickle, optionally compressed

Return type:

str
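
The sequence of pickling, optimizing and conditional compression can be sketched with the standard library. This is a minimal sketch under stated assumptions, not the code of dumps(): the name dumps_sketch is hypothetical, and the standard pickle.dumps() stands in for the sPickle Pickler.

```python
import bz2
import pickle
import pickletools
import zlib


def dumps_sketch(obj, doCompress=True):
    """Pickle, optimize and compress only if that saves space."""
    data = pickletools.optimize(pickle.dumps(obj, 2))
    if doCompress:
        # A callable doCompress replaces the default bz2.compress().
        compress = doCompress if callable(doCompress) else bz2.compress
        compressed = compress(data)
        if len(compressed) < len(data):
            return compressed
    return data


big = dumps_sketch(['spam'] * 1000)      # repetitive, so compression pays off
small = dumps_sketch(1)                  # too short to benefit
custom = dumps_sketch(['spam'] * 1000, doCompress=zlib.compress)
```

Here small is returned as a plain pickle, because compressing a pickle of only a few bytes would make it larger.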

dumps_with_external_ids(obj, idmap, matchResources=False, matchNetref=False, additionalResourceObjects=(), **kw)

Pickle an object that references objects that can’t be pickled.

If you want to pickle an object that references a resource (files, sockets, etc.) or references an RPyC proxy for an object on a remote system, you can’t pickle the referenced object. But if you are going to transfer the pickle to a remote system using the package RPyC, you can replace the resources by RPyC proxy objects and replace RPyC proxy objects by the real objects.

This method creates a Pickler object with a persistent_id method that optionally replaces resources and proxy objects by their object id. It stores the mapping between ids and objects in the idmap dictionary (or any other mutable mapping).

Parameters:
  • obj (object) – the object to be pickled
  • idmap (dict) – receives the id to object mapping
  • matchResources (object) – if true in a boolean context, replace resource objects.
  • matchNetref (object) – if true in a boolean context, replace RPyC proxies (technically objects of class rpyc.core.netref.BaseNetref).
  • additionalResourceObjects – a collection of objects that encapsulate some kind of resource and must be replaced by an RPyC proxy.
  • kw – other keyword arguments that are passed on to dumps().
classmethod getImportList(str_)

Return a list containing all imported modules from the pickle str_.

Sometimes useful for debugging.
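
A comparable list can be derived from an uncompressed pickle with pickletools.genops(). This is a hedged sketch rather than the implementation of getImportList(): the function imported_modules is hypothetical, and it only looks at the GLOBAL opcode used by pickle protocols up to 2.

```python
import fractions
import pickle
import pickletools


def imported_modules(data):
    """List the modules named by GLOBAL opcodes, in order of appearance."""
    modules = []
    for opcode, arg, _pos in pickletools.genops(data):
        if opcode.name == 'GLOBAL':
            # The argument has the form "module name".
            module = arg.split(' ')[0]
            if module not in modules:
                modules.append(module)
    return modules


data = pickle.dumps(fractions.Fraction(1, 3), 2)
modules = imported_modules(data)
```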

classmethod loads(str_, persistent_load=None, useCPickle=True, unpickler_class=None)

Unpickle an object from a string.

Parameters:
  • str_ (str) – the pickle
  • persistent_load – The persistent_load method for the unpickler. See the section “Pickling and unpickling external objects” of the documentation of module pickle.
  • useCPickle (object) – if True in a boolean context, use the Unpickler from the module cPickle. Otherwise use the much slower Unpickler from the module pickle.
  • unpickler_class – the unpickler class to be used. If this parameter is given, the value of useCPickle is ignored.
Returns:

the reconstructed object

Return type:

object

classmethod loads_with_external_ids(str_, idmap, useCPickle=True, unpickler_class=None)

Unpickle an object from a string.

Replace ids for external objects with the objects provided in idmap.

Parameters:
  • str_ (str) – the pickle
  • idmap (dict) – the mapping, that contains the objects for the id values used in the pickle
  • useCPickle (object) – if True in a boolean context, use the Unpickler from the module cPickle. Otherwise use the much slower Unpickler from the module pickle.
  • unpickler_class – the unpickler class to be used. If this parameter is given, the value of useCPickle is ignored.
Returns:

the reconstructed object

Return type:

object
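
The persistent-id mechanism behind dumps_with_external_ids() and loads_with_external_ids() can be sketched with the standard pickle classes. The class names and the is_external predicate below are hypothetical. In a real RPyC scenario the receiving side would use an idmap whose values are proxies; the sketch reuses the same mapping for simplicity.

```python
import io
import pickle
import threading


class ExternalIdPickler(pickle.Pickler):
    """Replace selected objects by their id() and remember them in idmap."""

    def __init__(self, file, idmap, is_external):
        pickle.Pickler.__init__(self, file, 2)
        self.idmap = idmap
        self.is_external = is_external

    def persistent_id(self, obj):
        if self.is_external(obj):
            self.idmap[id(obj)] = obj
            return id(obj)
        return None  # pickle the object as usual


class ExternalIdUnpickler(pickle.Unpickler):
    """Resolve persistent ids through the objects provided in idmap."""

    def __init__(self, file, idmap):
        pickle.Unpickler.__init__(self, file)
        self.idmap = idmap

    def persistent_load(self, pid):
        try:
            return self.idmap[pid]
        except KeyError:
            raise pickle.UnpicklingError('unknown external id %r' % (pid,))


lock = threading.Lock()  # a resource that cannot be pickled by value
idmap = {}
buf = io.BytesIO()
pickler = ExternalIdPickler(buf, idmap, lambda obj: obj is lock)
pickler.dump({'name': 'job', 'lock': lock})

unpickler = ExternalIdUnpickler(io.BytesIO(buf.getvalue()), idmap)
restored = unpickler.load()
```

After the round trip, restored['lock'] is the very object stored in idmap, while the rest of the dictionary was pickled by value.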

classmethod module_for_globals(callable_or_moduledict, withDefiningModules=False)

Get the module associated with a callable or a module dictionary.

If you pickle a module, make sure to keep a reference to the unpickled module. Otherwise the destruction of the module will clear the module's dictionary. Usually, the sPickle code for serializing modules preserves a reference to modules created from a pickle but not imported into sys.modules. However, there might be cases where you need to identify relevant modules yourself. This method can be used to find the relevant module(s).

Parameters:
  • callable_or_moduledict – a function or a method or a module dictionary
  • withDefiningModules (object) – if True and callable_or_moduledict is a callable, also return the module defining the callable.
Returns:

None, or a single module or a set of modules

classmethod reducer(*args)

Get an object with a method __reduce__.

This method creates an object that has a custom method __reduce__. The __reduce__ method returns the given arguments when called.

This method can be used to implement complex __reduce__ methods that need more than one function call on unpickling.
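
The idea can be sketched with a small class whose __reduce__ method returns the constructor arguments unchanged. The class name Reducer is hypothetical and the sketch uses the standard pickle module. Nesting two instances yields two function calls on unpickling.

```python
import pickle


class Reducer(object):
    """An object whose __reduce__ returns the arguments it was given."""

    def __init__(self, *args):
        self._args = args

    def __reduce__(self):
        return self._args


# On unpickling, the inner object becomes list('cba') and the outer one
# becomes sorted(<that list>): two calls from one pickle.
nested = Reducer(sorted, (Reducer(list, ('cba',)),))
result = pickle.loads(pickle.dumps(nested, 2))
print(result)  # prints ['a', 'b', 'c']
```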

remotemethod(rpycconnection, method=None, create_only_once=None, **kw)

Create a remote function.

This method takes an active RPyC connection and a locally defined function (or method) and returns a proxy for an equivalent function on the remote side. If you invoke the proxy, it will create a pickle containing the function, transfer this pickle to the remote side, unpickle it and invoke the function. It then pickles the result and transfers the result back to the local side. It will not pickle the function arguments. If you need to transfer the function arguments by value, use functools.partial() to apply them to your function prior to the call of remotemethod.

Parameters:
  • rpycconnection (rpyc.core.protocol.Connection) – an active RPyC connection. If set to None, execute method locally.
  • method (object) – a callable object. If you do not give this argument, you can use remotemethod as a decorator.
  • create_only_once – controls the creation of the function on the remote side. If you want to create the function during the execution of remotemethod(), pass CREATE_IMMEDIATELY. Otherwise, if you want to create the remote function on its first invocation, set create_only_once to a value that is True in a boolean context. Otherwise, if create_only_once evaluates to False, the local proxy creates the remote function on every invocation.
  • kw – other keyword arguments that are passed on to dumps_with_external_ids().
Returns:

the proxy for the remote function

Note

If you use remotemethod as a decorator, do not apply it to regular methods of a class. It does not work in the desired way, because decorators work on the underlying function object, not on the method object. Therefore you will end up with a remote function that receives an RPyC proxy for self.
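
Binding arguments with functools.partial(), as recommended for remotemethod(), looks like this. The function scale is a hypothetical example, and the call to remotemethod() appears only as a comment because it needs an active RPyC connection.

```python
import functools


def scale(factor, values):
    return [factor * value for value in values]


# The arguments become part of the partial object, so they would be
# pickled together with the function.
bound = functools.partial(scale, 3, [1, 2, 4])

# Hypothetical usage with an existing connection and SPickleTools instance:
#     remote = spt.remotemethod(connection, bound)
#     remote()
result = bound()
print(result)  # prints [3, 6, 12]
```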

CREATE_EVERYTIME = False

Constant to be given to the create_only_once argument of method remotemethod(). Create the function on the remote side on every invocation of the function returned by remotemethod(). The actual value is False.

CREATE_IMMEDIATELY = 'immedately'

Constant to be given to the create_only_once argument of method remotemethod(). Create the function on the remote side during the invocation of remotemethod().

CREATE_LAZY = True

Constant to be given to the create_only_once argument of method remotemethod(). Create the function on the remote side on the first invocation of the function returned by remotemethod(). The actual value is True.
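
How the three constants could drive a local proxy is sketched below. The function make_proxy is hypothetical and does not involve RPyC; create_remote stands for whatever creates the function on the remote side. Note that CREATE_IMMEDIATELY is itself true in a boolean context, so it has to be tested before the generic truth check.

```python
CREATE_EVERYTIME = False
CREATE_IMMEDIATELY = 'immedately'  # spelled as in the constant's value
CREATE_LAZY = True


def make_proxy(create_remote, create_only_once=None):
    """Return a proxy that creates the remote function as requested."""
    cache = []
    if create_only_once == CREATE_IMMEDIATELY:
        cache.append(create_remote())

    def proxy(*args, **kw):
        if cache:
            func = cache[0]
        else:
            func = create_remote()
            if create_only_once:
                cache.append(func)  # lazy mode: keep it after first use
        return func(*args, **kw)

    return proxy


def count_creations(mode, calls=3):
    """Return the number of creations before and after the calls."""
    created = []

    def create_remote():
        created.append(1)
        return lambda x: x + 1

    proxy = make_proxy(create_remote, mode)
    before = len(created)
    for i in range(calls):
        proxy(i)
    return before, len(created)
```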

class sPickle.FailSavePickler(file, protocol=2, serializeableModules=None, mangleModuleName=None, logger=None, object_dispatch=None)

Bases: sPickle._sPickle.Pickler

A fail-safe variant of class Pickler.

If this pickler detects an unpickleable object, it calls its method get_replacement() to retrieve a surrogate object to be pickled instead of the unpickleable object.

To use this feature you must either assign a suitable callable as attribute ‘get_replacement’ or create your own subclass of FailSavePickler and override method get_replacement().

Parameters:
  • file – The file argument must either be an instance of collections.MutableSequence or have a write(str) method that accepts a single string argument. It can thus be an open file object, a StringIO object, a list, or any other custom object that meets one of these interfaces.
  • protocol (int) – The optional protocol argument tells the pickler to use the given protocol. For this implementation, the only supported protocol is 2 or pickle.HIGHEST_PROTOCOL. Specifying a negative protocol version selects the highest protocol version supported. The higher the protocol used, the more recent the version of Python needed to read the pickle produced.
  • serializeableModules

    The optional argument serializeableModules must be an iterable collection of modules and strings. If the pickler needs to serialize a module, it checks this collection to decide whether the module needs to be pickled by value or by name (that is, by reference). The module gets pickled by value if at least one of the following conditions is true. Otherwise it gets pickled by reference:

    • The module has a global variable named __module_must_be_pickled__ and the value of this variable is true.
    • The module object is contained in serializeableModules.
    • The name of the module starts with a string contained in serializeableModules.
    • The module has the attribute __file__ and serializeableModules contains a string that is a substring of __file__ after applying path and case normalization as appropriate for the current system.
  • logger (logging.Logger) – The optional argument logger must be an instance of class logging.Logger. If given, it is used instead of the default logger.
  • mangleModuleName

    Experimental feature: the optional argument mangleModuleName must be a callable with three arguments. The first argument is this pickler, the second is the name of the module and the third is None or, if the caller is going to pickle a module reference, the module object. The callable must return a pickleable object that unpickles as a string. You can use this callable to rename modules in the pickle. For instance, you may want to replace “posixpath” by “os.path”.

    Note

    In order to be able to unpickle a module pickled by name, the module must be importable. If this is not the case or if the content of the module might change, you should tell the pickler to pickle the module by value.

  • object_dispatch – Experimental feature: the optional argument object_dispatch must be either an ObjectDispatchBuilder or a MutableMapping from numeric object ids (as returned by id()) to callables that take two arguments: first the pickler and then the object to be pickled. It is used to initialize the attributes object_dispatch_builder and object_dispatch. If no value is given, the Pickler updates the global default ObjectDispatchBuilder (as returned by get_default_instance()) and then sets object_dispatch to a shallow copy of the global ObjectDispatchBuilder.

Attributes and Methods

dispatch

A per instance version of the global dispatch table pickle.Pickler.dispatch. Using a per instance dispatch table keeps the global table unchanged.

object_dispatch

Certain “global” objects require special treatment, because the values of their attributes __module__ and/or __name__ are missing, wrong or otherwise not useful. Examples include platform-dependent implementations of standard functions like os.getcwd(), which reports itself as nt.getcwd, or various types from the module types. The attribute object_dispatch is a mapping from a numeric object id (as returned by id()) to a callable that takes two arguments: first the pickler and then the object to be pickled. If the pickler finds the id of an object to be pickled in object_dispatch, it dispatches pickling to the callable.

If the constructor argument object_dispatch was a MutableMapping, the pickler sets this attribute to object_dispatch. Otherwise the pickler sets object_dispatch to object_dispatch.object_dispatch. Finally the pickler adds a few additional entries to the mapping for special cases.

object_dispatch_builder

If the constructor argument object_dispatch was not a MutableMapping, this attribute is the value of object_dispatch or, if object_dispatch was None, a copy of the default ObjectDispatchBuilder as returned by get_default_instance(). Otherwise object_dispatch_builder is None.

get_replacement(pickler, obj, exception)

Get a surrogate for an unpicklable object.

This method is called if the pickler encounters an otherwise unpickleable object. The method can return a replacement object or its argument ‘exception’, if the function is unwilling to provide a replacement.

This implementation always returns ‘exception’.

Parameters:
  • pickler (FailSavePickler or a subclass thereof) – the pickler
  • obj – the unpickleable object
  • exception – the exception raised on pickling obj
Returns:

a pickleable surrogate for obj or ‘exception’.
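
The get_replacement() contract can be illustrated with the standard pickle module. This is a simplified sketch, not the FailSavePickler implementation: the helper dumps_with_replacement is hypothetical, passes None in place of a pickler, and only handles failures of the top-level object, whereas the real class works inside the pickling process.

```python
import pickle
import threading


def default_get_replacement(pickler, obj, exception):
    """Mirror the documented default: decline by returning the exception."""
    return exception


def describe_instead(pickler, obj, exception):
    """Provide a pickleable surrogate: a short description of the object."""
    return '<unpicklable %s>' % type(obj).__name__


def dumps_with_replacement(obj, get_replacement=default_get_replacement):
    try:
        return pickle.dumps(obj, 2)
    except Exception as exception:
        surrogate = get_replacement(None, obj, exception)
        if surrogate is exception:
            raise  # no replacement offered, so the original error stands
        return pickle.dumps(surrogate, 2)


lock = threading.Lock()  # lock objects cannot be pickled
restored = pickle.loads(dumps_with_replacement(lock, describe_instead))
```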

exception sPickle.RecursionDetectedError(msg, oid, level)

Bases: pickle.PicklingError

Raised by FailSavePickler on infinite recursions

Parameters:
  • msg (str) – the message
  • oid (int) – the id() value of the object that caused the recursion.
  • level – intentionally undocumented
exception sPickle.UnpicklingWillFailError

Bases: pickle.PicklingError

This error indicates that an object can be pickled, but unpickling will probably fail.

This is usually caused by an incomplete implementation of the pickling protocol or by a hostile __getattr__ or __getattribute__ method.