| \section{\module{pickle} --- Python object serialization}
 | |
| 
 | |
| \declaremodule{standard}{pickle}
 | |
| \modulesynopsis{Convert Python objects to streams of bytes and back.}
 | |
| % Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.
 | |
| % Rewritten by Barry Warsaw <barry@zope.com>
 | |
| 
 | |
| \index{persistence}
 | |
| \indexii{persistent}{objects}
 | |
| \indexii{serializing}{objects}
 | |
| \indexii{marshalling}{objects}
 | |
| \indexii{flattening}{objects}
 | |
| \indexii{pickling}{objects}
 | |
| 
 | |
The \module{pickle} module implements a fundamental, but powerful
algorithm for serializing and de-serializing a Python object
structure.  ``Pickling'' is the process whereby a Python object
hierarchy is converted into a byte stream, and ``unpickling'' is the
inverse operation, whereby a byte stream is converted back into an
object hierarchy.  Pickling (and unpickling) is alternatively known as
``serialization'', ``marshalling'',\footnote{Don't confuse this with
the \refmodule{marshal} module.} or ``flattening''; however, the
preferred terms used here are ``pickling'' and ``unpickling'' to avoid
confusion.
 | |
| 
 | |
| This documentation describes both the \module{pickle} module and the 
 | |
| \refmodule{cPickle} module.
 | |
| 
 | |
| \subsection{Relationship to other Python modules}
 | |
| 
 | |
| The \module{pickle} module has an optimized cousin called the
 | |
| \module{cPickle} module.  As its name implies, \module{cPickle} is
 | |
| written in C, so it can be up to 1000 times faster than
 | |
| \module{pickle}.  However it does not support subclassing of the
 | |
| \function{Pickler()} and \function{Unpickler()} classes, because in
 | |
| \module{cPickle} these are functions, not classes.  Most applications
 | |
| have no need for this functionality, and can benefit from the improved
 | |
| performance of \module{cPickle}.  Other than that, the interfaces of
 | |
| the two modules are nearly identical; the common interface is
 | |
| described in this manual and differences are pointed out where
 | |
| necessary.  In the following discussions, we use the term ``pickle''
 | |
| to collectively describe the \module{pickle} and
 | |
| \module{cPickle} modules.
 | |
| 
 | |
| The data streams the two modules produce are guaranteed to be
 | |
| interchangeable.
 | |
| 
 | |
Python has a more primitive serialization module called
\refmodule{marshal}, but \module{pickle} should generally be the
preferred way to serialize Python objects.  \module{marshal} exists
primarily to support Python's \file{.pyc} files.

The \module{pickle} module differs from \refmodule{marshal} in several
significant ways:
 | |
| 
 | |
| \begin{itemize}
 | |
| 
 | |
| \item The \module{pickle} module keeps track of the objects it has
 | |
|       already serialized, so that later references to the same object
 | |
|       won't be serialized again.  \module{marshal} doesn't do this.
 | |
| 
 | |
|       This has implications both for recursive objects and object
 | |
|       sharing.  Recursive objects are objects that contain references
 | |
|       to themselves.  These are not handled by marshal, and in fact,
 | |
|       attempting to marshal recursive objects will crash your Python
 | |
|       interpreter.  Object sharing happens when there are multiple
 | |
|       references to the same object in different places in the object
 | |
|       hierarchy being serialized.  \module{pickle} stores such objects
 | |
|       only once, and ensures that all other references point to the
 | |
|       master copy.  Shared objects remain shared, which can be very
 | |
|       important for mutable objects.
 | |
| 
 | |
| \item \module{marshal} cannot be used to serialize user-defined
 | |
|       classes and their instances.  \module{pickle} can save and
 | |
|       restore class instances transparently, however the class
 | |
|       definition must be importable and live in the same module as
 | |
|       when the object was stored.
 | |
| 
 | |
| \item The \module{marshal} serialization format is not guaranteed to
 | |
|       be portable across Python versions.  Because its primary job in
 | |
|       life is to support \file{.pyc} files, the Python implementers
 | |
|       reserve the right to change the serialization format in
 | |
|       non-backwards compatible ways should the need arise.  The
 | |
|       \module{pickle} serialization format is guaranteed to be
 | |
|       backwards compatible across Python releases.
 | |
| 
 | |
| \item The \module{pickle} module doesn't handle code objects, which
 | |
|       the \module{marshal} module does.  This avoids the possibility
 | |
|       of smuggling Trojan horses into a program through the
 | |
|       \module{pickle} module\footnote{This doesn't necessarily imply
 | |
|       that \module{pickle} is inherently secure.  See
 | |
|       section~\ref{pickle-sec} for a more detailed discussion on
 | |
|       \module{pickle} module security.  Besides, it's possible that
 | |
|       \module{pickle} will eventually support serializing code
 | |
|       objects.}.
 | |
| 
 | |
| \end{itemize}
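
The sharing and recursion behaviour described in the first item can be
checked directly.  The following is a minimal sketch, assuming a current
Python 3 interpreter; the variable names are illustrative only.

```python
import pickle

# One list referenced from two places in the same hierarchy.
shared = [1, 2, 3]
container = {"first": shared, "second": shared}

# A recursive object: a list that contains a reference to itself.
recursive = []
recursive.append(recursive)

restored = pickle.loads(pickle.dumps(container, 0))
restored_recursive = pickle.loads(pickle.dumps(recursive, 0))

# Both keys still refer to one single list after unpickling, so a
# change made through one reference is visible through the other.
restored["first"].append(4)
```

The restored list is a new object, distinct from \code{shared}, but
within the restored hierarchy the sharing structure is the same as in
the original.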
 | |
| 
 | |
| Note that serialization is a more primitive notion than persistence;
 | |
| although
 | |
| \module{pickle} reads and writes file objects, it does not handle the
 | |
| issue of naming persistent objects, nor the (even more complicated)
 | |
| issue of concurrent access to persistent objects.  The \module{pickle}
 | |
| module can transform a complex object into a byte stream and it can
 | |
| transform the byte stream into an object with the same internal
 | |
| structure.  Perhaps the most obvious thing to do with these byte
 | |
| streams is to write them onto a file, but it is also conceivable to
 | |
| send them across a network or store them in a database.  The module
 | |
| \refmodule{shelve} provides a simple interface
 | |
| to pickle and unpickle objects on DBM-style database files.
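
As a rough illustration of the \refmodule{shelve} interface mentioned
above, the sketch below stores one object in a temporary shelf and reads
it back.  It assumes a current Python 3 interpreter; the helper name
\code{shelve_round_trip} and the file name are hypothetical.

```python
import os
import shelve
import shutil
import tempfile

def shelve_round_trip(key, value):
    """Store value under key in a temporary shelf, reopen it, read it back."""
    directory = tempfile.mkdtemp()
    try:
        path = os.path.join(directory, "example_shelf")
        shelf = shelve.open(path)
        try:
            shelf[key] = value      # pickled and written to the database
        finally:
            shelf.close()
        shelf = shelve.open(path)
        try:
            return shelf[key]       # read from the database and unpickled
        finally:
            shelf.close()
    finally:
        shutil.rmtree(directory, ignore_errors=True)

stored = shelve_round_trip("config", {"depth": 3, "names": ["a", "b"]})
```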
 | |
| 
 | |
| \subsection{Data stream format}
 | |
| 
 | |
| The data format used by \module{pickle} is Python-specific.  This has
 | |
| the advantage that there are no restrictions imposed by external
 | |
| standards such as XDR\index{XDR}\index{External Data Representation}
 | |
| (which can't represent pointer sharing); however it means that
 | |
| non-Python programs may not be able to reconstruct pickled Python
 | |
| objects.
 | |
| 
 | |
| By default, the \module{pickle} data format uses a printable \ASCII{}
 | |
| representation.  This is slightly more voluminous than a binary
 | |
| representation.  The big advantage of using printable \ASCII{} (and of
 | |
| some other characteristics of \module{pickle}'s representation) is that
 | |
| for debugging or recovery purposes it is possible for a human to read
 | |
| the pickled file with a standard text editor.
 | |
| 
 | |
| There are currently 3 different protocols which can be used for pickling.
 | |
| 
 | |
| \begin{itemize}
 | |
| 
 | |
| \item Protocol version 0 is the original ASCII protocol and is backwards
 | |
| compatible with earlier versions of Python.
 | |
| 
 | |
| \item Protocol version 1 is the old binary format which is also compatible
 | |
| with earlier versions of Python.
 | |
| 
 | |
| \item Protocol version 2 was introduced in Python 2.3.  It provides
 | |
| much more efficient pickling of new-style classes.
 | |
| 
 | |
| \end{itemize}
 | |
| 
 | |
| Refer to PEP 307 for more information.
 | |
| 
 | |
| If a \var{protocol} is not specified, protocol 0 is used.
 | |
| If \var{protocol} is specified as a negative value
 | |
| or \constant{HIGHEST_PROTOCOL},
 | |
| the highest protocol version available will be used.
 | |
| 
 | |
| \versionchanged[The \var{bin} parameter is deprecated and only provided
 | |
| for backwards compatibility.  You should use the \var{protocol}
 | |
| parameter instead]{2.3}
 | |
| 
 | |
A binary format, which is slightly more efficient, can be chosen by
specifying a true value for the \var{bin} argument to the
\class{Pickler} constructor or the \function{dump()} and \function{dumps()}
functions.  A \var{protocol} version of 1 or higher implies use of a
binary format.
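
The effect of the \var{protocol} argument can be seen by pickling the
same object several ways.  This is a sketch only, assuming a current
Python 3 interpreter, where the protocol is passed explicitly because
the default there is no longer protocol 0.

```python
import pickle

data = {"name": "spam", "values": [1, 2.5, None]}

text_pickle = pickle.dumps(data, 0)      # protocol 0: printable ASCII
binary_pickle = pickle.dumps(data, 2)    # protocol 2: binary format
highest_pickle = pickle.dumps(data, -1)  # negative: highest available

# Protocol 0 output contains only ASCII characters ...
is_ascii = all(byte < 128 for byte in bytearray(text_pickle))
# ... whereas a protocol 2 stream opens with the PROTO opcode and version.
starts_with_proto = binary_pickle[:2] == b"\x80\x02"

# The reader does not need to be told which protocol wrote the stream.
results = [pickle.loads(p)
           for p in (text_pickle, binary_pickle, highest_pickle)]
```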
 | |
| 
 | |
| \subsection{Usage}
 | |
| 
 | |
| To serialize an object hierarchy, you first create a pickler, then you
 | |
| call the pickler's \method{dump()} method.  To de-serialize a data
 | |
| stream, you first create an unpickler, then you call the unpickler's
 | |
| \method{load()} method.  The \module{pickle} module provides the
 | |
| following constant:
 | |
| 
 | |
| \begin{datadesc}{HIGHEST_PROTOCOL}
 | |
| The highest protocol version available.  This value can be passed
 | |
| as a \var{protocol} value.
 | |
| \end{datadesc}
 | |
| 
 | |
| The \module{pickle} module provides the
 | |
| following functions to make this process more convenient:
 | |
| 
 | |
| \begin{funcdesc}{dump}{object, file\optional{, protocol\optional{, bin}}}
 | |
| Write a pickled representation of \var{object} to the open file object
 | |
| \var{file}.  This is equivalent to
 | |
| \code{Pickler(\var{file}, \var{protocol}, \var{bin}).dump(\var{object})}.
 | |
| 
 | |
If the \var{protocol} parameter is omitted, protocol 0 is used.
If \var{protocol} is specified as a negative value
or \constant{HIGHEST_PROTOCOL},
the highest protocol version will be used.
 | |
| 
 | |
| \versionchanged[The \var{protocol} parameter was added.
 | |
| The \var{bin} parameter is deprecated and only provided
 | |
| for backwards compatibility.  You should use the \var{protocol}
 | |
| parameter instead]{2.3}
 | |
| 
 | |
| If the optional \var{bin} argument is true, the binary pickle format
 | |
| is used; otherwise the (less efficient) text pickle format is used
 | |
| (for backwards compatibility, this is the default).
 | |
| 
 | |
| \var{file} must have a \method{write()} method that accepts a single
 | |
| string argument.  It can thus be a file object opened for writing, a
 | |
| \refmodule{StringIO} object, or any other custom
 | |
| object that meets this interface.
 | |
| \end{funcdesc}
 | |
| 
 | |
| \begin{funcdesc}{load}{file}
 | |
| Read a string from the open file object \var{file} and interpret it as
 | |
| a pickle data stream, reconstructing and returning the original object
 | |
| hierarchy.  This is equivalent to \code{Unpickler(\var{file}).load()}.
 | |
| 
 | |
| \var{file} must have two methods, a \method{read()} method that takes
 | |
| an integer argument, and a \method{readline()} method that requires no
 | |
| arguments.  Both methods should return a string.  Thus \var{file} can
 | |
| be a file object opened for reading, a
 | |
| \module{StringIO} object, or any other custom
 | |
| object that meets this interface.
 | |
| 
 | |
| This function automatically determines whether the data stream was
 | |
| written in binary mode or not.
 | |
| \end{funcdesc}
 | |
| 
 | |
| \begin{funcdesc}{dumps}{object\optional{, protocol\optional{, bin}}}
 | |
| Return the pickled representation of the object as a string, instead
 | |
| of writing it to a file.
 | |
| 
 | |
If the \var{protocol} parameter is omitted, protocol 0 is used.
If \var{protocol} is specified as a negative value
or \constant{HIGHEST_PROTOCOL},
the highest protocol version will be used.
 | |
| 
 | |
| \versionchanged[The \var{protocol} parameter was added.
 | |
| The \var{bin} parameter is deprecated and only provided
 | |
| for backwards compatibility.  You should use the \var{protocol}
 | |
| parameter instead]{2.3}
 | |
| 
 | |
| If the optional \var{bin} argument is
 | |
| true, the binary pickle format is used; otherwise the (less efficient)
 | |
| text pickle format is used (this is the default).
 | |
| \end{funcdesc}
 | |
| 
 | |
| \begin{funcdesc}{loads}{string}
 | |
| Read a pickled object hierarchy from a string.  Characters in the
 | |
| string past the pickled object's representation are ignored.
 | |
| \end{funcdesc}
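
The four convenience functions can be exercised as follows.  This is a
minimal sketch assuming a current Python 3 interpreter, where the pickled
``string'' is a bytes object and an \code{io.BytesIO} buffer stands in
for the \module{StringIO} object mentioned above.

```python
import io
import pickle

record = {"id": 7, "tags": ("a", "b"), "score": 1.5}

# dump() needs an object with write(); load() needs read() and
# readline().  An in-memory buffer satisfies both requirements.
buffer = io.BytesIO()
pickle.dump(record, buffer, 0)
buffer.seek(0)
from_file = pickle.load(buffer)

# dumps() and loads() do the same job without a file object.
as_string = pickle.dumps(record, 0)
from_string = pickle.loads(as_string)

# Data that follows the pickled representation is ignored by loads().
with_trailer = pickle.loads(as_string + b"trailing data")
```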
 | |
| 
 | |
| The \module{pickle} module also defines three exceptions:
 | |
| 
 | |
| \begin{excdesc}{PickleError}
 | |
| A common base class for the other exceptions defined below.  This
 | |
| inherits from \exception{Exception}.
 | |
| \end{excdesc}
 | |
| 
 | |
| \begin{excdesc}{PicklingError}
 | |
| This exception is raised when an unpicklable object is passed to
 | |
| the \method{dump()} method.
 | |
| \end{excdesc}
 | |
| 
 | |
| \begin{excdesc}{UnpicklingError}
 | |
| This exception is raised when there is a problem unpickling an object,
 | |
| such as a security violation.  Note that other exceptions may also be
 | |
| raised during unpickling, including (but not necessarily limited to)
 | |
| \exception{AttributeError}, \exception{EOFError},
 | |
| \exception{ImportError}, and \exception{IndexError}.
 | |
| \end{excdesc}
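
Because unpickling can fail with several unrelated exception types, code
that reads pickles from uncertain sources often catches them as a group.
The helpers below are hypothetical and only sketch that pattern, assuming
a current Python 3 interpreter; the exact exception raised for a given
input may differ between versions.

```python
import pickle

def try_loads(data):
    """Return (True, obj) on success or (False, exception) on failure."""
    try:
        return True, pickle.loads(data)
    except (pickle.UnpicklingError, AttributeError, EOFError,
            ImportError, IndexError) as exc:
        return False, exc

def try_dumps(obj):
    """Return (True, pickle) on success or (False, exception) on failure."""
    try:
        return True, pickle.dumps(obj, 0)
    except (pickle.PicklingError, AttributeError, TypeError) as exc:
        return False, exc

ok_empty, err_empty = try_loads(b"")             # no data at all
ok_bad, err_bad = try_loads(b"?not a pickle")    # unknown opcode
ok_good, value = try_loads(pickle.dumps([1, 2], 0))
ok_lambda, err_lambda = try_dumps(lambda: None)  # not picklable by name
```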
 | |
| 
 | |
| The \module{pickle} module also exports two callables\footnote{In the
 | |
| \module{pickle} module these callables are classes, which you could
 | |
| subclass to customize the behavior.  However, in the \module{cPickle}
 | |
| modules these callables are factory functions and so cannot be
 | |
| subclassed.  One of the common reasons to subclass is to control what
 | |
| objects can actually be unpickled.  See section~\ref{pickle-sec} for
 | |
| more details on security concerns.}, \class{Pickler} and
 | |
| \class{Unpickler}:
 | |
| 
 | |
| \begin{classdesc}{Pickler}{file\optional{, protocol\optional{, bin}}}
 | |
| This takes a file-like object to which it will write a pickle data
 | |
| stream.  
 | |
| 
 | |
If the \var{protocol} parameter is omitted, protocol 0 is used.
If \var{protocol} is specified as a negative value,
the highest protocol version will be used.
 | |
| 
 | |
| \versionchanged[The \var{bin} parameter is deprecated and only provided
 | |
| for backwards compatibility.  You should use the \var{protocol}
 | |
| parameter instead]{2.3}
 | |
| 
 | |
| Optional \var{bin} if true, tells the pickler to use the more
 | |
| efficient binary pickle format, otherwise the \ASCII{} format is used
 | |
| (this is the default).
 | |
| 
 | |
| \var{file} must have a \method{write()} method that accepts a single
 | |
| string argument.  It can thus be an open file object, a
 | |
| \module{StringIO} object, or any other custom
 | |
| object that meets this interface.
 | |
| \end{classdesc}
 | |
| 
 | |
| \class{Pickler} objects define one (or two) public methods:
 | |
| 
 | |
| \begin{methoddesc}[Pickler]{dump}{object}
 | |
| Write a pickled representation of \var{object} to the open file object
 | |
| given in the constructor.  Either the binary or \ASCII{} format will
 | |
| be used, depending on the value of the \var{bin} flag passed to the
 | |
| constructor.
 | |
| \end{methoddesc}
 | |
| 
 | |
| \begin{methoddesc}[Pickler]{clear_memo}{}
 | |
Clears the pickler's ``memo''.  The memo is the data structure that
remembers which objects the pickler has already seen, so that shared
or recursive objects are pickled by reference and not by value.  This
method is useful when re-using picklers.
 | |
| 
 | |
| \begin{notice}
 | |
| Prior to Python 2.3, \method{clear_memo()} was only available on the
 | |
| picklers created by \refmodule{cPickle}.  In the \module{pickle} module,
 | |
| picklers have an instance variable called \member{memo} which is a
 | |
| Python dictionary.  So to clear the memo for a \module{pickle} module
 | |
| pickler, you could do the following:
 | |
| 
 | |
| \begin{verbatim}
 | |
| mypickler.memo.clear()
 | |
| \end{verbatim}
 | |
| 
 | |
| Code that does not need to support older versions of Python should
 | |
| simply use \method{clear_memo()}.
 | |
| \end{notice}
 | |
| \end{methoddesc}
 | |
| 
 | |
It is possible to make multiple calls to the \method{dump()} method of
the same \class{Pickler} instance.  These must then be matched to the
same number of calls to the \method{load()} method of the
corresponding \class{Unpickler} instance.  If the same object is
pickled by multiple \method{dump()} calls, the \method{load()} calls
will all yield references to the same object\footnote{\emph{Warning}: this
is intended for pickling multiple objects without intervening
modifications to the objects or their parts.  If you modify an object
and then pickle it again using the same \class{Pickler} instance, the
object is not pickled again; only a reference to it is pickled, and the
\class{Unpickler} will return the old value, not the modified one.
There are two problems here: (1) detecting changes, and (2)
marshalling a minimal set of changes.  Garbage collection may also
become a problem here.}.
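
The behaviour of repeated \method{dump()} calls, including the caveat
about modified objects, can be observed with a short experiment.  This
sketch assumes a current Python 3 interpreter; the helper name
\code{dump_twice} is hypothetical.

```python
import io
import pickle

def dump_twice(clear_between):
    """Pickle a list, change it, then pickle it again with one Pickler."""
    buffer = io.BytesIO()
    pickler = pickle.Pickler(buffer, 0)
    items = ["a", "b"]
    pickler.dump(items)
    items.append("c")
    if clear_between:
        pickler.clear_memo()    # forget that items was already pickled
    pickler.dump(items)

    # Two dump() calls must be matched by two load() calls.
    buffer.seek(0)
    unpickler = pickle.Unpickler(buffer)
    return unpickler.load(), unpickler.load()

first, second = dump_twice(clear_between=False)
fresh_first, fresh_second = dump_twice(clear_between=True)
```

Without \method{clear_memo()} the second \method{load()} returns the
object produced by the first one, still holding the old value.  After
\method{clear_memo()} the modified list is pickled in full and comes
back as a separate object.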
 | |
| 
 | |
| \class{Unpickler} objects are defined as:
 | |
| 
 | |
| \begin{classdesc}{Unpickler}{file}
 | |
| This takes a file-like object from which it will read a pickle data
 | |
| stream.  This class automatically determines whether the data stream
 | |
| was written in binary mode or not, so it does not need a flag as in
 | |
| the \class{Pickler} factory.
 | |
| 
 | |
| \var{file} must have two methods, a \method{read()} method that takes
 | |
| an integer argument, and a \method{readline()} method that requires no
 | |
| arguments.  Both methods should return a string.  Thus \var{file} can
 | |
| be a file object opened for reading, a
 | |
| \module{StringIO} object, or any other custom
 | |
| object that meets this interface.
 | |
| \end{classdesc}
 | |
| 
 | |
| \class{Unpickler} objects have one (or two) public methods:
 | |
| 
 | |
| \begin{methoddesc}[Unpickler]{load}{}
 | |
| Read a pickled object representation from the open file object given
 | |
| in the constructor, and return the reconstituted object hierarchy
 | |
| specified therein.
 | |
| \end{methoddesc}
 | |
| 
 | |
| \begin{methoddesc}[Unpickler]{noload}{}
 | |
This is just like \method{load()} except that it doesn't actually
create any objects.  This is useful primarily for finding what are
called ``persistent ids'' that may be referenced in a pickle data
stream.  See section~\ref{pickle-protocol} below for more details.
 | |
| 
 | |
| \strong{Note:} the \method{noload()} method is currently only
 | |
| available on \class{Unpickler} objects created with the
 | |
| \module{cPickle} module.  \module{pickle} module \class{Unpickler}s do
 | |
| not have the \method{noload()} method.
 | |
| \end{methoddesc}
 | |
| 
 | |
| \subsection{What can be pickled and unpickled?}
 | |
| 
 | |
| The following types can be pickled:
 | |
| 
 | |
| \begin{itemize}
 | |
| 
 | |
| \item \code{None}, \code{True}, and \code{False}
 | |
| 
 | |
| \item integers, long integers, floating point numbers, complex numbers
 | |
| 
 | |
| \item normal and Unicode strings
 | |
| 
 | |
| \item tuples, lists, and dictionaries containing only picklable objects
 | |
| 
 | |
| \item functions defined at the top level of a module
 | |
| 
 | |
| \item built-in functions defined at the top level of a module
 | |
| 
 | |
| \item classes that are defined at the top level of a module
 | |
| 
 | |
\item instances of such classes whose \member{__dict__} or the result
of calling \method{__getstate__()} is picklable (see
section~\ref{pickle-protocol} for details)
 | |
| 
 | |
| \end{itemize}
 | |
| 
 | |
| Attempts to pickle unpicklable objects will raise the
 | |
| \exception{PicklingError} exception; when this happens, an unspecified
 | |
| number of bytes may have already been written to the underlying file.
 | |
| 
 | |
Note that functions (built-in and user-defined) are pickled by ``fully
qualified'' name reference, not by value.  This means that only the
function name is pickled, along with the name of the module the function
is defined in.  Neither the function's code nor any of its function
attributes are pickled.  Thus the defining module must be importable
in the unpickling environment, and the module must contain the named
object; otherwise an exception will be raised\footnote{The exception
raised will likely be an \exception{ImportError} or an
\exception{AttributeError}, but it could be something else.}.
 | |
| 
 | |
| Similarly, classes are pickled by named reference, so the same
 | |
| restrictions in the unpickling environment apply.  Note that none of
 | |
| the class's code or data is pickled, so in the following example the
 | |
| class attribute \code{attr} is not restored in the unpickling
 | |
| environment:
 | |
| 
 | |
\begin{verbatim}
import pickle

class Foo:
    attr = 'a class attr'

# Only the name Foo and the name of its module are stored, not attr.
picklestring = pickle.dumps(Foo)
\end{verbatim}
 | |
| 
 | |
| These restrictions are why picklable functions and classes must be
 | |
| defined in the top level of a module.
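
Pickling by name reference can be demonstrated with objects from the
standard library.  The sketch below assumes a current Python 3
interpreter and uses \code{len} and \code{collections.OrderedDict}
purely as convenient examples.

```python
import collections
import pickle

# A built-in function and a class, both reachable at module top level.
function_pickle = pickle.dumps(len, 0)
class_pickle = pickle.dumps(collections.OrderedDict, 0)

# Only names are stored, so the streams are short and contain no code.
names_function = b"len" in function_pickle
names_class = b"OrderedDict" in class_pickle

# Unpickling imports the module and looks the name up again, which
# yields the very same objects within this process.
same_function = pickle.loads(function_pickle) is len
same_class = pickle.loads(class_pickle) is collections.OrderedDict
```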
 | |
| 
 | |
| Similarly, when class instances are pickled, their class's code and
 | |
| data are not pickled along with them.  Only the instance data are
 | |
| pickled.  This is done on purpose, so you can fix bugs in a class or
 | |
| add methods to the class and still load objects that were created with
 | |
| an earlier version of the class.  If you plan to have long-lived
 | |
| objects that will see many versions of a class, it may be worthwhile
 | |
| to put a version number in the objects so that suitable conversions
 | |
| can be made by the class's \method{__setstate__()} method.
 | |
| 
 | |
| \subsection{The pickle protocol
 | |
| \label{pickle-protocol}}\setindexsubitem{(pickle protocol)}
 | |
| 
 | |
| This section describes the ``pickling protocol'' that defines the
 | |
| interface between the pickler/unpickler and the objects that are being
 | |
| serialized.  This protocol provides a standard way for you to define,
 | |
| customize, and control how your objects are serialized and
 | |
| de-serialized.  The description in this section doesn't cover specific
 | |
| customizations that you can employ to make the unpickling environment
 | |
| safer from untrusted pickle data streams; see section~\ref{pickle-sec}
 | |
| for more details.
 | |
| 
 | |
| \subsubsection{Pickling and unpickling normal class
 | |
|     instances\label{pickle-inst}}
 | |
| 
 | |
| When a pickled class instance is unpickled, its \method{__init__()}
 | |
| method is normally \emph{not} invoked.  If it is desirable that the
 | |
| \method{__init__()} method be called on unpickling, a class can define
 | |
| a method \method{__getinitargs__()}, which should return a
 | |
| \emph{tuple} containing the arguments to be passed to the class
 | |
| constructor (i.e. \method{__init__()}).  The
 | |
| \method{__getinitargs__()} method is called at
 | |
| pickle time; the tuple it returns is incorporated in the pickle for
 | |
| the instance.
 | |
| \withsubitem{(copy protocol)}{\ttindex{__getinitargs__()}}
 | |
| \withsubitem{(instance constructor)}{\ttindex{__init__()}}
 | |
| 
 | |
| \withsubitem{(copy protocol)}{
 | |
|   \ttindex{__getstate__()}\ttindex{__setstate__()}}
 | |
| \withsubitem{(instance attribute)}{
 | |
|   \ttindex{__dict__}}
 | |
| 
 | |
| Classes can further influence how their instances are pickled; if the
 | |
| class defines the method \method{__getstate__()}, it is called and the
 | |
| return state is pickled as the contents for the instance, instead of
 | |
| the contents of the instance's dictionary.  If there is no
 | |
| \method{__getstate__()} method, the instance's \member{__dict__} is
 | |
| pickled.
 | |
| 
 | |
| Upon unpickling, if the class also defines the method
 | |
| \method{__setstate__()}, it is called with the unpickled
 | |
| state\footnote{These methods can also be used to implement copying
 | |
| class instances.}.  If there is no \method{__setstate__()} method, the
 | |
| pickled state must be a dictionary and its items are assigned to the
 | |
| new instance's dictionary.  If a class defines both
 | |
| \method{__getstate__()} and \method{__setstate__()}, the state object
 | |
| needn't be a dictionary and these methods can do what they
 | |
| want.\footnote{This protocol is also used by the shallow and deep
 | |
| copying operations defined in the
 | |
| \refmodule{copy} module.}
 | |
| 
 | |
| \begin{notice}[warning]
 | |
|   For new-style classes, if \method{__getstate__()} returns a false
 | |
|   value, the \method{__setstate__()} method will not be called.
 | |
| \end{notice}
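
The following sketch shows \method{__getstate__()} and
\method{__setstate__()} working together, and also that
\method{__init__()} is not called on unpickling.  It assumes a current
Python 3 interpreter and that the code runs as a script, so that the
class can be found by name; the \class{Account} class, its attributes
and the version tag are hypothetical.

```python
import pickle
import threading

class Account(object):
    """Hypothetical class holding one attribute that cannot be pickled."""
    init_calls = 0
    STATE_VERSION = 2

    def __init__(self, owner, balance):
        Account.init_calls += 1
        self.owner = owner
        self.balance = balance
        self.lock = threading.Lock()    # locks are not picklable

    def __getstate__(self):
        # Pickle a copy of __dict__ without the lock, plus a version tag.
        state = self.__dict__.copy()
        del state["lock"]
        state["version"] = self.STATE_VERSION
        return state

    def __setstate__(self, state):
        # A real class could convert older layouts here, keyed on version.
        state = dict(state)
        self.loaded_version = state.pop("version", 1)
        self.__dict__.update(state)
        self.lock = threading.Lock()    # recreate what was left out

original = Account("alice", 100)
restored = pickle.loads(pickle.dumps(original, 2))
```

Storing a version number in the state, as suggested earlier for
long-lived objects, gives \method{__setstate__()} a place to decide
which conversion to apply.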
 | |
| 
 | |
| 
 | |
| \subsubsection{Pickling and unpickling extension types}
 | |
| 
 | |
| When the \class{Pickler} encounters an object of a type it knows
 | |
| nothing about --- such as an extension type --- it looks in two places
 | |
| for a hint of how to pickle it.  One alternative is for the object to
 | |
| implement a \method{__reduce__()} method.  If provided, at pickling
 | |
| time \method{__reduce__()} will be called with no arguments, and it
 | |
| must return either a string or a tuple.
 | |
| 
 | |
| If a string is returned, it names a global variable whose contents are
 | |
| pickled as normal.  When a tuple is returned, it must be of length two
 | |
| or three, with the following semantics:
 | |
| 
 | |
| \begin{itemize}
 | |
| 
 | |
| \item A callable object, which in the unpickling environment must be
 | |
|       either a class, a callable registered as a ``safe constructor''
 | |
|       (see below), or it must have an attribute
 | |
|       \member{__safe_for_unpickling__} with a true value.  Otherwise,
 | |
|       an \exception{UnpicklingError} will be raised in the unpickling
 | |
|       environment.  Note that as usual, the callable itself is pickled
 | |
|       by name.
 | |
| 
 | |
| \item A tuple of arguments for the callable object, or \code{None}.
 | |
| \deprecated{2.3}{Use the tuple of arguments instead}								
 | |
| 
 | |
| \item Optionally, the object's state, which will be passed to
 | |
|       the object's \method{__setstate__()} method as described in
 | |
|       section~\ref{pickle-inst}.  If the object has no
 | |
|       \method{__setstate__()} method, then, as above, the value must
 | |
|       be a dictionary and it will be added to the object's
 | |
|       \member{__dict__}.
 | |
| 
 | |
| \end{itemize}
 | |
| 
 | |
| Upon unpickling, the callable will be called (provided that it meets
 | |
| the above criteria), passing in the tuple of arguments; it should
 | |
| return the unpickled object.
 | |
| 
 | |
| If the second item was \code{None}, then instead of calling the
 | |
| callable directly, its \method{__basicnew__()} method is called
 | |
| without arguments.  It should also return the unpickled object.
 | |
| 
 | |
| \deprecated{2.3}{Use the tuple of arguments instead}
 | |
| 
 | |
An alternative to implementing a \method{__reduce__()} method on the
object to be pickled is to register the callable with the
\refmodule[copyreg]{copy_reg} module.  This module provides a way
for programs to register ``reduction functions'' and constructors for
user-defined types.  Reduction functions have the same semantics and
interface as the \method{__reduce__()} method described above, except
that they are called with a single argument, the object to be pickled.
 | |
| 
 | |
| The registered constructor is deemed a ``safe constructor'' for purposes
 | |
| of unpickling as described above.
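
Both mechanisms can be sketched side by side.  The example assumes a
current Python 3 interpreter, where the registration module is spelled
\code{copyreg}, and that the code runs as a script; the \class{Point}
and \class{Temperature} classes and the function
\code{reduce_temperature} are hypothetical.

```python
import pickle

try:
    import copyreg                  # spelling used by Python 3
except ImportError:
    import copy_reg as copyreg      # spelling used in this manual

class Point(object):
    """Hypothetical type that describes its own reconstruction."""
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __reduce__(self):
        # (callable, argument tuple): unpickling calls Point(x, y).
        return (Point, (self.x, self.y))

class Temperature(object):
    """Hypothetical type handled by a registered reduction function."""
    def __init__(self, degrees):
        self.degrees = degrees

def reduce_temperature(temp):
    # Same return convention as __reduce__(), but the object is passed in.
    return (Temperature, (temp.degrees,))

copyreg.pickle(Temperature, reduce_temperature)

point = pickle.loads(pickle.dumps(Point(3, 4), 2))
temp = pickle.loads(pickle.dumps(Temperature(21.5), 2))
```

Registration is useful when the type itself cannot be modified, for
instance when it comes from an extension module.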
 | |
| 
 | |
| \subsubsection{Pickling and unpickling external objects}
 | |
| 
 | |
| For the benefit of object persistence, the \module{pickle} module
 | |
| supports the notion of a reference to an object outside the pickled
 | |
| data stream.  Such objects are referenced by a ``persistent id'',
 | |
| which is just an arbitrary string of printable \ASCII{} characters.
 | |
| The resolution of such names is not defined by the \module{pickle}
 | |
| module; it will delegate this resolution to user defined functions on
 | |
| the pickler and unpickler\footnote{The actual mechanism for
 | |
| associating these user defined functions is slightly different for
 | |
| \module{pickle} and \module{cPickle}.  The description given here
 | |
| works the same for both implementations.  Users of the \module{pickle}
 | |
| module could also use subclassing to effect the same results,
 | |
| overriding the \method{persistent_id()} and \method{persistent_load()}
 | |
| methods in the derived classes.}.
 | |
| 
 | |
| To define external persistent id resolution, you need to set the
 | |
| \member{persistent_id} attribute of the pickler object and the
 | |
| \member{persistent_load} attribute of the unpickler object.
 | |
| 
 | |
| To pickle objects that have an external persistent id, the pickler
 | |
| must have a custom \function{persistent_id()} method that takes an
 | |
| object as an argument and returns either \code{None} or the persistent
 | |
| id for that object.  When \code{None} is returned, the pickler simply
 | |
| pickles the object as normal.  When a persistent id string is
 | |
| returned, the pickler will pickle that string, along with a marker
 | |
| so that the unpickler will recognize the string as a persistent id.
 | |
| 
 | |
| To unpickle external objects, the unpickler must have a custom
 | |
| \function{persistent_load()} function that takes a persistent id
 | |
| string and returns the referenced object.
 | |
| 
 | |
| Here's a silly example that \emph{might} shed more light:
 | |
| 
 | |
\begin{verbatim}
import pickle
from cStringIO import StringIO

src = StringIO()
p = pickle.Pickler(src)

# Objects that have an attribute x are stored as a persistent id string.
def persistent_id(obj):
    if hasattr(obj, 'x'):
        return 'the value %d' % obj.x
    else:
        return None

p.persistent_id = persistent_id

class Integer:
    def __init__(self, x):
        self.x = x
    def __str__(self):
        return 'My name is integer %d' % self.x

i = Integer(7)
print i
p.dump(i)

datastream = src.getvalue()
print repr(datastream)
dst = StringIO(datastream)

up = pickle.Unpickler(dst)

class FancyInteger(Integer):
    def __str__(self):
        return 'I am the integer %d' % self.x

# Resolve a persistent id string back into an object.
def persistent_load(persid):
    if persid.startswith('the value '):
        value = int(persid.split()[2])
        return FancyInteger(value)
    else:
        raise pickle.UnpicklingError, 'Invalid persistent id'

up.persistent_load = persistent_load

j = up.load()
print j
\end{verbatim}
 | |
| 
 | |
| In the \module{cPickle} module, the unpickler's
 | |
| \member{persistent_load} attribute can also be set to a Python
 | |
| list, in which case, when the unpickler reaches a persistent id, the
 | |
| persistent id string will simply be appended to this list.  This
 | |
| functionality exists so that a pickle data stream can be ``sniffed''
 | |
| for object references without actually instantiating all the objects
 | |
| in a pickle\footnote{We'll leave you with the image of Guido and Jim
 | |
| sitting around sniffing pickles in their living rooms.}.  Setting
 | |
| \member{persistent_load} to a list is usually used in conjunction with
 | |
| the \method{noload()} method on the Unpickler.
 | |
| 
 | |
| % BAW: Both pickle and cPickle support something called
 | |
| % inst_persistent_id() which appears to give unknown types a second
 | |
| % shot at producing a persistent id.  Since Jim Fulton can't remember
 | |
| % why it was added or what it's for, I'm leaving it undocumented.
 | |
| 
 | |
| \subsection{Security \label{pickle-sec}}
 | |
| 
 | |
| Most of the security issues surrounding the \module{pickle} and
 | |
| \module{cPickle} modules involve unpickling.  There are no known
 | |
| security vulnerabilities
 | |
| related to pickling because you (the programmer) control the objects
 | |
| that \module{pickle} will interact with, and all it produces is a
 | |
| string.
 | |
| 
 | |
| However, for unpickling, it is \strong{never} a good idea to unpickle
 | |
| an untrusted string whose origins are dubious, for example, strings
 | |
| read from a socket.  This is because unpickling can create unexpected
 | |
| objects and even potentially run methods of those objects, such as
 | |
| their class constructor or destructor\footnote{A special note of
 | |
| caution is worth raising about the \refmodule{Cookie}
 | |
| module.  By default, the \class{Cookie.Cookie} class is an alias for
 | |
| the \class{Cookie.SmartCookie} class, which ``helpfully'' attempts to
 | |
| unpickle any cookie data string it is passed.  This is a huge security
 | |
| hole because cookie data typically comes from an untrusted source.
 | |
| You should either explicitly use the \class{Cookie.SimpleCookie} class
 | |
| --- which doesn't attempt to unpickle its string --- or you should
 | |
| implement the defensive programming steps described later on in this
 | |
| section.}.
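
To make the risk concrete, here is a minimal sketch of a hand-written
pickle that, when loaded, imports a module and calls one of its
classes.  It assumes a Python release that ships the
\module{fractions} module, which is used here only as a harmless
stand-in; a hostile data stream could name a far less benign callable
in exactly the same way.

```python
import pickle

# A hand-written protocol 0 pickle.  Nothing here was produced by pickling
# a real object; the bytes simply tell the unpickler what to import and call.
payload = (
    b"cfractions\nFraction\n"   # GLOBAL: import fractions, fetch Fraction
    b"(I1\nI2\nt"               # MARK, INT 1, INT 2, TUPLE -> (1, 2)
    b"R."                       # REDUCE: call Fraction(1, 2), then STOP
)

# Loading the data runs the constructor chosen by the data, not by us.
obj = pickle.loads(payload)
print("%s %s" % (type(obj).__name__, obj))
```

The application never mentions \module{fractions}; the module name,
the class name and the constructor arguments all come from the data
stream.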
 | |
| 
 | |
| You can defend against this by customizing your unpickler so that you
 | |
| can control exactly what gets unpickled and what gets called.
 | |
| Unfortunately, exactly how you do this is different depending on
 | |
| whether you're using \module{pickle} or \module{cPickle}.
 | |
| 
 | |
| One common feature that both modules implement is the
 | |
| \member{__safe_for_unpickling__} attribute.  Before calling a callable
 | |
| which is not a class, the unpickler will check to make sure that the
 | |
| callable has either been registered as a safe callable via the
 | |
| \refmodule[copyreg]{copy_reg} module, or that it has an
 | |
| attribute \member{__safe_for_unpickling__} with a true value.  This
 | |
| prevents the unpickling environment from being tricked into doing
 | |
| evil things like calling \code{os.unlink()} with an arbitrary file name.
 | |
| See section~\ref{pickle-protocol} for more details.
 | |
| 
 | |
| For safely unpickling class instances, you need to control exactly
 | |
| which classes will get created.  Be aware that a class's constructor
 | |
| could be called (if the pickler found a \method{__getinitargs__()}
 | |
| method) and the class's destructor (i.e., its \method{__del__()} method)
 | |
| might get called when the object is garbage collected.  Depending on
 | |
| the class, it isn't very hard to trick either method into doing bad
 | |
| things, such as removing a file.  The way to
 | |
| control the classes that are safe to instantiate differs in
 | |
| \module{pickle} and \module{cPickle}\footnote{A word of caution: the
 | |
| mechanisms described here use internal attributes and methods, which
 | |
| are subject to change in future versions of Python.  We intend to
 | |
| someday provide a common interface for controlling this behavior,
 | |
| which will work in either \module{pickle} or \module{cPickle}.}.
 | |
| 
 | |
| In the \module{pickle} module, you need to derive a subclass from
 | |
| \class{Unpickler}, overriding the \method{load_global()}
 | |
| method.  \method{load_global()} should read two lines from the pickle
 | |
| data stream where the first line will be the name of the module
 | |
| containing the class and the second line will be the name of the
 | |
| instance's class.  It then looks up the class, possibly importing the
 | |
| module and digging out the attribute, then it appends what it finds to
 | |
| the unpickler's stack.  Later on, this class will be assigned to the
 | |
| \member{__class__} attribute of an instance of an empty class, as a way of magically
 | |
| creating an instance without calling its class's \method{__init__()}.
 | |
| Your job (should you choose to accept it) would be to have
 | |
| \method{load_global()} push onto the unpickler's stack a known safe
 | |
| version of any class you deem safe to unpickle.  It is up to you to
 | |
| produce such a class.  Or you could raise an error if you want to
 | |
| disallow all unpickling of instances.  If this sounds like a hack,
 | |
| you're right.  UTSL.
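
The same idea can be sketched in a more compact form.  The example
below does not override \method{load_global()} itself; it assumes an
\class{Unpickler} whose class lookups all go through a
\method{find_class(module, name)} method, as is the case in later
Python releases, and overrides that method instead.  The allow-list
\code{SAFE_CLASSES} and the helper \function{restricted_loads()} are
hypothetical names chosen for illustration, and the standard
\module{datetime} and \module{fractions} classes merely stand in for
an application's own ``safe'' and ``unsafe'' classes.

```python
import datetime
import fractions
import io
import pickle

# Hypothetical allow-list of (module name, class name) pairs that this
# application is willing to have an unpickler look up.
SAFE_CLASSES = set([("datetime", "date")])


class RestrictedUnpickler(pickle.Unpickler):
    """Unpickler that refuses to look up anything outside SAFE_CLASSES."""

    def find_class(self, module, name):
        if (module, name) in SAFE_CLASSES:
            # Defer to the normal import-and-getattr lookup.
            return pickle.Unpickler.find_class(self, module, name)
        raise pickle.UnpicklingError(
            "refusing to unpickle %s.%s" % (module, name))


def restricted_loads(data):
    """Like pickle.loads(), but restricted to the allow-list above."""
    return RestrictedUnpickler(io.BytesIO(data)).load()


# An allowed class round-trips normally.
day = datetime.date(2001, 1, 1)
assert restricted_loads(pickle.dumps(day)) == day

# Anything else is rejected before it can be instantiated.
try:
    restricted_loads(pickle.dumps(fractions.Fraction(1, 2)))
    rejected = False
except pickle.UnpicklingError:
    rejected = True
print(rejected)
```

Raising an error for every name that is not on the list means the
unpickler fails closed: a class nobody thought about is refused rather
than silently instantiated.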
 | |
| 
 | |
| Things are a little cleaner with \module{cPickle}, but not by much.
 | |
| To control what gets unpickled, you can set the unpickler's
 | |
| \member{find_global} attribute to a function or \code{None}.  If it is
 | |
| \code{None} then any attempts to unpickle instances will raise an
 | |
| \exception{UnpicklingError}.  If it is a function,
 | |
| then it should accept a module name and a class name, and return the
 | |
| corresponding class object.  It is responsible for looking up the
 | |
| class, again performing any necessary imports, and it may raise an
 | |
| error to prevent instances of the class from being unpickled.
 | |
| 
 | |
| The moral of the story is that you should be really careful about the
 | |
| source of the strings your application unpickles.
 | |
| 
 | |
| \subsection{Example \label{pickle-example}}
 | |
| 
 | |
| Here's a simple example of how to modify pickling behavior for a
 | |
| class.  The \class{TextReader} class opens a text file, and returns
 | |
| the line number and line contents each time its \method{readline()}
 | |
| method is called. If a \class{TextReader} instance is pickled, all
 | |
| attributes \emph{except} the file object member are saved. When the
 | |
| instance is unpickled, the file is reopened, and reading resumes from
 | |
| the last location. The \method{__setstate__()} and
 | |
| \method{__getstate__()} methods are used to implement this behavior.
 | |
| 
 | |
| \begin{verbatim}
 | |
| class TextReader:
 | |
|     """Print and number lines in a text file."""
 | |
|     def __init__(self, file):
 | |
|         self.file = file
 | |
|         self.fh = open(file)
 | |
|         self.lineno = 0
 | |
| 
 | |
|     def readline(self):
 | |
|         self.lineno = self.lineno + 1
 | |
|         line = self.fh.readline()
 | |
|         if not line:
 | |
|             return None
 | |
|         if line.endswith("\n"):
 | |
|             line = line[:-1]
 | |
|         return "%d: %s" % (self.lineno, line)
 | |
| 
 | |
|     def __getstate__(self):
 | |
|         odict = self.__dict__.copy() # copy the dict since we change it
 | |
|         del odict['fh']              # remove filehandle entry
 | |
|         return odict
 | |
| 
 | |
|     def __setstate__(self,dict):
 | |
|         fh = open(dict['file'])      # reopen file
 | |
|         count = dict['lineno']       # read from file...
 | |
|         while count:                 # until line count is restored
 | |
|             fh.readline()
 | |
|             count = count - 1
 | |
|         self.__dict__.update(dict)   # update attributes
 | |
|         self.fh = fh                 # save the file object
 | |
| \end{verbatim}
 | |
| 
 | |
| A sample usage might be something like this:
 | |
| 
 | |
| \begin{verbatim}
 | |
| >>> import TextReader
 | |
| >>> obj = TextReader.TextReader("TextReader.py")
 | |
| >>> obj.readline()
 | |
| '1: #!/usr/local/bin/python'
 | |
| >>> # (more invocations of obj.readline() here)
 | |
| ... obj.readline()
 | |
| '7: class TextReader:'
 | |
| >>> import pickle
 | |
| >>> pickle.dump(obj,open('save.p','w'))
 | |
| \end{verbatim}
 | |
| 
 | |
| If you want to see that \refmodule{pickle} works across Python
 | |
| processes, start another Python session before continuing.  What
 | |
| follows can happen from either the same process or a new process.
 | |
| 
 | |
| \begin{verbatim}
 | |
| >>> import pickle
 | |
| >>> reader = pickle.load(open('save.p'))
 | |
| >>> reader.readline()
 | |
| '8:     """Print and number lines in a text file."""'
 | |
| \end{verbatim}
 | |
| 
 | |
| 
 | |
| \begin{seealso}
 | |
|   \seemodule[copyreg]{copy_reg}{Pickle interface constructor
 | |
|                                 registration for extension types.}
 | |
| 
 | |
|   \seemodule{shelve}{Indexed databases of objects; uses \module{pickle}.}
 | |
| 
 | |
|   \seemodule{copy}{Shallow and deep object copying.}
 | |
| 
 | |
|   \seemodule{marshal}{High-performance serialization of built-in types.}
 | |
| \end{seealso}
 | |
| 
 | |
| 
 | |
| \section{\module{cPickle} --- A faster \module{pickle}}
 | |
| 
 | |
| \declaremodule{builtin}{cPickle}
 | |
| \modulesynopsis{Faster version of \refmodule{pickle}, but not subclassable.}
 | |
| \moduleauthor{Jim Fulton}{jfulton@digicool.com}
 | |
| \sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
 | |
| 
 | |
| The \module{cPickle} module supports serialization and
 | |
| de-serialization of Python objects, providing an interface and
 | |
| functionality nearly identical to the
 | |
| \refmodule{pickle}\refstmodindex{pickle} module.  There are several
 | |
| differences, the most important being performance and subclassability.
 | |
| 
 | |
| First, \module{cPickle} can be up to 1000 times faster than
 | |
| \module{pickle} because the former is implemented in C.  Second, in
 | |
| the \module{cPickle} module the callables \function{Pickler()} and
 | |
| \function{Unpickler()} are functions, not classes.  This means that
 | |
| you cannot use them to derive custom pickling and unpickling
 | |
| subclasses.  Most applications have no need for this functionality and
 | |
| should benefit from the greatly improved performance of the
 | |
| \module{cPickle} module.
 | |
| 
 | |
| The pickle data format used by \module{pickle} and
 | |
| \module{cPickle} is the same, so it is possible to use
 | |
| \module{pickle} and \module{cPickle} interchangeably with existing
 | |
| pickles\footnote{Since the pickle data format is actually a tiny
 | |
| stack-oriented programming language, and some freedom is taken in the
 | |
| encodings of certain objects, it is possible that the two modules
 | |
| produce different data streams for the same input objects.  However it
 | |
| is guaranteed that they will always be able to read each other's
 | |
| data streams.}.
 | |
| 
 | |
| There are additional minor differences in API between \module{cPickle}
 | |
| and \module{pickle}; however, for most applications they are
 | |
| interchangeable.  More documentation is provided in the
 | |
| \module{pickle} module documentation, which
 | |
| includes a list of the documented differences.
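
Because the two modules share most of their API and can read each
other's data, a common idiom is to prefer \module{cPickle} when it can
be imported and to fall back to \module{pickle} otherwise.  The sketch
below assumes the application needs neither subclassing nor any of the
API differences mentioned above; the sample \code{record} is made up
for illustration.

```python
# Prefer the C implementation when present; otherwise use pure Python.
try:
    import cPickle as pickle
except ImportError:
    import pickle

record = {"name": "spam", "counts": [1, 2, 3], "ratio": 0.5}

# Only the shared dumps()/loads() functions are used from here on.
data = pickle.dumps(record)
restored = pickle.loads(data)
print(restored == record)
```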
 | |
| 
 | |
| 
 | 
