| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \section{\module{pickle} --- Python object serialization} | 
					
						
							| 
									
										
										
										
											1998-07-23 17:59:49 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											1999-04-22 21:23:22 +00:00
										 |  |  | \declaremodule{standard}{pickle} | 
					
						
							| 
									
										
										
										
											1998-07-23 17:59:49 +00:00
										 |  |  | \modulesynopsis{Convert Python objects to streams of bytes and back.} | 
					
						
							| 
									
										
										
										
											2000-04-03 20:13:55 +00:00
										 |  |  | % Substantial improvements by Jim Kerr <jbkerr@sr.hp.com>.
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | % Rewritten by Barry Warsaw <barry@zope.com>
 | 
					
						
							| 
									
										
										
										
											1998-07-23 17:59:49 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2000-07-16 19:01:10 +00:00
										 |  |  | \index{persistence} | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | \indexii{persistent}{objects} | 
					
						
							|  |  |  | \indexii{serializing}{objects} | 
					
						
							|  |  |  | \indexii{marshalling}{objects} | 
					
						
							|  |  |  | \indexii{flattening}{objects} | 
					
						
							|  |  |  | \indexii{pickling}{objects} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | The \module{pickle} module implements a fundamental, but powerful | 
					
						
							|  |  |  | algorithm for serializing and de-serializing a Python object | 
					
						
							|  |  |  | structure.  ``Pickling'' is the process whereby a Python object | 
					
						
							|  |  |  | hierarchy is converted into a byte stream, and ``unpickling'' is the | 
					
						
							|  |  |  | inverse operation, whereby a byte stream is converted back into an | 
					
						
							|  |  |  | object hierarchy.  Pickling (and unpickling) is alternatively known as | 
					
						
							| 
									
										
										
										
											2001-11-26 21:30:36 +00:00
										 |  |  | ``serialization'', ``marshalling,''\footnote{Don't confuse this with | 
					
						
							|  |  |  | the \refmodule{marshal} module} or ``flattening'', | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | however the preferred term used here is ``pickling'' and | 
					
						
							|  |  |  | ``unpickling'' to avoid confusing. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | This documentation describes both the \module{pickle} module and the  | 
					
						
							| 
									
										
										
										
											2001-11-26 21:30:36 +00:00
										 |  |  | \refmodule{cPickle} module. | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | \subsection{Relationship to other Python modules} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The \module{pickle} module has an optimized cousin called the | 
					
						
							|  |  |  | \module{cPickle} module.  As its name implies, \module{cPickle} is | 
					
						
							|  |  |  | written in C, so it can be up to 1000 times faster than | 
					
						
							|  |  |  | \module{pickle}.  However it does not support subclassing of the | 
					
						
							|  |  |  | \function{Pickler()} and \function{Unpickler()} classes, because in | 
					
						
							|  |  |  | \module{cPickle} these are functions, not classes.  Most applications | 
					
						
							|  |  |  | have no need for this functionality, and can benefit from the improved | 
					
						
							|  |  |  | performance of \module{cPickle}.  Other than that, the interfaces of | 
					
						
							|  |  |  | the two modules are nearly identical; the common interface is | 
					
						
							|  |  |  | described in this manual and differences are pointed out where | 
					
						
							|  |  |  | necessary.  In the following discussions, we use the term ``pickle'' | 
					
						
							|  |  |  | to collectively describe the \module{pickle} and | 
					
						
							|  |  |  | \module{cPickle} modules. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The data streams the two modules produce are guaranteed to be | 
					
						
							|  |  |  | interchangeable. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Python has a more primitive serialization module called | 
					
						
							| 
									
										
										
										
											2001-11-26 21:30:36 +00:00
										 |  |  | \refmodule{marshal}, but in general | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \module{pickle} should always be the preferred way to serialize Python | 
					
						
							|  |  |  | objects.  \module{marshal} exists primarily to support Python's | 
					
						
							|  |  |  | \file{.pyc} files. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The \module{pickle} module differs from \refmodule{marshal} several | 
					
						
							|  |  |  | significant ways: | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | \begin{itemize} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \item The \module{pickle} module keeps track of the objects it has | 
					
						
							|  |  |  |       already serialized, so that later references to the same object | 
					
						
							|  |  |  |       won't be serialized again.  \module{marshal} doesn't do this. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |       This has implications both for recursive objects and object | 
					
						
							|  |  |  |       sharing.  Recursive objects are objects that contain references | 
					
						
							|  |  |  |       to themselves.  These are not handled by marshal, and in fact, | 
					
						
							|  |  |  |       attempting to marshal recursive objects will crash your Python | 
					
						
							|  |  |  |       interpreter.  Object sharing happens when there are multiple | 
					
						
							|  |  |  |       references to the same object in different places in the object | 
					
						
							|  |  |  |       hierarchy being serialized.  \module{pickle} stores such objects | 
					
						
							|  |  |  |       only once, and ensures that all other references point to the | 
					
						
							|  |  |  |       master copy.  Shared objects remain shared, which can be very | 
					
						
							|  |  |  |       important for mutable objects. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \item \module{marshal} cannot be used to serialize user-defined | 
					
						
							|  |  |  |       classes and their instances.  \module{pickle} can save and | 
					
						
							|  |  |  |       restore class instances transparently, however the class | 
					
						
							|  |  |  |       definition must be importable and live in the same module as | 
					
						
							|  |  |  |       when the object was stored. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \item The \module{marshal} serialization format is not guaranteed to | 
					
						
							|  |  |  |       be portable across Python versions.  Because its primary job in | 
					
						
							|  |  |  |       life is to support \file{.pyc} files, the Python implementers | 
					
						
							|  |  |  |       reserve the right to change the serialization format in | 
					
						
							|  |  |  |       non-backwards compatible ways should the need arise.  The | 
					
						
							|  |  |  |       \module{pickle} serialization format is guaranteed to be | 
					
						
							|  |  |  |       backwards compatible across Python releases. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \item The \module{pickle} module doesn't handle code objects, which | 
					
						
							|  |  |  |       the \module{marshal} module does.  This avoids the possibility | 
					
						
							|  |  |  |       of smuggling Trojan horses into a program through the | 
					
						
							|  |  |  |       \module{pickle} module\footnote{This doesn't necessarily imply | 
					
						
							|  |  |  |       that \module{pickle} is inherently secure.  See | 
					
						
							|  |  |  |       section~\ref{pickle-sec} for a more detailed discussion on | 
					
						
							|  |  |  |       \module{pickle} module security.  Besides, it's possible that | 
					
						
							|  |  |  |       \module{pickle} will eventually support serializing code | 
					
						
							|  |  |  |       objects.}. | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \end{itemize} | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | Note that serialization is a more primitive notion than persistence; | 
					
						
							|  |  |  | although | 
					
						
							|  |  |  | \module{pickle} reads and writes file objects, it does not handle the | 
					
						
							|  |  |  | issue of naming persistent objects, nor the (even more complicated) | 
					
						
							|  |  |  | issue of concurrent access to persistent objects.  The \module{pickle} | 
					
						
							|  |  |  | module can transform a complex object into a byte stream and it can | 
					
						
							|  |  |  | transform the byte stream into an object with the same internal | 
					
						
							|  |  |  | structure.  Perhaps the most obvious thing to do with these byte | 
					
						
							|  |  |  | streams is to write them onto a file, but it is also conceivable to | 
					
						
							|  |  |  | send them across a network or store them in a database.  The module | 
					
						
							|  |  |  | \refmodule{shelve} provides a simple interface | 
					
						
							|  |  |  | to pickle and unpickle objects on DBM-style database files. | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \subsection{Data stream format} | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											1998-04-04 06:20:28 +00:00
										 |  |  | The data format used by \module{pickle} is Python-specific.  This has | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | the advantage that there are no restrictions imposed by external | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | standards such as XDR\index{XDR}\index{External Data Representation} | 
					
						
							|  |  |  | (which can't represent pointer sharing); however it means that | 
					
						
							|  |  |  | non-Python programs may not be able to reconstruct pickled Python | 
					
						
							|  |  |  | objects. | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											1998-04-04 06:20:28 +00:00
										 |  |  | By default, the \module{pickle} data format uses a printable \ASCII{} | 
					
						
							| 
									
										
										
										
											1997-12-09 20:45:08 +00:00
										 |  |  | representation.  This is slightly more voluminous than a binary | 
					
						
							|  |  |  | representation.  The big advantage of using printable \ASCII{} (and of | 
					
						
							| 
									
										
										
										
											1998-04-04 06:20:28 +00:00
										 |  |  | some other characteristics of \module{pickle}'s representation) is that | 
					
						
							| 
									
										
										
										
											1997-12-09 20:45:08 +00:00
										 |  |  | for debugging or recovery purposes it is possible for a human to read | 
					
						
							|  |  |  | the pickled file with a standard text editor. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | A binary format, which is slightly more efficient, can be chosen by | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | specifying a true value for the \var{bin} argument to the | 
					
						
							| 
									
										
										
										
											1998-04-04 06:20:28 +00:00
										 |  |  | \class{Pickler} constructor or the \function{dump()} and \function{dumps()} | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | functions. | 
					
						
							| 
									
										
										
										
											1999-07-02 14:25:37 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \subsection{Usage} | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | To serialize an object hierarchy, you first create a pickler, then you | 
					
						
							|  |  |  | call the pickler's \method{dump()} method.  To de-serialize a data | 
					
						
							|  |  |  | stream, you first create an unpickler, then you call the unpickler's | 
					
						
							|  |  |  | \method{load()} method.  The \module{pickle} module provides the | 
					
						
							|  |  |  | following functions to make this process more convenient: | 
					
						
							| 
									
										
										
										
											1995-03-17 16:07:09 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \begin{funcdesc}{dump}{object, file\optional{, bin}} | 
					
						
							|  |  |  | Write a pickled representation of \var{object} to the open file object | 
					
						
							|  |  |  | \var{file}.  This is equivalent to | 
					
						
							|  |  |  | \code{Pickler(\var{file}, \var{bin}).dump(\var{object})}. | 
					
						
							|  |  |  | If the optional \var{bin} argument is true, the binary pickle format | 
					
						
							|  |  |  | is used; otherwise the (less efficient) text pickle format is used | 
					
						
							|  |  |  | (for backwards compatibility, this is the default). | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \var{file} must have a \method{write()} method that accepts a single | 
					
						
							|  |  |  | string argument.  It can thus be a file object opened for writing, a | 
					
						
							|  |  |  | \refmodule{StringIO} object, or any other custom | 
					
						
							|  |  |  | object that meets this interface. | 
					
						
							|  |  |  | \end{funcdesc} | 
					
						
							| 
									
										
										
										
											1995-03-17 16:07:09 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \begin{funcdesc}{load}{file} | 
					
						
							|  |  |  | Read a string from the open file object \var{file} and interpret it as | 
					
						
							|  |  |  | a pickle data stream, reconstructing and returning the original object | 
					
						
							|  |  |  | hierarchy.  This is equivalent to \code{Unpickler(\var{file}).load()}. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \var{file} must have two methods, a \method{read()} method that takes | 
					
						
							|  |  |  | an integer argument, and a \method{readline()} method that requires no | 
					
						
							|  |  |  | arguments.  Both methods should return a string.  Thus \var{file} can | 
					
						
							|  |  |  | be a file object opened for reading, a | 
					
						
							|  |  |  | \module{StringIO} object, or any other custom | 
					
						
							|  |  |  | object that meets this interface. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | This function automatically determines whether the data stream was | 
					
						
							|  |  |  | written in binary mode or not. | 
					
						
							|  |  |  | \end{funcdesc} | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \begin{funcdesc}{dumps}{object\optional{, bin}} | 
					
						
							|  |  |  | Return the pickled representation of the object as a string, instead | 
					
						
							|  |  |  | of writing it to a file.  If the optional \var{bin} argument is | 
					
						
							|  |  |  | true, the binary pickle format is used; otherwise the (less efficient) | 
					
						
							|  |  |  | text pickle format is used (this is the default). | 
					
						
							|  |  |  | \end{funcdesc} | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \begin{funcdesc}{loads}{string} | 
					
						
							|  |  |  | Read a pickled object hierarchy from a string.  Characters in the | 
					
						
							|  |  |  | string past the pickled object's representation are ignored. | 
					
						
							|  |  |  | \end{funcdesc} | 
					
						
							| 
									
										
										
										
											1998-04-04 06:20:28 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | The \module{pickle} module also defines three exceptions: | 
					
						
							| 
									
										
										
										
											1995-03-17 16:07:09 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \begin{excdesc}{PickleError} | 
					
						
							|  |  |  | A common base class for the other exceptions defined below.  This | 
					
						
							|  |  |  | inherits from \exception{Exception}. | 
					
						
							|  |  |  | \end{excdesc} | 
					
						
							| 
									
										
										
										
											1998-04-04 06:20:28 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \begin{excdesc}{PicklingError} | 
					
						
							|  |  |  | This exception is raised when an unpicklable object is passed to | 
					
						
							|  |  |  | the \method{dump()} method. | 
					
						
							|  |  |  | \end{excdesc} | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \begin{excdesc}{UnpicklingError} | 
					
						
							|  |  |  | This exception is raised when there is a problem unpickling an object, | 
					
						
							|  |  |  | such as a security violation.  Note that other exceptions may also be | 
					
						
							|  |  |  | raised during unpickling, including (but not necessarily limited to) | 
					
						
							| 
									
										
										
										
											2002-03-22 22:16:03 +00:00
										 |  |  | \exception{AttributeError}, \exception{EOFError}, | 
					
						
							|  |  |  | \exception{ImportError}, and \exception{IndexError}. | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \end{excdesc} | 
					
						
							| 
									
										
										
										
											1998-04-04 06:20:28 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | The \module{pickle} module also exports two callables\footnote{In the | 
					
						
							|  |  |  | \module{pickle} module these callables are classes, which you could | 
					
						
							|  |  |  | subclass to customize the behavior.  However, in the \module{cPickle} | 
					
						
							|  |  |  | modules these callables are factory functions and so cannot be | 
					
						
							|  |  |  | subclassed.  One of the common reasons to subclass is to control what | 
					
						
							|  |  |  | objects can actually be unpickled.  See section~\ref{pickle-sec} for | 
					
						
							|  |  |  | more details on security concerns.}, \class{Pickler} and | 
					
						
							|  |  |  | \class{Unpickler}: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{classdesc}{Pickler}{file\optional{, bin}} | 
					
						
							|  |  |  | This takes a file-like object to which it will write a pickle data | 
					
						
							|  |  |  | stream.  Optional \var{bin} if true, tells the pickler to use the more | 
					
						
							|  |  |  | efficient binary pickle format, otherwise the \ASCII{} format is used | 
					
						
							|  |  |  | (this is the default). | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \var{file} must have a \method{write()} method that accepts a single | 
					
						
							|  |  |  | string argument.  It can thus be an open file object, a | 
					
						
							|  |  |  | \module{StringIO} object, or any other custom | 
					
						
							|  |  |  | object that meets this interface. | 
					
						
							|  |  |  | \end{classdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \class{Pickler} objects define one (or two) public methods: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{methoddesc}[Pickler]{dump}{object} | 
					
						
							|  |  |  | Write a pickled representation of \var{object} to the open file object | 
					
						
							|  |  |  | given in the constructor.  Either the binary or \ASCII{} format will | 
					
						
							|  |  |  | be used, depending on the value of the \var{bin} flag passed to the | 
					
						
							|  |  |  | constructor. | 
					
						
							|  |  |  | \end{methoddesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{methoddesc}[Pickler]{clear_memo}{} | 
					
						
							|  |  |  | Clears the pickler's ``memo''.  The memo is the data structure that | 
					
						
							|  |  |  | remembers which objects the pickler has already seen, so that shared | 
					
						
							|  |  |  | or recursive objects pickled by reference and not by value.  This | 
					
						
							|  |  |  | method is useful when re-using picklers. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \strong{Note:} \method{clear_memo()} is only available on the picklers | 
					
						
							|  |  |  | created by \module{cPickle}.  In the \module{pickle} module, picklers | 
					
						
							|  |  |  | have an instance variable called \member{memo} which is a Python | 
					
						
							|  |  |  | dictionary.  So to clear the memo for a \module{pickle} module | 
					
						
							|  |  |  | pickler, you could do the following: | 
					
						
							| 
									
										
										
										
											1995-03-17 16:07:09 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											1998-02-13 06:58:54 +00:00
										 |  |  | \begin{verbatim} | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | mypickler.memo.clear() | 
					
						
							| 
									
										
										
										
											1998-02-13 06:58:54 +00:00
										 |  |  | \end{verbatim} | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \end{methoddesc} | 
					
						
							| 
									
										
										
										
											1998-04-04 06:20:28 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | It is possible to make multiple calls to the \method{dump()} method of | 
					
						
							|  |  |  | the same \class{Pickler} instance.  These must then be matched to the | 
					
						
							|  |  |  | same number of calls to the \method{load()} method of the | 
					
						
							|  |  |  | corresponding \class{Unpickler} instance.  If the same object is | 
					
						
							|  |  |  | pickled by multiple \method{dump()} calls, the \method{load()} will | 
					
						
							|  |  |  | all yield references to the same object\footnote{\emph{Warning}: this | 
					
						
							|  |  |  | is intended for pickling multiple objects without intervening | 
					
						
							|  |  |  | modifications to the objects or their parts.  If you modify an object | 
					
						
							|  |  |  | and then pickle it again using the same \class{Pickler} instance, the | 
					
						
							|  |  |  | object is not pickled again --- a reference to it is pickled and the | 
					
						
							|  |  |  | \class{Unpickler} will return the old value, not the modified one. | 
					
						
							|  |  |  | There are two problems here: (1) detecting changes, and (2) | 
					
						
							|  |  |  | marshalling a minimal set of changes.  Garbage Collection may also | 
					
						
							|  |  |  | become a problem here.}. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \class{Unpickler} objects are defined as: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{classdesc}{Unpickler}{file} | 
					
						
							|  |  |  | This takes a file-like object from which it will read a pickle data | 
					
						
							|  |  |  | stream.  This class automatically determines whether the data stream | 
					
						
							|  |  |  | was written in binary mode or not, so it does not need a flag as in | 
					
						
							|  |  |  | the \class{Pickler} factory. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \var{file} must have two methods, a \method{read()} method that takes | 
					
						
							|  |  |  | an integer argument, and a \method{readline()} method that requires no | 
					
						
							|  |  |  | arguments.  Both methods should return a string.  Thus \var{file} can | 
					
						
							|  |  |  | be a file object opened for reading, a | 
					
						
							|  |  |  | \module{StringIO} object, or any other custom | 
					
						
							|  |  |  | object that meets this interface. | 
					
						
							|  |  |  | \end{classdesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \class{Unpickler} objects have one (or two) public methods: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{methoddesc}[Unpickler]{load}{} | 
					
						
							|  |  |  | Read a pickled object representation from the open file object given | 
					
						
							|  |  |  | in the constructor, and return the reconstituted object hierarchy | 
					
						
							|  |  |  | specified therein. | 
					
						
							|  |  |  | \end{methoddesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{methoddesc}[Unpickler]{noload}{} | 
					
						
							|  |  |  | This is just like \method{load()} except that it doesn't actually | 
					
						
							|  |  |  | create any objects.  This is useful primarily for finding what's | 
					
						
							|  |  |  | called ``persistent ids'' that may be referenced in a pickle data | 
					
						
							|  |  |  | stream.  See section~\ref{pickle-protocol} below for more details. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \strong{Note:} the \method{noload()} method is currently only | 
					
						
							|  |  |  | available on \class{Unpickler} objects created with the | 
					
						
							|  |  |  | \module{cPickle} module.  \module{pickle} module \class{Unpickler}s do | 
					
						
							|  |  |  | not have the \method{noload()} method. | 
					
						
							|  |  |  | \end{methoddesc} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \subsection{What can be pickled and unpickled?} | 
					
						
							| 
									
										
										
										
											1997-12-09 20:45:08 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | The following types can be pickled: | 
					
						
							| 
									
										
										
										
											1999-07-02 14:25:37 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | \begin{itemize} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \item \code{None} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \item integers, long integers, floating point numbers, complex numbers | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2000-04-06 15:04:30 +00:00
										 |  |  | \item normal and Unicode strings | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \item tuples, lists, and dictionaries containing only picklable objects | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \item functions defined at the top level of a module | 
					
						
							| 
									
										
										
										
											2000-04-03 20:13:55 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \item built-in functions defined at the top level of a module | 
					
						
							| 
									
										
										
										
											2000-04-03 20:13:55 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \item classes that are defined at the top level of a module | 
					
						
							| 
									
										
										
										
											1995-03-17 16:07:09 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											1998-04-04 06:20:28 +00:00
										 |  |  | \item instances of such classes whose \member{__dict__} or | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \method{__setstate__()} is picklable  (see | 
					
						
							|  |  |  | section~\ref{pickle-protocol} for details) | 
					
						
							| 
									
										
										
										
											1995-02-15 15:53:08 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  | \end{itemize} | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											1995-03-17 16:07:09 +00:00
										 |  |  | Attempts to pickle unpicklable objects will raise the | 
					
						
							| 
									
										
										
										
											1998-04-04 06:20:28 +00:00
										 |  |  | \exception{PicklingError} exception; when this happens, an unspecified | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | number of bytes may have already been written to the underlying file. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Note that functions (built-in and user-defined) are pickled by ``fully | 
					
						
							|  |  |  | qualified'' name reference, not by value.  This means that only the | 
					
						
							|  |  |  | function name is pickled, along with the name of module the function | 
					
						
							|  |  |  | is defined in.  Neither the function's code, nor any of its function | 
					
						
							|  |  |  | attributes are pickled.  Thus the defining module must be importable | 
					
						
							|  |  |  | in the unpickling environment, and the module must contain the named | 
					
						
							|  |  |  | object, otherwise an exception will be raised\footnote{The exception | 
					
						
							|  |  |  | raised will likely be an \exception{ImportError} or an | 
					
						
							|  |  |  | \exception{AttributeError} but it could be something else.}. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Similarly, classes are pickled by named reference, so the same | 
					
						
							|  |  |  | restrictions in the unpickling environment apply.  Note that none of | 
					
						
							|  |  |  | the class's code or data is pickled, so in the following example the | 
					
						
							|  |  |  | class attribute \code{attr} is not restored in the unpickling | 
					
						
							|  |  |  | environment: | 
					
						
							| 
									
										
										
										
											1995-03-17 16:07:09 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \begin{verbatim} | 
					
						
							|  |  |  | class Foo: | 
					
						
							|  |  |  |     attr = 'a class attr' | 
					
						
							| 
									
										
										
										
											1995-03-17 16:07:09 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | picklestring = pickle.dumps(Foo) | 
					
						
							|  |  |  | \end{verbatim} | 
					
						
							| 
									
										
										
										
											1995-03-17 16:07:09 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | These restrictions are why picklable functions and classes must be | 
					
						
							|  |  |  | defined in the top level of a module. | 
					
						
							| 
									
										
										
										
											1995-03-17 16:07:09 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | Similarly, when class instances are pickled, their class's code and | 
					
						
							|  |  |  | data are not pickled along with them.  Only the instance data are | 
					
						
							|  |  |  | pickled.  This is done on purpose, so you can fix bugs in a class or | 
					
						
							|  |  |  | add methods to the class and still load objects that were created with | 
					
						
							|  |  |  | an earlier version of the class.  If you plan to have long-lived | 
					
						
							|  |  |  | objects that will see many versions of a class, it may be worthwhile | 
					
						
							|  |  |  | to put a version number in the objects so that suitable conversions | 
					
						
							|  |  |  | can be made by the class's \method{__setstate__()} method. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \subsection{The pickle protocol | 
					
						
							|  |  |  | \label{pickle-protocol}}\setindexsubitem{(pickle protocol)} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | This section describes the ``pickling protocol'' that defines the | 
					
						
							|  |  |  | interface between the pickler/unpickler and the objects that are being | 
					
						
							|  |  |  | serialized.  This protocol provides a standard way for you to define, | 
					
						
							|  |  |  | customize, and control how your objects are serialized and | 
					
						
							|  |  |  | de-serialized.  The description in this section doesn't cover specific | 
					
						
							|  |  |  | customizations that you can employ to make the unpickling environment | 
					
						
							|  |  |  | safer from untrusted pickle data streams; see section~\ref{pickle-sec} | 
					
						
							|  |  |  | for more details. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \subsubsection{Pickling and unpickling normal class | 
					
						
							|  |  |  |     instances\label{pickle-inst}} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | When a pickled class instance is unpickled, its \method{__init__()} | 
					
						
							|  |  |  | method is normally \emph{not} invoked.  If it is desirable that the | 
					
						
							|  |  |  | \method{__init__()} method be called on unpickling, a class can define | 
					
						
							|  |  |  | a method \method{__getinitargs__()}, which should return a | 
					
						
							|  |  |  | \emph{tuple} containing the arguments to be passed to the class | 
					
						
							|  |  |  | constructor (i.e. \method{__init__()}).  The | 
					
						
							|  |  |  | \method{__getinitargs__()} method is called at | 
					
						
							|  |  |  | pickle time; the tuple it returns is incorporated in the pickle for | 
					
						
							|  |  |  | the instance. | 
					
						
							|  |  |  | \withsubitem{(copy protocol)}{\ttindex{__getinitargs__()}} | 
					
						
							|  |  |  | \withsubitem{(instance constructor)}{\ttindex{__init__()}} | 
					
						
							| 
									
										
										
										
											1995-03-17 16:07:09 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \withsubitem{(copy protocol)}{ | 
					
						
							|  |  |  |   \ttindex{__getstate__()}\ttindex{__setstate__()}} | 
					
						
							|  |  |  | \withsubitem{(instance attribute)}{ | 
					
						
							|  |  |  |   \ttindex{__dict__}} | 
					
						
							| 
									
										
										
										
											1995-03-17 16:07:09 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | Classes can further influence how their instances are pickled; if the | 
					
						
							|  |  |  | class defines the method \method{__getstate__()}, it is called and the | 
					
						
							|  |  |  | return state is pickled as the contents for the instance, instead of | 
					
						
							|  |  |  | the contents of the instance's dictionary.  If there is no | 
					
						
							|  |  |  | \method{__getstate__()} method, the instance's \member{__dict__} is | 
					
						
							|  |  |  | pickled. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Upon unpickling, if the class also defines the method | 
					
						
							|  |  |  | \method{__setstate__()}, it is called with the unpickled | 
					
						
							|  |  |  | state\footnote{These methods can also be used to implement copying | 
					
						
							|  |  |  | class instances.}.  If there is no \method{__setstate__()} method, the | 
					
						
							|  |  |  | pickled object must be a dictionary and its items are assigned to the | 
					
						
							|  |  |  | new instance's dictionary.  If a class defines both | 
					
						
							|  |  |  | \method{__getstate__()} and \method{__setstate__()}, the state object | 
					
						
							|  |  |  | needn't be a dictionary and these methods can do what they | 
					
						
							|  |  |  | want\footnote{This protocol is also used by the shallow and deep | 
					
						
							|  |  |  | copying operations defined in the | 
					
						
							|  |  |  | \refmodule{copy} module.}. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \subsubsection{Pickling and unpickling extension types} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | When the \class{Pickler} encounters an object of a type it knows | 
					
						
							|  |  |  | nothing about --- such as an extension type --- it looks in two places | 
					
						
							|  |  |  | for a hint of how to pickle it.  One alternative is for the object to | 
					
						
							|  |  |  | implement a \method{__reduce__()} method.  If provided, at pickling | 
					
						
							|  |  |  | time \method{__reduce__()} will be called with no arguments, and it | 
					
						
							|  |  |  | must return either a string or a tuple. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | If a string is returned, it names a global variable whose contents are | 
					
						
							|  |  |  | pickled as normal.  When a tuple is returned, it must be of length two | 
					
						
							|  |  |  | or three, with the following semantics: | 
					
						
							| 
									
										
										
										
											1995-03-17 16:07:09 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \begin{itemize} | 
					
						
							| 
									
										
										
										
											1998-03-06 21:27:14 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \item A callable object, which in the unpickling environment must be | 
					
						
							|  |  |  |       either a class, a callable registered as a ``safe constructor'' | 
					
						
							|  |  |  |       (see below), or it must have an attribute | 
					
						
							|  |  |  |       \member{__safe_for_unpickling__} with a true value.  Otherwise, | 
					
						
							|  |  |  |       an \exception{UnpicklingError} will be raised in the unpickling | 
					
						
							|  |  |  |       environment.  Note that as usual, the callable itself is pickled | 
					
						
							|  |  |  |       by name. | 
					
						
							| 
									
										
										
										
											1998-03-06 21:27:14 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \item A tuple of arguments for the callable object, or \code{None}. | 
					
						
							| 
									
										
										
										
											1998-04-04 06:20:28 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \item Optionally, the object's state, which will be passed to | 
					
						
							|  |  |  |       the object's \method{__setstate__()} method as described in | 
					
						
							|  |  |  |       section~\ref{pickle-inst}.  If the object has no | 
					
						
							|  |  |  |       \method{__setstate__()} method, then, as above, the value must | 
					
						
							|  |  |  |       be a dictionary and it will be added to the object's | 
					
						
							|  |  |  |       \member{__dict__}. | 
					
						
							| 
									
										
										
										
											1998-04-11 20:43:51 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \end{itemize} | 
					
						
							| 
									
										
										
										
											1998-04-11 20:43:51 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | Upon unpickling, the callable will be called (provided that it meets | 
					
						
							|  |  |  | the above criteria), passing in the tuple of arguments; it should | 
					
						
							|  |  |  | return the unpickled object.  If the second item was \code{None}, then | 
					
						
							|  |  |  | instead of calling the callable directly, its \method{__basicnew__()} | 
					
						
							|  |  |  | method is called without arguments.  It should also return the | 
					
						
							|  |  |  | unpickled object. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | An alternative to implementing a \method{__reduce__()} method on the | 
					
						
							|  |  |  | object to be pickled, is to register the callable with the | 
					
						
							| 
									
										
										
										
											2001-11-26 21:30:36 +00:00
										 |  |  | \refmodule[copyreg]{copy_reg} module.  This module provides a way | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | for programs to register ``reduction functions'' and constructors for | 
					
						
							|  |  |  | user-defined types.   Reduction functions have the same semantics and | 
					
						
							|  |  |  | interface as the \method{__reduce__()} method described above, except | 
					
						
							|  |  |  | that they are called with a single argument, the object to be pickled. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The registered constructor is deemed a ``safe constructor'' for purposes | 
					
						
							|  |  |  | of unpickling as described above. | 
					
						
							| 
									
										
										
										
											1998-04-11 20:05:43 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \subsubsection{Pickling and unpickling external objects} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | For the benefit of object persistence, the \module{pickle} module | 
					
						
							|  |  |  | supports the notion of a reference to an object outside the pickled | 
					
						
							|  |  |  | data stream.  Such objects are referenced by a ``persistent id'', | 
					
						
							|  |  |  | which is just an arbitrary string of printable \ASCII{} characters. | 
					
						
							|  |  |  | The resolution of such names is not defined by the \module{pickle} | 
					
						
							|  |  |  | module; it will delegate this resolution to user defined functions on | 
					
						
							|  |  |  | the pickler and unpickler\footnote{The actual mechanism for | 
					
						
							|  |  |  | associating these user defined functions is slightly different for | 
					
						
							|  |  |  | \module{pickle} and \module{cPickle}.  The description given here | 
					
						
							|  |  |  | works the same for both implementations.  Users of the \module{pickle} | 
					
						
							|  |  |  | module could also use subclassing to effect the same results, | 
					
						
							|  |  |  | overriding the \method{persistent_id()} and \method{persistent_load()} | 
					
						
							|  |  |  | methods in the derived classes.}. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | To define external persistent id resolution, you need to set the | 
					
						
							|  |  |  | \member{persistent_id} attribute of the pickler object and the | 
					
						
							|  |  |  | \member{persistent_load} attribute of the unpickler object. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | To pickle objects that have an external persistent id, the pickler | 
					
						
							|  |  |  | must have a custom \function{persistent_id()} method that takes an | 
					
						
							|  |  |  | object as an argument and returns either \code{None} or the persistent | 
					
						
							|  |  |  | id for that object.  When \code{None} is returned, the pickler simply | 
					
						
							|  |  |  | pickles the object as normal.  When a persistent id string is | 
					
						
							|  |  |  | returned, the pickler will pickle that string, along with a marker | 
					
						
							|  |  |  | so that the unpickler will recognize the string as a persistent id. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | To unpickle external objects, the unpickler must have a custom | 
					
						
							|  |  |  | \function{persistent_load()} function that takes a persistent id | 
					
						
							|  |  |  | string and returns the referenced object. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Here's a silly example that \emph{might} shed more light: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{verbatim} | 
					
						
							|  |  |  | import pickle | 
					
						
							|  |  |  | from cStringIO import StringIO | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | src = StringIO() | 
					
						
							|  |  |  | p = pickle.Pickler(src) | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | def persistent_id(obj): | 
					
						
							|  |  |  |     if hasattr(obj, 'x'): | 
					
						
							|  |  |  |         return 'the value %d' % obj.x
 | 
					
						
							|  |  |  |     else: | 
					
						
							|  |  |  |         return None | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | p.persistent_id = persistent_id | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | class Integer: | 
					
						
							|  |  |  |     def __init__(self, x): | 
					
						
							|  |  |  |         self.x = x | 
					
						
							|  |  |  |     def __str__(self): | 
					
						
							|  |  |  |         return 'My name is integer %d' % self.x
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | i = Integer(7) | 
					
						
							|  |  |  | print i | 
					
						
							|  |  |  | p.dump(i) | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | datastream = src.getvalue() | 
					
						
							|  |  |  | print repr(datastream) | 
					
						
							|  |  |  | dst = StringIO(datastream) | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | up = pickle.Unpickler(dst) | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | class FancyInteger(Integer): | 
					
						
							|  |  |  |     def __str__(self): | 
					
						
							|  |  |  |         return 'I am the integer %d' % self.x
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | def persistent_load(persid): | 
					
						
							|  |  |  |     if persid.startswith('the value '): | 
					
						
							|  |  |  |         value = int(persid.split()[2]) | 
					
						
							|  |  |  |         return FancyInteger(value) | 
					
						
							|  |  |  |     else: | 
					
						
							|  |  |  |         raise pickle.UnpicklingError, 'Invalid persistent id' | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | up.persistent_load = persistent_load | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | j = up.load() | 
					
						
							|  |  |  | print j | 
					
						
							|  |  |  | \end{verbatim} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | In the \module{cPickle} module, the unpickler's | 
					
						
							|  |  |  | \member{persistent_load} attribute can also be set to a Python | 
					
						
							|  |  |  | list, in which case, when the unpickler reaches a persistent id, the | 
					
						
							|  |  |  | persistent id string will simply be appended to this list.  This | 
					
						
							|  |  |  | functionality exists so that a pickle data stream can be ``sniffed'' | 
					
						
							|  |  |  | for object references without actually instantiating all the objects | 
					
						
							|  |  |  | in a pickle\footnote{We'll leave you with the image of Guido and Jim | 
					
						
							|  |  |  | sitting around sniffing pickles in their living rooms.}.  Setting | 
					
						
							|  |  |  | \member{persistent_load} to a list is usually used in conjunction with | 
					
						
							|  |  |  | the \method{noload()} method on the Unpickler. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | % BAW: Both pickle and cPickle support something called
 | 
					
						
							|  |  |  | % inst_persistent_id() which appears to give unknown types a second
 | 
					
						
							|  |  |  | % shot at producing a persistent id.  Since Jim Fulton can't remember
 | 
					
						
							|  |  |  | % why it was added or what it's for, I'm leaving it undocumented.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \subsection{Security \label{pickle-sec}} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Most of the security issues surrounding the \module{pickle} and | 
					
						
							|  |  |  | \module{cPickle} module involve unpickling.  There are no known | 
					
						
							|  |  |  | security vulnerabilities | 
					
						
							|  |  |  | related to pickling because you (the programmer) control the objects | 
					
						
							|  |  |  | that \module{pickle} will interact with, and all it produces is a | 
					
						
							|  |  |  | string. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | However, for unpickling, it is \strong{never} a good idea to unpickle | 
					
						
							|  |  |  | an untrusted string whose origins are dubious, for example, strings | 
					
						
							|  |  |  | read from a socket.  This is because unpickling can create unexpected | 
					
						
							|  |  |  | objects and even potentially run methods of those objects, such as | 
					
						
							|  |  |  | their class constructor or destructor\footnote{A special note of | 
					
						
							|  |  |  | caution is worth raising about the \refmodule{Cookie} | 
					
						
							|  |  |  | module.  By default, the \class{Cookie.Cookie} class is an alias for | 
					
						
							|  |  |  | the \class{Cookie.SmartCookie} class, which ``helpfully'' attempts to | 
					
						
							|  |  |  | unpickle any cookie data string it is passed.  This is a huge security | 
					
						
							|  |  |  | hole because cookie data typically comes from an untrusted source. | 
					
						
							|  |  |  | You should either explicitly use the \class{Cookie.SimpleCookie} class | 
					
						
							|  |  |  | --- which doesn't attempt to unpickle its string --- or you should | 
					
						
							|  |  |  | implement the defensive programming steps described later on in this | 
					
						
							|  |  |  | section.}. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | You can defend against this by customizing your unpickler so that you | 
					
						
							|  |  |  | can control exactly what gets unpickled and what gets called. | 
					
						
							|  |  |  | Unfortunately, exactly how you do this is different depending on | 
					
						
							|  |  |  | whether you're using \module{pickle} or \module{cPickle}. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | One common feature that both modules implement is the | 
					
						
							|  |  |  | \member{__safe_for_unpickling__} attribute.  Before calling a callable | 
					
						
							|  |  |  | which is not a class, the unpickler will check to make sure that the | 
					
						
							|  |  |  | callable has either been registered as a safe callable via the | 
					
						
							| 
									
										
										
										
											2001-11-26 21:30:36 +00:00
										 |  |  | \refmodule[copyreg]{copy_reg} module, or that it has an | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | attribute \member{__safe_for_unpickling__} with a true value.  This | 
					
						
							|  |  |  | prevents the unpickling environment from being tricked into doing | 
					
						
							|  |  |  | evil things like call \code{os.unlink()} with an arbitrary file name. | 
					
						
							|  |  |  | See section~\ref{pickle-protocol} for more details. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | For safely unpickling class instances, you need to control exactly | 
					
						
							| 
									
										
										
										
											2001-11-18 16:24:01 +00:00
										 |  |  | which classes will get created.  Be aware that a class's constructor | 
					
						
							|  |  |  | could be called (if the pickler found a \method{__getinitargs__()} | 
					
						
							|  |  |  | method) and the the class's destructor (i.e. its \method{__del__()} method) | 
					
						
							|  |  |  | might get called when the object is garbage collected.  Depending on | 
					
						
							|  |  |  | the class, it isn't very heard to trick either method into doing bad | 
					
						
							|  |  |  | things, such as removing a file.  The way to | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | control the classes that are safe to instantiate differs in | 
					
						
							|  |  |  | \module{pickle} and \module{cPickle}\footnote{A word of caution: the | 
					
						
							|  |  |  | mechanisms described here use internal attributes and methods, which | 
					
						
							|  |  |  | are subject to change in future versions of Python.  We intend to | 
					
						
							|  |  |  | someday provide a common interface for controlling this behavior, | 
					
						
							|  |  |  | which will work in either \module{pickle} or \module{cPickle}.}. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | In the \module{pickle} module, you need to derive a subclass from | 
					
						
							|  |  |  | \class{Unpickler}, overriding the \method{load_global()} | 
					
						
							|  |  |  | method.  \method{load_global()} should read two lines from the pickle | 
					
						
							|  |  |  | data stream where the first line will the the name of the module | 
					
						
							|  |  |  | containing the class and the second line will be the name of the | 
					
						
							|  |  |  | instance's class.  It then look up the class, possibly importing the | 
					
						
							|  |  |  | module and digging out the attribute, then it appends what it finds to | 
					
						
							|  |  |  | the unpickler's stack.  Later on, this class will be assigned to the | 
					
						
							|  |  |  | \member{__class__} attribute of an empty class, as a way of magically | 
					
						
							|  |  |  | creating an instance without calling its class's \method{__init__()}. | 
					
						
							|  |  |  | You job (should you choose to accept it), would be to have | 
					
						
							|  |  |  | \method{load_global()} push onto the unpickler's stack, a known safe | 
					
						
							|  |  |  | version of any class you deem safe to unpickle.  It is up to you to | 
					
						
							|  |  |  | produce such a class.  Or you could raise an error if you want to | 
					
						
							|  |  |  | disallow all unpickling of instances.  If this sounds like a hack, | 
					
						
							|  |  |  | you're right.  UTSL. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Things are a little cleaner with \module{cPickle}, but not by much. | 
					
						
							|  |  |  | To control what gets unpickled, you can set the unpickler's | 
					
						
							|  |  |  | \member{find_global} attribute to a function or \code{None}.  If it is | 
					
						
							|  |  |  | \code{None} then any attempts to unpickle instances will raise an | 
					
						
							|  |  |  | \exception{UnpicklingError}.  If it is a function, | 
					
						
							|  |  |  | then it should accept a module name and a class name, and return the | 
					
						
							|  |  |  | corresponding class object.  It is responsible for looking up the | 
					
						
							|  |  |  | class, again performing any necessary imports, and it may raise an | 
					
						
							|  |  |  | error to prevent instances of the class from being unpickled. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The moral of the story is that you should be really careful about the | 
					
						
							|  |  |  | source of the strings your application unpickles. | 
					
						
							| 
									
										
										
										
											1998-04-11 20:05:43 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2000-04-03 20:13:55 +00:00
										 |  |  | \subsection{Example \label{pickle-example}} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Here's a simple example of how to modify pickling behavior for a | 
					
						
							|  |  |  | class.  The \class{TextReader} class opens a text file, and returns | 
					
						
							|  |  |  | the line number and line contents each time its \method{readline()} | 
					
						
							|  |  |  | method is called. If a \class{TextReader} instance is pickled, all | 
					
						
							|  |  |  | attributes \emph{except} the file object member are saved. When the | 
					
						
							|  |  |  | instance is unpickled, the file is reopened, and reading resumes from | 
					
						
							|  |  |  | the last location. The \method{__setstate__()} and | 
					
						
							|  |  |  | \method{__getstate__()} methods are used to implement this behavior. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{verbatim} | 
					
						
							|  |  |  | class TextReader: | 
					
						
							| 
									
										
										
										
											2001-09-25 16:29:17 +00:00
										 |  |  |     """Print and number lines in a text file.""" | 
					
						
							|  |  |  |     def __init__(self, file): | 
					
						
							| 
									
										
										
										
											2000-04-03 20:13:55 +00:00
										 |  |  |         self.file = file | 
					
						
							| 
									
										
										
										
											2001-09-25 16:29:17 +00:00
										 |  |  |         self.fh = open(file) | 
					
						
							| 
									
										
										
										
											2000-04-03 20:13:55 +00:00
										 |  |  |         self.lineno = 0 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     def readline(self): | 
					
						
							|  |  |  |         self.lineno = self.lineno + 1 | 
					
						
							|  |  |  |         line = self.fh.readline() | 
					
						
							|  |  |  |         if not line: | 
					
						
							|  |  |  |             return None | 
					
						
							| 
									
										
										
										
											2001-09-25 16:29:17 +00:00
										 |  |  |         if line.endswith("\n"): | 
					
						
							|  |  |  |             line = line[:-1] | 
					
						
							|  |  |  |         return "%d: %s" % (self.lineno, line)
 | 
					
						
							| 
									
										
										
										
											2000-04-03 20:13:55 +00:00
										 |  |  | 
 | 
					
						
							|  |  |  |     def __getstate__(self): | 
					
						
							| 
									
										
										
										
											2001-09-25 16:29:17 +00:00
										 |  |  |         odict = self.__dict__.copy() # copy the dict since we change it | 
					
						
							|  |  |  |         del odict['fh']              # remove filehandle entry | 
					
						
							| 
									
										
										
										
											2000-04-03 20:13:55 +00:00
										 |  |  |         return odict | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     def __setstate__(self,dict): | 
					
						
							| 
									
										
										
										
											2001-09-25 16:29:17 +00:00
										 |  |  |         fh = open(dict['file'])      # reopen file | 
					
						
							|  |  |  |         count = dict['lineno']       # read from file... | 
					
						
							|  |  |  |         while count:                 # until line count is restored | 
					
						
							| 
									
										
										
										
											2000-04-03 20:13:55 +00:00
										 |  |  |             fh.readline() | 
					
						
							|  |  |  |             count = count - 1 | 
					
						
							| 
									
										
										
										
											2001-09-25 16:29:17 +00:00
										 |  |  |         self.__dict__.update(dict)   # update attributes | 
					
						
							|  |  |  |         self.fh = fh                 # save the file object | 
					
						
							| 
									
										
										
										
											2000-04-03 20:13:55 +00:00
										 |  |  | \end{verbatim} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | A sample usage might be something like this: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \begin{verbatim} | 
					
						
							|  |  |  | >>> import TextReader | 
					
						
							|  |  |  | >>> obj = TextReader.TextReader("TextReader.py") | 
					
						
							|  |  |  | >>> obj.readline() | 
					
						
							|  |  |  | '1: #!/usr/local/bin/python' | 
					
						
							|  |  |  | >>> # (more invocations of obj.readline() here) | 
					
						
							|  |  |  | ... obj.readline() | 
					
						
							|  |  |  | '7: class TextReader:' | 
					
						
							|  |  |  | >>> import pickle | 
					
						
							|  |  |  | >>> pickle.dump(obj,open('save.p','w')) | 
					
						
							| 
									
										
										
										
											2001-09-25 16:29:17 +00:00
										 |  |  | \end{verbatim} | 
					
						
							| 
									
										
										
										
											2000-04-03 20:13:55 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-09-25 16:29:17 +00:00
										 |  |  | If you want to see that \refmodule{pickle} works across Python | 
					
						
							|  |  |  | processes, start another Python session, before continuing.  What | 
					
						
							|  |  |  | follows can happen from either the same process or a new process. | 
					
						
							| 
									
										
										
										
											2000-04-03 20:13:55 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-09-25 16:29:17 +00:00
										 |  |  | \begin{verbatim} | 
					
						
							| 
									
										
										
										
											2000-04-03 20:13:55 +00:00
										 |  |  | >>> import pickle | 
					
						
							|  |  |  | >>> reader = pickle.load(open('save.p')) | 
					
						
							|  |  |  | >>> reader.readline() | 
					
						
							|  |  |  | '8:     "Print and number lines in a text file."' | 
					
						
							|  |  |  | \end{verbatim} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | \begin{seealso} | 
					
						
							|  |  |  |   \seemodule[copyreg]{copy_reg}{Pickle interface constructor | 
					
						
							|  |  |  |                                 registration for extension types.} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |   \seemodule{shelve}{Indexed databases of objects; uses \module{pickle}.} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |   \seemodule{copy}{Shallow and deep object copying.} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |   \seemodule{marshal}{High-performance serialization of built-in types.} | 
					
						
							|  |  |  | \end{seealso} | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | \section{\module{cPickle} --- A faster \module{pickle}} | 
					
						
							| 
									
										
										
										
											1998-07-23 17:59:49 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											1999-04-22 21:23:22 +00:00
										 |  |  | \declaremodule{builtin}{cPickle} | 
					
						
							| 
									
										
										
										
											2000-04-03 20:13:55 +00:00
										 |  |  | \modulesynopsis{Faster version of \refmodule{pickle}, but not subclassable.} | 
					
						
							| 
									
										
										
										
											1999-04-22 21:23:22 +00:00
										 |  |  | \moduleauthor{Jim Fulton}{jfulton@digicool.com} | 
					
						
							|  |  |  | \sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org} | 
					
						
							| 
									
										
										
										
											1998-07-23 17:59:49 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2001-11-15 23:39:07 +00:00
										 |  |  | The \module{cPickle} module supports serialization and | 
					
						
							|  |  |  | de-serialization of Python objects, providing an interface and | 
					
						
							|  |  |  | functionality nearly identical to the | 
					
						
							|  |  |  | \refmodule{pickle}\refstmodindex{pickle} module.  There are several | 
					
						
							|  |  |  | differences, the most important being performance and subclassability. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | First, \module{cPickle} can be up to 1000 times faster than | 
					
						
							|  |  |  | \module{pickle} because the former is implemented in C.  Second, in | 
					
						
							|  |  |  | the \module{cPickle} module the callables \function{Pickler()} and | 
					
						
							|  |  |  | \function{Unpickler()} are functions, not classes.  This means that | 
					
						
							|  |  |  | you cannot use them to derive custom pickling and unpickling | 
					
						
							|  |  |  | subclasses.  Most applications have no need for this functionality and | 
					
						
							|  |  |  | should benefit from the greatly improved performance of the | 
					
						
							|  |  |  | \module{cPickle} module. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The pickle data stream produced by \module{pickle} and | 
					
						
							|  |  |  | \module{cPickle} are identical, so it is possible to use | 
					
						
							|  |  |  | \module{pickle} and \module{cPickle} interchangeably with existing | 
					
						
							|  |  |  | pickles\footnote{Since the pickle data format is actually a tiny | 
					
						
							|  |  |  | stack-oriented programming language, and some freedom is taken in the | 
					
						
							|  |  |  | encodings of certain objects, it is possible that the two modules | 
					
						
							|  |  |  | produce different data streams for the same input objects.  However it | 
					
						
							|  |  |  | is guaranteed that they will always be able to read each other's | 
					
						
							|  |  |  | data streams.}. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | There are additional minor differences in API between \module{cPickle} | 
					
						
							|  |  |  | and \module{pickle}, however for most applications, they are | 
					
						
							|  |  |  | interchangable.  More documentation is provided in the | 
					
						
							|  |  |  | \module{pickle} module documentation, which | 
					
						
							|  |  |  | includes a list of the documented differences. | 
					
						
							| 
									
										
										
										
											1998-04-11 20:05:43 +00:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											1999-01-06 23:34:39 +00:00
										 |  |  | 
 |