\section{\module{struct} ---
Interpret strings as packed binary data}
\declaremodule{builtin}{struct}
\modulesynopsis{Interpret strings as packed binary data.}
\indexii{C}{structures}
\indexiii{packing}{binary}{data}
This module performs conversions between Python values and C
structs represented as Python strings. It uses \dfn{format strings}
(explained below) as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values. This can
be used in handling binary data stored in files or from network
connections, among other sources.
The module defines the following exception and functions:
\begin{excdesc}{error}
Exception raised on various occasions; argument is a string
describing what is wrong.
\end{excdesc}
\begin{funcdesc}{pack}{fmt, v1, v2, \textrm{\ldots}}
Return a string containing the values
\code{\var{v1}, \var{v2}, \textrm{\ldots}} packed according to the given
format. The arguments must match the values required by the format
exactly.
\end{funcdesc}
\begin{funcdesc}{unpack}{fmt, string}
Unpack the string (presumably packed by \code{pack(\var{fmt},
\textrm{\ldots})}) according to the given format. The result is a
tuple even if it contains exactly one item. The string must contain
exactly the amount of data required by the format
(\code{len(\var{string})} must equal \code{calcsize(\var{fmt})}).
\end{funcdesc}
\begin{funcdesc}{calcsize}{fmt}
Return the size of the struct (and hence of the string)
corresponding to the given format.
\end{funcdesc}
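As a minimal sketch of how the three functions fit together (the
variable names are illustrative only, and the \code{b''} literal is an
assumption made so that the snippet also runs on later Python versions,
where packed data is a bytes object), the following checks that the
packed string has the size reported by \function{calcsize()} and that
\function{unpack()} rejects data of any other length:
\begin{verbatim}
import struct

fmt = 'hhl'                          # two shorts and a long, native layout
packed = struct.pack(fmt, 1, 2, 3)
size = struct.calcsize(fmt)          # always equals len(packed)
values = struct.unpack(fmt, packed)  # (1, 2, 3)
single = struct.unpack('h', struct.pack('h', 7))  # a 1-tuple: (7,)

# One extra byte makes the length differ from calcsize(fmt),
# so unpack() raises struct.error.
try:
    struct.unpack(fmt, packed + b'\x00')
    rejected = False
except struct.error:
    rejected = True
\end{verbatim}
The actual bytes in \code{packed} depend on the host platform, which is
why the sketch compares lengths rather than contents.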
Format characters have the following meaning; the conversion between
C and Python values should be obvious given their types:
\begin{tableiv}{c|l|l|c}{samp}{Format}{C Type}{Python}{Notes}
\lineiv{x}{pad byte}{no value}{}
\lineiv{c}{\ctype{char}}{string of length 1}{}
\lineiv{b}{\ctype{signed char}}{integer}{}
\lineiv{B}{\ctype{unsigned char}}{integer}{}
\lineiv{t}{\ctype{_Bool}}{bool}{(1)}
\lineiv{h}{\ctype{short}}{integer}{}
\lineiv{H}{\ctype{unsigned short}}{integer}{}
\lineiv{i}{\ctype{int}}{integer}{}
\lineiv{I}{\ctype{unsigned int}}{long}{}
\lineiv{l}{\ctype{long}}{integer}{}
\lineiv{L}{\ctype{unsigned long}}{long}{}
\lineiv{q}{\ctype{long long}}{long}{(2)}
\lineiv{Q}{\ctype{unsigned long long}}{long}{(2)}
\lineiv{f}{\ctype{float}}{float}{}
\lineiv{d}{\ctype{double}}{float}{}
\lineiv{s}{\ctype{char[]}}{string}{}
\lineiv{p}{\ctype{char[]}}{string}{}
\lineiv{P}{\ctype{void *}}{integer}{}
\end{tableiv}
\noindent
Notes:
\begin{description}
\item[(1)]
The \character{t} conversion code corresponds to the \ctype{_Bool} type
defined by C99. If this type is not available, it is simulated using a
\ctype{char}. In standard mode, it is always represented by one byte.
\versionadded{2.6}
\item[(2)]
The \character{q} and \character{Q} conversion codes are available in
native mode only if the platform C compiler supports C \ctype{long long},
or, on Windows, \ctype{__int64}. They are always available in standard
modes.
\versionadded{2.2}
\end{description}
A format character may be preceded by an integral repeat count. For
example, the format string \code{'4h'} means exactly the same as
\code{'hhhh'}.
Whitespace characters between formats are ignored; however, a count and
its format character must not be separated by whitespace.
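The repeat-count and whitespace rules can be checked with a short
sketch such as the following (the names are illustrative only):
\begin{verbatim}
import struct

args = (1, 2, 3, 4)
counted = struct.pack('4h', *args)
spelled = struct.pack('hhhh', *args)
spaced = struct.pack('2h 2h', *args)   # whitespace between formats is ignored
all_equal = (counted == spelled == spaced)

# Whitespace between a count and its format character is an error.
try:
    struct.calcsize('4 h')
    split_ok = True
except struct.error:
    split_ok = False
\end{verbatim}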
For the \character{s} format character, the count is interpreted as the
size of the string, not as a repeat count as it is for the other format
characters; for example, \code{'10s'} means a single 10-byte string, while
\code{'10c'} means 10 characters. For packing, the string is
truncated or padded with null bytes as appropriate to make it fit.
For unpacking, the resulting string always has exactly the specified
number of bytes. As a special case, \code{'0s'} means a single, empty
string (while \code{'0c'} means 0 characters).
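The behaviour of \character{s} compared with \character{c} might be
illustrated as follows. This is only a sketch: the sample data is made
up, and \code{b''} literals are used on the assumption that the snippet
should also run on later Python versions, where packed data is a bytes
object.
\begin{verbatim}
import struct

padded = struct.pack('10s', b'spam')      # 'spam' followed by 6 null bytes
truncated = struct.pack('3s', b'spam')    # cut down to 'spa'
unpacked = struct.unpack('10s', padded)   # one 10-byte string, nulls included
chars = struct.unpack('4c', b'spam')      # four separate 1-byte items
empty = struct.unpack('0s', b'')          # a single empty string
nothing = struct.unpack('0c', b'')        # no items at all
\end{verbatim}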
The \character{p} format character encodes a ``Pascal string'', meaning
a short variable-length string stored in a fixed number of bytes.
The count is the total number of bytes stored. The first byte stored is
the length of the string, or 255, whichever is smaller. The bytes
of the string follow. If the string passed in to \function{pack()} is too
long (longer than the count minus 1), only the leading count-1 bytes of the
string are stored. If the string is shorter than count-1, it is padded
with null bytes so that exactly count bytes in all are used. Note that
for \function{unpack()}, the \character{p} format character consumes count
bytes, but that the string returned can never contain more than 255
characters.
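A small sketch of the \character{p} layout, with made-up data and
illustrative names (the \code{b''} literals are again an assumption for
the sake of later Python versions):
\begin{verbatim}
import struct

short = struct.pack('6p', b'abc')       # length byte 3, 'abc', 2 null pad bytes
clipped = struct.pack('3p', b'abcdef')  # only count-1 = 2 data bytes are kept
roundtrip = struct.unpack('6p', short)  # the padding is not returned
\end{verbatim}
Here \code{short} occupies exactly six bytes, and unpacking it yields
the original three-byte string because the stored length byte says how
many of the following bytes belong to the string.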
For the \character{I}, \character{L}, \character{q} and \character{Q}
format characters, the return value is a Python long integer.
For the \character{P} format character, the return value is a Python
integer or long integer, depending on the size needed to hold a
pointer when it has been cast to an integer type. A \NULL{} pointer will
always be returned as the Python integer \code{0}. When packing pointer-sized
values, Python integer or long integer objects may be used. For
example, the Alpha and Merced processors use 64-bit pointer values,
meaning a Python long integer will be used to hold the pointer; other
platforms use 32-bit pointers and will use a Python integer.
For the \character{t} format character, the return value is either
\constant{True} or \constant{False}. When packing, the truth value
of the argument object is used. Either 0 or 1 in the native or standard
bool representation will be packed, and any non-zero value will be \constant{True}
when unpacking.
By default, C numbers are represented in the machine's native format
and byte order, and properly aligned by skipping pad bytes if
necessary (according to the rules used by the C compiler).
Alternatively, the first character of the format string can be used to
indicate the byte order, size and alignment of the packed data,
according to the following table:
\begin{tableiii}{c|l|l}{samp}{Character}{Byte order}{Size and alignment}
\lineiii{@}{native}{native}
\lineiii{=}{native}{standard}
\lineiii{<}{little-endian}{standard}
\lineiii{>}{big-endian}{standard}
\lineiii{!}{network (= big-endian)}{standard}
\end{tableiii}
If the first character is not one of these, \character{@} is assumed.
Native byte order is big-endian or little-endian, depending on the
host system. For example, Motorola and Sun processors are big-endian;
Intel and DEC processors are little-endian.
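The effect of the byte order character can be seen by packing the same
value with each prefix. The following is a sketch only; it assumes that
\code{sys.byteorder} is used to find out the host byte order, which the
text above does not itself mention.
\begin{verbatim}
import struct
import sys

little = struct.pack('<h', 1)    # least significant byte first
big = struct.pack('>h', 1)       # most significant byte first
network = struct.pack('!h', 1)   # same as big-endian
native = struct.pack('h', 1)     # no prefix: '@' is assumed
explicit = struct.pack('@h', 1)

# sys.byteorder is 'little' or 'big' for the host system.
if sys.byteorder == 'little':
    expected_native = little
else:
    expected_native = big
\end{verbatim}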
Native size and alignment are determined using the C compiler's
\keyword{sizeof} expression. This is always combined with native byte
order.
Standard size and alignment are as follows: no alignment is required
for any type (so you have to use pad bytes);
\ctype{short} is 2 bytes;
\ctype{int} and \ctype{long} are 4 bytes;
\ctype{long long} (\ctype{__int64} on Windows) is 8 bytes;
\ctype{float} and \ctype{double} are 32-bit and 64-bit
IEEE floating point numbers, respectively;
\ctype{_Bool} is 1 byte.
Note the difference between \character{@} and \character{=}: both use
native byte order, but the size and alignment of the latter is
standardized.
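The standard sizes listed above, and the absence of alignment in
standard mode, can be checked with \function{calcsize()}. This is a
sketch with illustrative names; the native result is platform dependent
and is therefore not given a fixed value here.
\begin{verbatim}
import struct

# Standard sizes are the same on every platform.
standard_sizes = {}
for code in 'hilqfd':
    standard_sizes[code] = struct.calcsize('=' + code)

unpadded = struct.calcsize('=bl')       # 1 + 4: no alignment in standard mode
hand_padded = struct.calcsize('=b3xl')  # 1 + 3 explicit pad bytes + 4
native = struct.calcsize('@bl')         # may include compiler-style padding
\end{verbatim}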
The form \character{!} is available for those poor souls who claim they
can't remember whether network byte order is big-endian or
little-endian.
There is no way to indicate non-native byte order (force
byte-swapping); use the appropriate choice of \character{<} or
\character{>}.
The \character{P} format character is only available for the native
byte ordering (selected as the default or with the \character{@} byte
order character). The byte order character \character{=} chooses
little- or big-endian ordering based on the host system, but the
struct module does not treat this as native ordering, so the
\character{P} format is not available with it.
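A brief sketch of this restriction follows (the names are illustrative
only). It collects the byte order characters for which \character{P}
is rejected.
\begin{verbatim}
import struct

pointer_size = struct.calcsize('P')             # platform dependent
null = struct.unpack('P', struct.pack('P', 0))  # a NULL pointer unpacks as 0

# 'P' is rejected with any standard-size prefix, including '='.
rejected = []
for prefix in '=<>!':
    try:
        struct.calcsize(prefix + 'P')
    except struct.error:
        rejected.append(prefix)
\end{verbatim}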
Examples (all using native byte order, size and alignment, on a
big-endian machine with a 4-byte \ctype{long}):
\begin{verbatim}
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
'\x00\x01\x00\x02\x00\x00\x00\x03'
>>> unpack('hhl', '\x00\x01\x00\x02\x00\x00\x00\x03')
(1, 2, 3)
>>> calcsize('hhl')
8
\end{verbatim}
Hint: to align the end of a structure to the alignment requirement of
a particular type, end the format with the code for that type with a
repeat count of zero. For example, the format \code{'llh0l'}
specifies two pad bytes at the end, assuming longs are aligned on
4-byte boundaries. This only works when native size and alignment are
in effect; standard size and alignment does not enforce any alignment.
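The hint above can be tried out with \function{calcsize()}. This is a
sketch with illustrative names; the native sizes depend on the
platform, so only the standard-mode sizes are stated in the comments.
\begin{verbatim}
import struct

native_plain = struct.calcsize('llh')
native_padded = struct.calcsize('llh0l')     # rounded up to a long's alignment
standard_plain = struct.calcsize('=llh')     # 4 + 4 + 2
standard_padded = struct.calcsize('=llh0l')  # '0l' adds nothing here
\end{verbatim}
On a platform with 4-byte longs aligned on 4-byte boundaries,
\code{native\_padded} would be two bytes larger than
\code{native\_plain}, matching the two pad bytes described above.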
\begin{seealso}
\seemodule{array}{Packed binary storage of homogeneous data.}
\seemodule{xdrlib}{Packing and unpacking of XDR data.}
\end{seealso}