\section{Built-in Module \sectcode{struct}}
\label{module-struct}
\bimodindex{struct}
\indexii{C@\C{}}{structures}

This module performs conversions between Python values and C
structs represented as Python strings. It uses \dfn{format strings}
(explained below) as compact descriptions of the layout of the C
structs and the intended conversion to/from Python values.

The module defines the following exception and functions:

\begin{excdesc}{error}
Exception raised on various occasions, for example when a format
string contains an unknown format character or when the data does not
match the format; its argument is a string describing what is wrong.
\end{excdesc}

\begin{funcdesc}{pack}{fmt, v1, v2, {\rm \ldots}}
Return a string containing the values
\code{\var{v1}, \var{v2}, {\rm \ldots}} packed according to the given
format. The number and types of the arguments must match the values
required by the format exactly.
\end{funcdesc}
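
The following minimal sketch shows \code{pack()} in a current Python 3
interpreter, where the result is a \code{bytes} object rather than a
string. It uses the \code{'>'} prefix (big-endian byte order with
standard sizes, described below) so that the output does not depend on
the host, and it shows that a value count that does not match the
format raises \code{error} (accessible as \code{struct.error}).

\begin{verbatim}
import struct

# '>' selects big-endian order with standard sizes (2 bytes for 'h'
# and 'H'), so these bytes are the same on every host.
packed = struct.pack('>hH', -2, 65535)
assert packed == b'\xff\xfe\xff\xff'

# The format has two items, so passing three values is an error.
try:
    struct.pack('>hH', -2, 65535, 7)
except struct.error as exc:
    message = str(exc)
else:
    message = None
assert message is not None
print(packed, message)
\end{verbatim}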

\begin{funcdesc}{unpack}{fmt, string}
Unpack the string (presumably packed by \code{pack(\var{fmt}, {\rm \ldots})})
according to the given format. The result is a tuple even if it
contains exactly one item. The string must contain exactly the
amount of data required by the format (i.e.\ \code{len(\var{string})} must
equal \code{calcsize(\var{fmt})}).
\end{funcdesc}
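
A minimal Python 3 sketch of \code{unpack()}: even a single value comes
back as a 1-tuple, and data whose length differs from
\code{calcsize(\var{fmt})} is rejected with \code{struct.error}. The
\code{'>'} prefix is used only to make the bytes predictable.

\begin{verbatim}
import struct

data = struct.pack('>H', 513)           # 513 == 0x0201
result = struct.unpack('>H', data)
assert result == (513,)                 # a tuple, even for one item
(value,) = result                       # idiomatic way to take the item

# One byte too many: len(data) no longer equals calcsize('>H').
try:
    struct.unpack('>H', data + b'\x00')
except struct.error:
    rejected = True
else:
    rejected = False
assert rejected
\end{verbatim}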

\begin{funcdesc}{calcsize}{fmt}
Return the size of the struct (and hence of the string)
corresponding to the given format.
\end{funcdesc}
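
As a quick check (Python 3, with standard sizes selected by
\code{'<'}), \code{calcsize()} agrees with the length of the data that
\code{pack()} produces for the same format; the sample values are
arbitrary.

\begin{verbatim}
import struct

# With '<', 'b' is 1 byte, 'h' 2, 'i' 4 and 'd' 8, and no padding is added.
fmt = '<bhid'
size = struct.calcsize(fmt)
assert size == 1 + 2 + 4 + 8

# The packed data for this format has exactly that length.
assert len(struct.pack(fmt, 1, 2, 3, 4.0)) == size
\end{verbatim}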

Format characters have the following meaning; the conversion between C
and Python values should be obvious given their types:

\begin{tableiii}{|c|l|l|}{samp}{Format}{C}{Python}
\lineiii{x}{pad byte}{no value}
\lineiii{c}{char}{string of length 1}
\lineiii{b}{signed char}{integer}
\lineiii{B}{unsigned char}{integer}
\lineiii{h}{short}{integer}
\lineiii{H}{unsigned short}{integer}
\lineiii{i}{int}{integer}
\lineiii{I}{unsigned int}{integer}
\lineiii{l}{long}{integer}
\lineiii{L}{unsigned long}{integer}
\lineiii{f}{float}{float}
\lineiii{d}{double}{float}
\lineiii{s}{char[]}{string}
\end{tableiii}
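
To illustrate the table, the sketch below packs one value for each
format character and unpacks them again. It assumes Python 3, where the
\code{'c'} and \code{'s'} codes take \code{bytes} objects, and it uses
\code{'<'} so that the standard sizes described below apply; the sample
values are arbitrary.

\begin{verbatim}
import struct

# One value per format character in the table ('x' takes no value).
# '<' selects standard sizes, so the layout is the same on every host.
fmt = '<xcbBhHiIlLfds'
values = (b'A', -1, 255, -300, 60000, -70000, 4000000000, -5, 5,
          0.5, 0.25, b'z')
packed = struct.pack(fmt, *values)
assert len(packed) == struct.calcsize(fmt) == 37

# Unpacking restores the values; 0.5 and 0.25 are exact in both
# float and double, so the comparison is exact as well.
assert struct.unpack(fmt, packed) == values
\end{verbatim}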

A format character may be preceded by an integral repeat count; e.g.\
the format string \code{'4h'} means exactly the same as \code{'hhhh'}.
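
A minimal Python 3 check that a repeat count is pure shorthand (the
\code{'<'} prefix only fixes the sizes and byte order):

\begin{verbatim}
import struct

short_form = struct.pack('<4h', 1, 2, 3, 4)
long_form = struct.pack('<hhhh', 1, 2, 3, 4)
assert short_form == long_form
assert struct.calcsize('<4h') == struct.calcsize('<hhhh') == 8
\end{verbatim}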

Whitespace characters between formats are ignored; however, there
must be no whitespace between a count and its format character.
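
In Python 3, for example, the following holds; a count separated from
its format character is rejected with \code{struct.error}.

\begin{verbatim}
import struct

# Whitespace between items is skipped ...
assert struct.calcsize('<h h') == struct.calcsize('<hh') == 4

# ... but not between a count and its format character.
try:
    struct.calcsize('<4 h')
except struct.error:
    split_count_rejected = True
else:
    split_count_rejected = False
assert split_count_rejected
\end{verbatim}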

For the \code{'s'} format character, the count is interpreted as the
size of the string, not a repeat count as for the other format
characters; e.g.\ \code{'10s'} means a single 10-byte string, while
\code{'10c'} means 10 characters. For packing, the string is
truncated or padded with null bytes as appropriate to make it fit.
For unpacking, the resulting string always has exactly the specified
number of bytes. As a special case, \code{'0s'} means a single, empty
string (while \code{'0c'} means 0 characters).
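
The rules for \code{'s'} can be checked with a short Python 3 sketch,
in which the strings are \code{bytes} objects:

\begin{verbatim}
import struct

assert struct.pack('5s', b'abc') == b'abc\x00\x00'   # padded with nulls
assert struct.pack('2s', b'abc') == b'ab'             # truncated
# Unpacking keeps the padding: exactly 5 bytes come back.
assert struct.unpack('5s', b'abc\x00\x00') == (b'abc\x00\x00',)

# '10s' is one 10-byte string; '10c' is ten 1-byte values.
assert struct.calcsize('10s') == struct.calcsize('10c') == 10
assert len(struct.unpack('10s', b'x' * 10)) == 1
assert len(struct.unpack('10c', b'x' * 10)) == 10

# '0s' packs to nothing and unpacks to a single empty string.
assert struct.pack('0s', b'ignored') == b''
assert struct.unpack('0s', b'') == (b'',)
\end{verbatim}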

For the \code{'I'} and \code{'L'} format characters, the unpacked
value is a Python long integer.
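
Python 3 no longer has a separate long integer type, so the unsigned
codes simply produce an ordinary \code{int}; the visible difference from
the signed codes is how the same bits are interpreted. A small sketch:

\begin{verbatim}
import struct

all_ones = b'\xff\xff\xff\xff'
(signed,) = struct.unpack('<i', all_ones)
(unsigned,) = struct.unpack('<I', all_ones)
assert signed == -1
assert unsigned == 2**32 - 1
assert type(unsigned) is int   # Python 3 has no separate long type
\end{verbatim}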

By default, C numbers are represented in the machine's native format
and byte order, and properly aligned by skipping pad bytes if
necessary (according to the rules used by the C compiler).

Alternatively, the first character of the format string can be used to
indicate the byte order, size and alignment of the packed data,
according to the following table:

\begin{tableiii}{|c|l|l|}{samp}{Character}{Byte order}{Size and alignment}
\lineiii{@}{native}{native}
\lineiii{=}{native}{standard}
\lineiii{<}{little-endian}{standard}
\lineiii{>}{big-endian}{standard}
\lineiii{!}{network (= big-endian)}{standard}
\end{tableiii}

If the first character is not one of these, \code{'@'} is assumed.
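
A short Python 3 sketch of the explicit byte-order prefixes, packing
the 16-bit value \code{0x0102}:

\begin{verbatim}
import struct

value = 0x0102
assert struct.pack('<H', value) == b'\x02\x01'   # little-endian
assert struct.pack('>H', value) == b'\x01\x02'   # big-endian
# '!' (network order) is the same as '>'.
assert struct.pack('!H', value) == struct.pack('>H', value)
\end{verbatim}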

Native byte order is big-endian or little-endian, depending on the
host system (e.g.\ Motorola and Sun are big-endian; Intel and DEC are
little-endian).
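
The host's native order can be inspected at run time. In this Python 3
sketch, \code{sys.byteorder} (a standard attribute whose value is either
\code{'little'} or \code{'big'}) picks the explicit prefix that gives
the same bytes as \code{'='}:

\begin{verbatim}
import struct
import sys

# Map the host's native order onto the matching explicit prefix.
native_prefix = '<' if sys.byteorder == 'little' else '>'
assert struct.pack('=I', 1) == struct.pack(native_prefix + 'I', 1)
print(sys.byteorder)
\end{verbatim}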

Native size and alignment are determined using the C compiler's
\code{sizeof} expression. This is always combined with native byte order.

Standard size and alignment are as follows: no alignment is required
for any type (so you have to insert pad bytes explicitly where the data
needs them); char is 1 byte; short is 2 bytes; int and long are 4
bytes. Float and double are 32-bit and 64-bit IEEE floating point
numbers, respectively.
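
The standard sizes, the absence of padding, and the IEEE encodings can
be confirmed with a short Python 3 sketch; \code{1.0} is used because
its IEEE bit patterns are easy to recognize.

\begin{verbatim}
import struct

# Standard sizes, as reported by calcsize() with the '<' prefix.
sizes = {code: struct.calcsize('<' + code) for code in 'bhilfd'}
assert sizes == {'b': 1, 'h': 2, 'i': 4, 'l': 4, 'f': 4, 'd': 8}

# No alignment padding: a char followed by an int takes 1 + 4 bytes.
assert struct.calcsize('<ci') == 5

# IEEE 754 encodings of 1.0 in single and double precision.
assert struct.pack('>f', 1.0) == b'\x3f\x80\x00\x00'
assert struct.pack('>d', 1.0) == b'\x3f\xf0' + b'\x00' * 6
\end{verbatim}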

Note the difference between \code{'@'} and \code{'='}: both use native
byte order, but the size and alignment of the latter are standardized.
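
The difference shows up directly in \code{calcsize()}. In this Python 3
sketch the native result depends on the host's C compiler (it is 8 on
common platforms with 4-byte ints), so only a lower bound is checked
for \code{'@'}:

\begin{verbatim}
import struct

# '=' never inserts padding: 1 byte for 'c' plus 4 bytes for 'i'.
assert struct.calcsize('=ci') == 5

# '@' pads after 'c' so that 'i' is aligned as the C compiler would
# align it; the exact value is host-specific.
native = struct.calcsize('@ci')
assert native >= struct.calcsize('@c') + struct.calcsize('@i')
print(native)
\end{verbatim}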

The form \code{'!'} is available for those poor souls who claim they
can't remember whether network byte order is big-endian or
little-endian.

There is no way to indicate non-native byte order (i.e.\ force
byte-swapping); use the appropriate choice of \code{'<'} or
\code{'>'}.
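
For example, to force byte-swapping relative to the host, a program can
choose whichever of \code{'<'} and \code{'>'} is not native. This
Python 3 sketch relies on \code{sys.byteorder} to tell the two apart:

\begin{verbatim}
import struct
import sys

# Pick the prefix for the byte order the host does NOT use.
swapped_prefix = '>' if sys.byteorder == 'little' else '<'
native_bytes = struct.pack('=I', 0x01020304)
swapped_bytes = struct.pack(swapped_prefix + 'I', 0x01020304)

# The same 4-byte value, with its bytes in the opposite order.
assert swapped_bytes == native_bytes[::-1]
\end{verbatim}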

Examples (all using native byte order, size and alignment, on a
big-endian machine with 2-byte shorts and 4-byte longs):

\begin{verbatim}
>>> from struct import *
>>> pack('hhl', 1, 2, 3)
'\000\001\000\002\000\000\000\003'
>>> unpack('hhl', '\000\001\000\002\000\000\000\003')
(1, 2, 3)
>>> calcsize('hhl')
8
>>>
\end{verbatim}
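
The session above reflects one particular kind of host; with native
sizes, a machine with 8-byte longs or little-endian byte order produces
different bytes. A portable Python 3 equivalent selects big-endian
standard sizes explicitly with \code{'>'}:

\begin{verbatim}
import struct

packed = struct.pack('>hhl', 1, 2, 3)
assert packed == b'\x00\x01\x00\x02\x00\x00\x00\x03'
assert struct.unpack('>hhl', packed) == (1, 2, 3)
assert struct.calcsize('>hhl') == 8
\end{verbatim}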

%
Hint: to align the end of a structure to the alignment requirement of
a particular type, end the format with the code for that type with a
repeat count of zero, e.g.\ the format \code{'llh0l'} specifies two
pad bytes at the end, assuming longs are aligned on 4-byte boundaries.
This only works when native size and alignment are in effect;
standard size and alignment do not enforce any alignment.
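
A Python 3 sketch of the hint; because the native results depend on
the platform, only relationships that hold in general are checked, and
the sizes quoted in the comment assume that a long's alignment equals
its size.

\begin{verbatim}
import struct

# Native mode: '0l' rounds the size up to the alignment of a C long,
# e.g. 18 -> 24 where longs are 8 bytes, 10 -> 12 where they are 4.
unpadded = struct.calcsize('@llh')
padded = struct.calcsize('@llh0l')
assert padded >= unpadded

# Standard mode: no alignment is done, so '0l' adds nothing.
assert struct.calcsize('<llh0l') == struct.calcsize('<llh') == 10
print(unpadded, padded)
\end{verbatim}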

\begin{seealso}
\seemodule{array}{packed binary storage of homogeneous data}
\end{seealso}