\documentstyle[twoside,11pt,myformat]{report}

% XXX PM Modulator

\title{Extending and Embedding the Python Interpreter}

\input{boilerplate}

% Tell \index to actually write the .idx file
\makeindex

\begin{document}

\pagenumbering{roman}

\maketitle

\input{copyright}

\begin{abstract}

\noindent
Python is an interpreted, object-oriented programming language.  This
document describes how to write modules in C or \Cpp{} to extend the
Python interpreter.  Those modules can define not only new
functions but also new object types and their methods.  The document
also describes how to embed the Python interpreter in another
application, for use as an extension language.  Finally, it shows how
to compile and link extension modules so that they can be loaded
dynamically (at run time) into the interpreter, if the underlying
operating system supports this feature.

This document assumes basic knowledge about Python.  For an informal
introduction to the language, see the Python Tutorial.  The Python
Reference Manual gives a more formal definition of the language.  The
Python Library Reference documents the existing object types,
functions and modules (both built-in and written in Python) that give
the language its wide application range.

\end{abstract}

\pagebreak

{
\parskip = 0mm
\tableofcontents
}

\pagebreak

\pagenumbering{arabic}


\chapter{Extending Python with C or \Cpp{} code}


\section{Introduction}

It is quite easy to add new built-in modules to Python, if you know
how to program in C.  Such \dfn{extension modules} can do two things
that can't be done directly in Python: they can implement new built-in
object types, and they can call C library functions and system calls.

To support extensions, the Python API (Application Programmer's
Interface) defines a set of functions, macros and variables that
provide access to most aspects of the Python run-time system.  The
Python API is incorporated in a C source file by including the header
\code{"Python.h"}.

The compilation of an extension module depends on its intended use as
well as on your system setup; details are given in a later section.


\section{A Simple Example}

Let's create an extension module called \samp{spam} (the favorite food
of Monty Python fans...) and let's say we want to create a Python
interface to the C library function \code{system()}.\footnote{An
interface for this function already exists in the standard module
\code{os} --- it was chosen as a simple and straightforward example.}
This function takes a null-terminated character string as argument and
returns an integer.  We want this function to be callable from Python
as follows:

\begin{verbatim}
    >>> import spam
    >>> status = spam.system("ls -l")
\end{verbatim}

Begin by creating a file \file{spammodule.c}.  (In general, if a
module is called \samp{spam}, the C file containing its implementation
is called \file{spammodule.c}; if the module name is very long, like
\samp{spammify}, the file name can be just \file{spammify.c}.)

The first line of our file can be:

\begin{verbatim}
    #include "Python.h"
\end{verbatim}

which pulls in the Python API (you can add a comment describing the
purpose of the module and a copyright notice if you like).

All user-visible symbols defined by \code{"Python.h"} have a prefix of
\samp{Py} or \samp{PY}, except those defined in standard header files.
For convenience, and since they are used extensively by the Python
interpreter, \code{"Python.h"} includes a few standard header files:
\code{<stdio.h>}, \code{<string.h>}, \code{<errno.h>}, and
\code{<stdlib.h>}.  If the latter header file does not exist on your
system, \code{"Python.h"} declares the functions \code{malloc()},
\code{free()} and \code{realloc()} directly.

The next thing we add to our module file is the C function that will
be called when the Python expression \samp{spam.system(\var{string})}
is evaluated (we'll see shortly how it ends up being called):

\begin{verbatim}
    static PyObject *
    spam_system(self, args)
        PyObject *self;
        PyObject *args;
    {
        char *command;
        int sts;
        if (!PyArg_ParseTuple(args, "s", &command))
            return NULL;
        sts = system(command);
        return Py_BuildValue("i", sts);
    }
\end{verbatim}

There is a straightforward translation from the argument list in
Python (e.g.\ the single expression \code{"ls -l"}) to the arguments
passed to the C function.  The C function always has two arguments,
conventionally named \var{self} and \var{args}.

The \var{self} argument is only used when the C function implements a
built-in method.  This will be discussed later.  In the example,
\var{self} will always be a \code{NULL} pointer, since we are defining
a function, not a method.  (This is done so that the interpreter
doesn't have to understand two different types of C functions.)

The \var{args} argument will be a pointer to a Python tuple object
containing the arguments.  Each item of the tuple corresponds to an
argument in the call's argument list.  The arguments are Python
objects; in order to do anything with them in our C function we have
to convert them to C values.  The function \code{PyArg_ParseTuple()}
in the Python API checks the argument types and converts them to C
values.  It uses a template string to determine the required types of
the arguments as well as the types of the C variables into which to
store the converted values.  More about this later.

\code{PyArg_ParseTuple()} returns true (nonzero) if all arguments have
the right type and their converted values have been stored in the
variables whose addresses are passed.  It returns false (zero) if an
invalid argument list was passed.  In the latter case it also raises an
appropriate exception, so the calling function can return
\code{NULL} immediately (as we saw in the example).


\section{Intermezzo: Errors and Exceptions}

An important convention throughout the Python interpreter is the
following: when a function fails, it should set an exception condition
and return an error value (usually a \code{NULL} pointer).  Exceptions
are stored in a static global variable inside the interpreter; if this
variable is \code{NULL} no exception has occurred.  A second global
variable stores the ``associated value'' of the exception (the second
argument to \code{raise}).  A third variable contains the stack
traceback in case the error originated in Python code.  These three
variables are the C equivalents of the Python variables
\code{sys.exc_type}, \code{sys.exc_value} and \code{sys.exc_traceback}
(see the section on module \code{sys} in the Library Reference
Manual).  It is important to know about them to understand how errors
are passed around.

The Python API defines a number of functions to set various types of
exceptions.

The most common one is \code{PyErr_SetString()}.  Its arguments are an
exception object and a C string.  The exception object is usually a
predefined object like \code{PyExc_ZeroDivisionError}.  The C string
indicates the cause of the error and is converted to a Python string
object and stored as the ``associated value'' of the exception.

Another useful function is \code{PyErr_SetFromErrno()}, which only
takes an exception argument and constructs the associated value by
inspection of the (\UNIX{}) global variable \code{errno}.  The most
general function is \code{PyErr_SetObject()}, which takes two object
arguments, the exception and its associated value.  You don't need to
\code{Py_INCREF()} the objects passed to any of these functions.
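
To give an idea of the raw material that \code{PyErr_SetFromErrno()}
has to work with, here is a small stand-alone C sketch that does not use
the Python headers at all.  The helper names \code{describe_errno()} and
\code{try_open()} are hypothetical, and the idea that the associated
value pairs the error number with its message is an assumption of this
sketch rather than something stated above.  It only shows how a failing
library call leaves its error code in \code{errno} and how
\code{strerror()} turns that code into text.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the current errno as "(code, 'message')"
   into buf.  Returns the errno value that was described. */
static int
describe_errno(char *buf, size_t size)
{
    int code = errno;   /* save first: later calls may change errno */

    assert(buf != NULL && size > 0);
    snprintf(buf, size, "(%d, '%s')", code, strerror(code));
    return code;
}

/* Try to open a file for reading.  On failure, describe the error in
   buf and return -1; on success, close the file and return 0. */
static int
try_open(const char *path, char *buf, size_t size)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL) {
        (void) describe_errno(buf, size);
        return -1;
    }
    fclose(fp);
    return 0;
}
```

Note that \code{errno} is copied into a local variable before any other
library function is called, since later calls are free to overwrite it.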

You can test non-destructively whether an exception has been set with
\code{PyErr_Occurred()}.  This returns the current exception object,
or \code{NULL} if no exception has occurred.  You normally don't need
to call \code{PyErr_Occurred()} to see whether an error occurred in a
function call, since you should be able to tell from the return value.

When a function \var{f} that calls another function \var{g} detects
that the latter fails, \var{f} should itself return an error value
(e.g.\ \code{NULL} or \code{-1}).  It should \emph{not} call one of the
\code{PyErr_*()} functions --- one has already been called by \var{g}.
\var{f}'s caller is then supposed to also return an error indication
to \emph{its} caller, again \emph{without} calling \code{PyErr_*()},
and so on --- the most detailed cause of the error was already
reported by the function that first detected it.  Once the error
reaches the Python interpreter's main loop, this aborts the currently
executing Python code and tries to find an exception handler specified
by the Python programmer.
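
The propagation convention can be modelled without the interpreter.
The following self-contained C sketch is a toy stand-in, not the Python
API: the names \code{toy_error}, \code{toy_set_error()}, \code{g()} and
\code{f()} are invented for illustration.  Here \code{g()} detects the
failure and records its cause, while \code{f()} merely passes the
failure on, so the original message survives.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for the interpreter's error indicator:
   NULL means that no error has occurred. */
static const char *toy_error = NULL;

static void
toy_set_error(const char *message)
{
    assert(message != NULL);
    toy_error = message;
}

/* g detects the failure itself, so g records the cause. */
static const char *
g(int divisor)
{
    if (divisor == 0) {
        toy_set_error("division by zero detected in g");
        return NULL;            /* error value */
    }
    return "ok";
}

/* f only notices that g failed: it returns an error value too,
   without touching the indicator that g has already set. */
static const char *
f(int divisor)
{
    const char *result = g(divisor);

    if (result == NULL)
        return NULL;            /* pass the error on unchanged */
    return result;
}
```

A caller of \code{f()} in this model only needs to compare the result
with \code{NULL}; the recorded message is still the one set by
\code{g()}.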

(There are situations where a module can actually give a more detailed
error message by calling another \code{PyErr_*()} function, and in
such cases it is fine to do so.  As a general rule, however, this is
not necessary, and can cause information about the cause of the error
to be lost: most operations can fail for a variety of reasons.)

To ignore an exception set by a function call that failed, the exception
condition must be cleared explicitly by calling \code{PyErr_Clear()}.
The only time C code should call \code{PyErr_Clear()} is if it doesn't
want to pass the error on to the interpreter but wants to handle it
completely by itself (e.g.\ by trying something else or pretending
nothing happened).

Note that a failing \code{malloc()} call must be turned into an
exception --- the direct caller of \code{malloc()} (or
\code{realloc()}) must call \code{PyErr_NoMemory()} and return a
failure indicator itself.  All the object-creating functions
(\code{PyInt_FromLong()} etc.) already do this, so this note matters
only if you call \code{malloc()} directly.

Also note that, with the important exception of
\code{PyArg_ParseTuple()} and friends, functions that return an
integer status usually return a positive value or zero for success and
\code{-1} for failure, like \UNIX{} system calls.

Finally, be careful to clean up garbage (by making \code{Py_XDECREF()}
or \code{Py_DECREF()} calls for objects you have already created) when
you return an error indicator!

The choice of which exception to raise is entirely yours.  There are
predeclared C objects corresponding to all built-in Python exceptions,
e.g.\ \code{PyExc_ZeroDivisionError}, which you can use directly.  Of
course, you should choose exceptions wisely --- don't use
\code{PyExc_TypeError} to mean that a file couldn't be opened (that
should probably be \code{PyExc_IOError}).  If something's wrong with
the argument list, the \code{PyArg_ParseTuple()} function usually
raises \code{PyExc_TypeError}.  If you have an argument whose value
must be in a particular range or must satisfy other conditions,
\code{PyExc_ValueError} is appropriate.

You can also define a new exception that is unique to your module.
For this, you usually declare a static object variable at the
beginning of your file, e.g.

\begin{verbatim}
    static PyObject *SpamError;
\end{verbatim}

and initialize it in your module's initialization function
(\code{initspam()}) with a string object, e.g.\ (leaving out the error
checking for now):

\begin{verbatim}
    void
    initspam()
    {
        PyObject *m, *d;
        m = Py_InitModule("spam", SpamMethods);
        d = PyModule_GetDict(m);
        SpamError = PyString_FromString("spam.error");
        PyDict_SetItemString(d, "error", SpamError);
    }
\end{verbatim}

Note that the Python name for the exception object is
\code{spam.error}.  It is conventional for module and exception names
to be spelled in lower case.  It is also conventional that the
\emph{value} of the exception object is the same as its name, e.g.\
the string \code{"spam.error"}.


\section{Back to the Example}

Going back to our example function, you should now be able to
understand this statement:

\begin{verbatim}
        if (!PyArg_ParseTuple(args, "s", &command))
            return NULL;
\end{verbatim}

It returns \code{NULL} (the error indicator for functions returning
object pointers) if an error is detected in the argument list, relying
on the exception set by \code{PyArg_ParseTuple()}.  Otherwise the
string value of the argument has been copied to the local variable
\code{command}.  This is a pointer assignment and you are not supposed
to modify the string to which it points (so in Standard C, the variable
\code{command} should properly be declared as \samp{const char
*command}).

The next statement is a call to the \UNIX{} function \code{system()},
passing it the string we just got from \code{PyArg_ParseTuple()}:

\begin{verbatim}
        sts = system(command);
\end{verbatim}

Our \code{spam.system()} function must return the value of \code{sts}
as a Python object.  This is done using the function
\code{Py_BuildValue()}, which is something like the inverse of
\code{PyArg_ParseTuple()}: it takes a format string and an arbitrary
number of C values, and returns a new Python object.  More info on
\code{Py_BuildValue()} is given later.

\begin{verbatim}
        return Py_BuildValue("i", sts);
\end{verbatim}

In this case, it will return an integer object.  (Yes, even integers
are objects on the heap in Python!)

If you have a C function that returns no useful value (a function
returning \code{void}), the corresponding Python function must return
\code{None}.  You need this idiom to do so:

\begin{verbatim}
        Py_INCREF(Py_None);
        return Py_None;
\end{verbatim}

\code{Py_None} is the C name for the special Python object
\code{None}.  It is a genuine Python object (not a \code{NULL}
pointer, which means ``error'' in most contexts, as we have seen).


\section{The Module's Method Table and Initialization Function}

I promised to show how \code{spam_system()} is called from Python
programs.  First, we need to list its name and address in a ``method
table'':

\begin{verbatim}
    static PyMethodDef SpamMethods[] = {
        ...
        {"system",  spam_system, 1},
        ...
        {NULL,      NULL}        /* Sentinel */
    };
\end{verbatim}

Note the third entry (\samp{1}).  This is a flag telling the
interpreter the calling convention to be used for the C function.  It
should normally be \samp{1}; a value of \samp{0} means that an
obsolete variant of \code{PyArg_ParseTuple()} is used.
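
Why the table ends in a sentinel can be seen from a stand-alone model.
The sketch below does not use the Python headers; \code{ToyMethodDef},
\code{toy_lookup()} and \code{toy_system()} are hypothetical names, and
the scanning loop is an assumption about how such a table is typically
consumed.  Since a C array does not carry its own length, the scan stops
at the entry whose name is \code{NULL}.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef int (*ToyFunction)(const char *);

/* Hypothetical analogue of a method table entry: a name, the address
   of the C function, and a calling-convention flag. */
typedef struct {
    const char *name;
    ToyFunction function;
    int flags;
} ToyMethodDef;

static int
toy_system(const char *command)
{
    return (int) strlen(command);   /* placeholder body */
}

static ToyMethodDef toy_methods[] = {
    {"system", toy_system, 1},
    {NULL,     NULL,       0}       /* Sentinel */
};

/* Scan the table up to the sentinel; return the matching function,
   or NULL if no entry has the requested name. */
static ToyFunction
toy_lookup(const ToyMethodDef *table, const char *name)
{
    assert(table != NULL && name != NULL);
    for (; table->name != NULL; table++) {
        if (strcmp(table->name, name) == 0)
            return table->function;
    }
    return NULL;
}
```

Leaving out the sentinel in this model would let the loop run past the
end of the array, which is why the last entry must never be omitted.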

The method table must be passed to the interpreter in the module's
initialization function (which should be the only non-\code{static}
item defined in the module file):

\begin{verbatim}
    void
    initspam()
    {
        (void) Py_InitModule("spam", SpamMethods);
    }
\end{verbatim}

When the Python program imports module \code{spam} for the first time,
\code{initspam()} is called.  It calls \code{Py_InitModule()}, which
creates a ``module object'' (which is inserted in the dictionary
\code{sys.modules} under the key \code{"spam"}), and inserts built-in
function objects into the newly created module based upon the table
(an array of \code{PyMethodDef} structures) that was passed as its
second argument.  \code{Py_InitModule()} returns a pointer to the
module object that it creates (which is unused here).  It aborts with
a fatal error if the module could not be initialized satisfactorily,
so the caller doesn't need to check for errors.


\section{Compilation and Linkage}

There are two more things to do before you can use your new extension:
compiling and linking it with the Python system.  If you use dynamic
loading, the details depend on the style of dynamic loading your
system uses; see the chapter on Dynamic Loading for more info about
this.

If you can't use dynamic loading, or if you want to make your module a
permanent part of the Python interpreter, you will have to change the
configuration setup and rebuild the interpreter.  Luckily, this is
very simple: just place your file (\file{spammodule.c} for example) in
the \file{Modules} directory, add a line to the file
\file{Modules/Setup} describing your file:

\begin{verbatim}
    spam spammodule.o
\end{verbatim}

and rebuild the interpreter by running \code{make} in the top-level
directory.  You can also run \code{make} in the \file{Modules}
subdirectory, but then you must first rebuild the \file{Makefile}
there by running \code{make Makefile}.  (This is necessary each time
you change the \file{Setup} file.)

If your module requires additional libraries to link with, these can
be listed on the line in the \file{Setup} file as well, for instance:

\begin{verbatim}
    spam spammodule.o -lX11
\end{verbatim}


\section{Calling Python Functions From C}

So far we have concentrated on making C functions callable from
Python.  The reverse is also useful: calling Python functions from C.
This is especially the case for libraries that support so-called
``callback'' functions.  If a C interface makes use of callbacks, the
equivalent Python interface often needs to provide a callback mechanism
to the Python programmer; the implementation will require calling the
Python callback functions from a C callback.  Other uses are also
imaginable.

Fortunately, the Python interpreter is easily called recursively, and
there is a standard interface to call a Python function.  (I won't
dwell on how to call the Python parser with a particular string as
input --- if you're interested, have a look at the implementation of
the \samp{-c} command line option in \file{Python/pythonmain.c}.)

Calling a Python function is easy.  First, the Python program must
somehow pass you the Python function object.  You should provide a
function (or some other interface) to do this.  When this function is
called, save a pointer to the Python function object (be careful to
\code{Py_INCREF()} it!) in a global variable --- or wherever you see fit.
For example, the following function might be part of a module
definition:

\begin{verbatim}
    static PyObject *my_callback = NULL;

    static PyObject *
    my_set_callback(dummy, arg)
        PyObject *dummy, *arg;
    {
        Py_XDECREF(my_callback); /* Dispose of previous callback */
        Py_XINCREF(arg); /* Add a reference to new callback */
        my_callback = arg; /* Remember new callback */
        /* Boilerplate to return "None" */
        Py_INCREF(Py_None);
        return Py_None;
    }
\end{verbatim}

The macros \code{Py_XINCREF()} and \code{Py_XDECREF()} increment/decrement
the reference count of an object and are safe in the presence of
\code{NULL} pointers.  More information on them is given in the section
on Reference Counts below.
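
The effect of the \code{NULL}-safe macros in \code{my_set_callback()}
can be followed in a stand-alone model.  The sketch below is not the
interpreter's implementation: \code{ToyObject}, \code{TOY_XINCREF()},
\code{TOY_XDECREF()} and \code{toy_set_callback()} are made-up names,
and the counter merely mimics a reference count (nothing is ever freed).

```c
#include <assert.h>
#include <stddef.h>

/* Toy object carrying only a reference count. */
typedef struct {
    int refcnt;
} ToyObject;

/* NULL-safe increment and decrement, in the spirit of
   Py_XINCREF() and Py_XDECREF(); nothing is deallocated here. */
#define TOY_XINCREF(op) do { if ((op) != NULL) (op)->refcnt++; } while (0)
#define TOY_XDECREF(op) do { if ((op) != NULL) (op)->refcnt--; } while (0)

static ToyObject *toy_callback = NULL;

/* The same three steps as in my_set_callback(). */
static void
toy_set_callback(ToyObject *arg)
{
    TOY_XDECREF(toy_callback);  /* Dispose of previous callback */
    TOY_XINCREF(arg);           /* Add a reference to new callback */
    toy_callback = arg;         /* Remember new callback */
    assert(toy_callback == NULL || toy_callback->refcnt > 0);
}
```

Because both macros tolerate \code{NULL}, the very first call (when no
callback has been stored yet) and a call that clears the callback need
no special cases in this model.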

Later, when it is time to call the function, you call the C function
\code{PyEval_CallObject()}.  This function has two arguments, both
pointers to arbitrary Python objects: the Python function, and the
argument list.  The argument list must always be a tuple object, whose
length is the number of arguments.  To call the Python function with
no arguments, pass an empty tuple; to call it with one argument, pass
a singleton tuple.  \code{Py_BuildValue()} returns a tuple when its
format string consists of zero or more format codes between
parentheses.  For example:

\begin{verbatim}
    int arg;
    PyObject *arglist;
    PyObject *result;
    ...
    arg = 123;
    ...
    /* Time to call the callback */
    arglist = Py_BuildValue("(i)", arg);
    result = PyEval_CallObject(my_callback, arglist);
    Py_DECREF(arglist);
\end{verbatim}

\code{PyEval_CallObject()} returns a Python object pointer: this is
the return value of the Python function.  \code{PyEval_CallObject()} is
``reference-count-neutral'' with respect to its arguments.  In the
example a new tuple was created to serve as the argument list, which
is \code{Py_DECREF()}-ed immediately after the call.

The return value of \code{PyEval_CallObject()} is ``new'': either it
is a brand new object, or it is an existing object whose reference
count has been incremented.  So, unless you want to save it in a
global variable, you should somehow \code{Py_DECREF()} the result,
even (especially!) if you are not interested in its value.

Before you do this, however, it is important to check that the return
value isn't \code{NULL}.  If it is, the Python function terminated by raising
an exception.  If the C code that called \code{PyEval_CallObject()} is
called from Python, it should now return an error indication to its
Python caller, so the interpreter can print a stack trace, or the
calling Python code can handle the exception.  If this is not possible
or desirable, the exception should be cleared by calling
\code{PyErr_Clear()}.  For example:

\begin{verbatim}
    if (result == NULL)
        return NULL; /* Pass error back */
    ...use result...
    Py_DECREF(result);
\end{verbatim}

Depending on the desired interface to the Python callback function,
you may also have to provide an argument list to \code{PyEval_CallObject()}.
In some cases the argument list is also provided by the Python
program, through the same interface that specified the callback
function.  It can then be saved and used in the same manner as the
function object.  In other cases, you may have to construct a new
tuple to pass as the argument list.  The simplest way to do this is to
call \code{Py_BuildValue()}.  For example, if you want to pass an integral
event code, you might use the following code:

\begin{verbatim}
    PyObject *arglist;
    PyObject *result;
    ...
    arglist = Py_BuildValue("(l)", eventcode);
    result = PyEval_CallObject(my_callback, arglist);
    Py_DECREF(arglist);
    if (result == NULL)
        return NULL; /* Pass error back */
    /* Here maybe use the result */
    Py_DECREF(result);
\end{verbatim}

Note the placement of \code{Py_DECREF(arglist)} immediately after the call,
before the error check!  Also note that, strictly speaking, this code is
not complete: \code{Py_BuildValue()} may run out of memory, and this should
be checked.


\section{Format Strings for {\tt PyArg_ParseTuple()}}

The \code{PyArg_ParseTuple()} function is declared as follows:

\begin{verbatim}
    int PyArg_ParseTuple(PyObject *arg, char *format, ...);
\end{verbatim}

The \var{arg} argument must be a tuple object containing an argument
list passed from Python to a C function.  The \var{format} argument
must be a format string, whose syntax is explained below.  The
remaining arguments must be addresses of variables whose type is
determined by the format string.  For the conversion to succeed, the
\var{arg} object must match the format and the format must be
exhausted.

Note that while \code{PyArg_ParseTuple()} checks that the Python
arguments have the required types, it cannot check the validity of the
addresses of C variables passed to the call: if you make mistakes
there, your code will probably crash or at least overwrite random bits
in memory.  So be careful!
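
C programmers may recognize this contract from the \code{scanf()}
family, which is used here purely as an analogy and has nothing to do
with the Python API.  In the stand-alone sketch below (the function name
\code{parse_pair()} is hypothetical), the format string alone decides
what each address must point to, and the library has no means of
checking that the caller got it right.

```c
#include <assert.h>
#include <stdio.h>

/* Parse "<word> <number>" from text.  The format string decides what
   type each address must point to: "%15s" needs a char array of at
   least 16 bytes, "%d" needs an int.
   Returns 1 if both items were converted, 0 otherwise. */
static int
parse_pair(const char *text, char word[16], int *number)
{
    assert(text != NULL && word != NULL && number != NULL);
    return sscanf(text, "%15s %d", word, number) == 2;
}
```

Passing the address of a \code{short} where \samp{\%d} expects an
\code{int} would compile in many settings and then overwrite adjacent
memory, which is the kind of mistake the warning above is about.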
A format string consists of zero or more ``format units''.  A format
 | 
						|
unit describes one Python object; it is usually a single character or
 | 
						|
a parenthesized sequence of format units.  With a few exceptions, a
 | 
						|
format unit that is not a parenthesized sequence normally corresponds
 | 
						|
to a single address argument to \code{PyArg_ParseTuple()}.  In the
 | 
						|
following description, the quoted form is the format unit; the entry
 | 
						|
in (round) parentheses is the Python object type that matches the
 | 
						|
format unit; and the entry in [square] brackets is the type of the C
 | 
						|
variable(s) whose address should be passed.  (Use the \samp{\&}
 | 
						|
operator to pass a variable's address.)
 | 
						|
 | 
						|
\begin{description}
 | 
						|
 | 
						|
\item[\samp{s} (string) [char *]]
 | 
						|
Convert a Python string to a C pointer to a character string.  You
 | 
						|
must not provide storage for the string itself; a pointer to an
 | 
						|
existing string is stored into the character pointer variable whose
 | 
						|
address you pass.  The C string is null-terminated.  The Python string
 | 
						|
must not contain embedded null bytes; if it does, a \code{TypeError}
 | 
						|
exception is raised.
 | 
						|
 | 
						|
\item[\samp{s\#} (string) {[char *, int]}]
 | 
						|
This variant on \code{'s'} stores into two C variables, the first one
 | 
						|
a pointer to a character string, the second one its length.  In this
 | 
						|
case the Python string may contain embedded null bytes.
 | 
						|
 | 
						|
\item[\samp{z} (string or \code{None}) {[char *]}]
 | 
						|
Like \samp{s}, but the Python object may also be \code{None}, in which
 | 
						|
case the C pointer is set to \code{NULL}.
 | 
						|
 | 
						|
\item[\samp{z\#} (string or \code{None}) {[char *, int]}]
 | 
						|
This is to \code{'s\#'} as \code{'z'} is to \code{'s'}.
 | 
						|
 | 
						|
\item[\samp{b} (integer) {[char]}]
 | 
						|
Convert a Python integer to a tiny int, stored in a C \code{char}.
 | 
						|
 | 
						|
\item[\samp{h} (integer) {[short int]}]
 | 
						|
Convert a Python integer to a C \code{short int}.
 | 
						|
 | 
						|
\item[\samp{i} (integer) {[int]}]
 | 
						|
Convert a Python integer to a plain C \code{int}.
 | 
						|
 | 
						|
\item[\samp{l} (integer) {[long int]}]
 | 
						|
Convert a Python integer to a C \code{long int}.
 | 
						|
 | 
						|
\item[\samp{c} (string of length 1) {[char]}]
 | 
						|
Convert a Python character, represented as a string of length 1, to a
 | 
						|
C \code{char}.
 | 
						|
 | 
						|
\item[\samp{f} (float) {[float]}]
 | 
						|
Convert a Python floating point number to a C \code{float}.
 | 
						|
 | 
						|
\item[\samp{d} (float) {[double]}]
 | 
						|
Convert a Python floating point number to a C \code{double}.
 | 
						|
 | 
						|
\item[\samp{O} (object) {[PyObject *]}]
 | 
						|
Store a Python object (without any conversion) in a C object pointer.
 | 
						|
The C program thus receives the actual object that was passed.  The
 | 
						|
object's reference count is not increased.  The pointer stored is not
 | 
						|
\code{NULL}.
 | 
						|
 | 
						|
\item[\samp{O!} (object) {[\var{typeobject}, PyObject *]}]
 | 
						|
Store a Python object in a C object pointer.  This is similar to
 | 
						|
\samp{O}, but takes two C arguments: the first is the address of a
 | 
						|
Python type object, the second is the address of the C variable (of
 | 
						|
type \code{PyObject *}) into which the object pointer is stored.
 | 
						|
If the Python object does not have the required type, a
 | 
						|
\code{TypeError} exception is raised.
 | 
						|
 | 
						|
\item[\samp{O\&} (object) {[\var{converter}, \var{anything}]}]
 | 
						|
Convert a Python object to a C variable through a \var{converter}
 | 
						|
function.  This takes two arguments: the first is a function, the
 | 
						|
second is the address of a C variable (of arbitrary type), converted
 | 
						|
to \code{void *}.  The \var{converter} function in turn is called as
 | 
						|
follows:
 | 
						|
 | 
						|
\code{\var{status} = \var{converter}(\var{object}, \var{address});}
 | 
						|
 | 
						|
where \var{object} is the Python object to be converted and
 | 
						|
\var{address} is the \code{void *} argument that was passed to
 | 
						|
\code{PyArg_ParseTuple()}.  The returned \var{status} should be
 | 
						|
\code{1} for a successful conversion and \code{0} if the conversion
 | 
						|
has failed.  When the conversion fails, the \var{converter} function
 | 
						|
should raise an exception.
 | 
						|
 | 
						|
\item[\samp{S} (string) {[PyStringObject *]}]
 | 
						|
Like \samp{O} but requires that the Python object is a string object;
raises a \code{TypeError} exception if it is not.  The C variable may
also be declared as
 | 
						|
\code{PyObject *}.
 | 
						|
 | 
						|
\item[\samp{(\var{items})} (tuple) {[\var{matching-items}]}]
 | 
						|
The object must be a Python tuple whose length is the number of format
 | 
						|
units in \var{items}.  The C arguments must correspond to the
 | 
						|
individual format units in \var{items}.  Format units for tuples may
 | 
						|
be nested.
 | 
						|
 | 
						|
\end{description}
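
The \samp{O\&} format unit is the least obvious of the list above, so
here is a minimal sketch of a converter function.  The \code{point}
structure and the names \code{convert_point()} and \code{manhattan()}
are invented for this illustration; only the calling convention
(return \code{1} on success, \code{0} with an exception set on
failure) comes from the description above.

\begin{verbatim}
#include "Python.h"
#include <stdlib.h>

/* Hypothetical C type filled in by the converter. */
typedef struct {
    int x;
    int y;
} point;

/* Converter for the "O&" format unit: accepts a 2-tuple of integers.
   Returns 1 on success, 0 (with an exception set) on failure. */
static int
convert_point(PyObject *object, void *address)
{
    point *p = (point *) address;

    if (!PyTuple_Check(object)) {
        PyErr_SetString(PyExc_TypeError, "point must be a 2-tuple");
        return 0;
    }
    /* A tuple can be parsed just like an argument list. */
    if (!PyArg_ParseTuple(object, "ii", &p->x, &p->y))
        return 0;
    return 1;
}

/* Possible Python call: manhattan((3, -4)) */
static PyObject *
manhattan(PyObject *self, PyObject *args)
{
    point p;

    if (!PyArg_ParseTuple(args, "O&", convert_point, &p))
        return NULL;
    return Py_BuildValue("i", abs(p.x) + abs(p.y));
}
\end{verbatim}

The converter sets its own exception for the wrong object type and
relies on the nested \code{PyArg_ParseTuple()} call to set one when
the tuple contents do not match.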
 | 
						|
 | 
						|
It is possible to pass Python long integers where integers are
 | 
						|
requested; however, no proper range checking is done: the most
significant bits are silently truncated when the receiving field is
too small to receive the value (actually, the semantics are inherited
from downcasts in C; your mileage may vary).
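
The truncation described here is ordinary C conversion behaviour and
can be seen without the Python API.  The following sketch assumes
8-bit characters, and the helper name \code{low_byte()} is made up for
the illustration.  It uses unsigned types because C defines that
conversion as reduction modulo $2^N$; converting an out-of-range value
to a signed type is implementation-defined.

\begin{verbatim}
/* Hypothetical helper: keep only the bits that fit in an unsigned
   char, which is what a narrowing conversion does for unsigned
   types. */
unsigned char
low_byte(unsigned long value)
{
    return (unsigned char) value;   /* high-order bits are discarded */
}
\end{verbatim}

With 8-bit characters, \code{low_byte(0x1234UL)} yields \code{0x34}
and \code{low_byte(256UL)} yields \code{0}.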
 | 
						|
 | 
						|
A few other characters have a meaning in a format string.  These may
 | 
						|
not occur inside nested parentheses.  They are:
 | 
						|
 | 
						|
\begin{description}
 | 
						|
 | 
						|
\item[\samp{|}]
 | 
						|
Indicates that the remaining arguments in the Python argument list are
 | 
						|
optional.  The C variables corresponding to optional arguments should
 | 
						|
be initialized to their default value --- when an optional argument is
 | 
						|
not specified, \code{PyArg_ParseTuple()} does not touch the contents
 | 
						|
of the corresponding C variable(s).
 | 
						|
 | 
						|
\item[\samp{:}]
 | 
						|
The list of format units ends here; the string after the colon is used
 | 
						|
as the function name in error messages (the ``associated value'' of
 | 
						|
the exceptions that \code{PyArg_ParseTuple} raises).
 | 
						|
 | 
						|
\item[\samp{;}]
 | 
						|
The list of format units ends here; the string after the semicolon is used
 | 
						|
as the error message \emph{instead} of the default error message.
 | 
						|
Clearly, \samp{:} and \samp{;} mutually exclude each other.
 | 
						|
 | 
						|
\end{description}
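
A short sketch of \samp{:} and \samp{;} in use.  The function names
\code{spam_seek()} and \code{spam_resize()} and the message text are
assumptions made for this example, not part of any real module.

\begin{verbatim}
#include "Python.h"

/* Possible Python calls: seek(10), seek(10, 2) */
static PyObject *
spam_seek(PyObject *self, PyObject *args)
{
    long offset;
    int whence = 0;     /* default, kept if the caller omits it */

    /* Errors are reported using the function name "seek". */
    if (!PyArg_ParseTuple(args, "l|i:seek", &offset, &whence))
        return NULL;
    return Py_BuildValue("(li)", offset, whence);
}

/* Possible Python call: resize((640, 480)) */
static PyObject *
spam_resize(PyObject *self, PyObject *args)
{
    int width, height;

    /* The text after ';' replaces the default error message. */
    if (!PyArg_ParseTuple(args,
                          "(ii);resize() needs a (width, height) pair",
                          &width, &height))
        return NULL;
    return Py_BuildValue("i", width * height);
}
\end{verbatim}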
 | 
						|
 | 
						|
Some example calls:
 | 
						|
 | 
						|
\begin{verbatim}
    int ok;
    int i, j;
    long k, l;
    char *s;
    int size;

    ok = PyArg_ParseTuple(args, ""); /* No arguments */
        /* Python call: f() */

    ok = PyArg_ParseTuple(args, "s", &s); /* A string */
        /* Possible Python call: f('whoops!') */

    ok = PyArg_ParseTuple(args, "lls", &k, &l, &s); /* Two longs and a string */
        /* Possible Python call: f(1, 2, 'three') */

    ok = PyArg_ParseTuple(args, "(ii)s#", &i, &j, &s, &size);
        /* A pair of ints and a string, whose size is also returned */
        /* Possible Python call: f((1, 2), 'three') */

    {
        char *file;
        char *mode = "r";
        int bufsize = 0;
        ok = PyArg_ParseTuple(args, "s|si", &file, &mode, &bufsize);
        /* A string, and optionally another string and an integer */
        /* Possible Python calls:
           f('spam')
           f('spam', 'w')
           f('spam', 'wb', 100000) */
    }

    {
        int left, top, right, bottom, h, v;
        ok = PyArg_ParseTuple(args, "((ii)(ii))(ii)",
                 &left, &top, &right, &bottom, &h, &v);
                 /* A rectangle and a point */
                 /* Possible Python call:
                    f(((0, 0), (400, 300)), (10, 10)) */
    }
\end{verbatim}
 | 
						|
 | 
						|
 | 
						|
\section{The {\tt Py_BuildValue()} Function}
 | 
						|
 | 
						|
This function is the counterpart to \code{PyArg_ParseTuple()}.  It is
 | 
						|
declared as follows:
 | 
						|
 | 
						|
\begin{verbatim}
    PyObject *Py_BuildValue(char *format, ...);
\end{verbatim}
 | 
						|
 | 
						|
It recognizes a set of format units similar to the ones recognized by
 | 
						|
\code{PyArg_ParseTuple()}, but the arguments (which are input to the
 | 
						|
function, not output) must not be pointers, just values.  It returns a
 | 
						|
new Python object, suitable for returning from a C function called
 | 
						|
from Python.
 | 
						|
 | 
						|
One difference with \code{PyArg_ParseTuple()}: while the latter
 | 
						|
requires its first argument to be a tuple (since Python argument lists
 | 
						|
are always represented as tuples internally), \code{Py_BuildValue()} does
 | 
						|
not always build a tuple.  It builds a tuple only if its format string
 | 
						|
contains two or more format units.  If the format string is empty, it
 | 
						|
returns \code{None}; if it contains exactly one format unit, it
 | 
						|
returns whatever object is described by that format unit.  To force it
 | 
						|
to return a tuple of size 0 or 1, parenthesize the format string.
 | 
						|
 | 
						|
In the following description, the quoted form is the format unit; the
 | 
						|
entry in (round) parentheses is the Python object type that the format
 | 
						|
unit will return; and the entry in [square] brackets is the type of
 | 
						|
the C value(s) to be passed.
 | 
						|
 | 
						|
The characters space, tab, colon and comma are ignored in format
 | 
						|
strings (but not within format units such as \samp{s\#}).  This can be
 | 
						|
used to make long format strings a tad more readable.
 | 
						|
 | 
						|
\begin{description}
 | 
						|
 | 
						|
\item[\samp{s} (string) {[char *]}]
 | 
						|
Convert a null-terminated C string to a Python object.  If the C
 | 
						|
string pointer is \code{NULL}, \code{None} is returned.
 | 
						|
 | 
						|
\item[\samp{s\#} (string) {[char *, int]}]
 | 
						|
Convert a C string and its length to a Python object.  If the C string
 | 
						|
pointer is \code{NULL}, the length is ignored and \code{None} is
 | 
						|
returned.
 | 
						|
 | 
						|
\item[\samp{z} (string or \code{None}) {[char *]}]
 | 
						|
Same as \samp{s}.
 | 
						|
 | 
						|
\item[\samp{z\#} (string or \code{None}) {[char *, int]}]
 | 
						|
Same as \samp{s\#}.
 | 
						|
 | 
						|
\item[\samp{i} (integer) {[int]}]
 | 
						|
Convert a plain C \code{int} to a Python integer object.
 | 
						|
 | 
						|
\item[\samp{b} (integer) {[char]}]
 | 
						|
Same as \samp{i}.
 | 
						|
 | 
						|
\item[\samp{h} (integer) {[short int]}]
 | 
						|
Same as \samp{i}.
 | 
						|
 | 
						|
\item[\samp{l} (integer) {[long int]}]
 | 
						|
Convert a C \code{long int} to a Python integer object.
 | 
						|
 | 
						|
\item[\samp{c} (string of length 1) {[char]}]
 | 
						|
Convert a C \code{int} representing a character to a Python string of
 | 
						|
length 1.
 | 
						|
 | 
						|
\item[\samp{d} (float) {[double]}]
 | 
						|
Convert a C \code{double} to a Python floating point number.
 | 
						|
 | 
						|
\item[\samp{f} (float) {[float]}]
 | 
						|
Same as \samp{d}.
 | 
						|
 | 
						|
\item[\samp{O} (object) {[PyObject *]}]
 | 
						|
Pass a Python object untouched (except for its reference count, which
 | 
						|
is incremented by one).  If the object passed in is a \code{NULL}
 | 
						|
pointer, it is assumed that this was caused because the call producing
 | 
						|
the argument found an error and set an exception.  Therefore,
 | 
						|
\code{Py_BuildValue()} will return \code{NULL} but won't raise an
 | 
						|
exception.  If no exception has been raised yet,
 | 
						|
\code{PyExc_SystemError} is set.
 | 
						|
 | 
						|
\item[\samp{S} (object) {[PyObject *]}]
 | 
						|
Same as \samp{O}.
 | 
						|
 | 
						|
\item[\samp{O\&} (object) {[\var{converter}, \var{anything}]}]
 | 
						|
Convert \var{anything} to a Python object through a \var{converter}
 | 
						|
function.  The function is called with \var{anything} (which should be
 | 
						|
compatible with \code{void *}) as its argument and should return a
 | 
						|
``new'' Python object, or \code{NULL} if an error occurred.
 | 
						|
 | 
						|
\item[\samp{(\var{items})} (tuple) {[\var{matching-items}]}]
 | 
						|
Convert a sequence of C values to a Python tuple with the same number
 | 
						|
of items.
 | 
						|
 | 
						|
\item[\samp{[\var{items}]} (list) {[\var{matching-items}]}]
 | 
						|
Convert a sequence of C values to a Python list with the same number
 | 
						|
of items.
 | 
						|
 | 
						|
\item[\samp{\{\var{items}\}} (dictionary) {[\var{matching-items}]}]
 | 
						|
Convert a sequence of C values to a Python dictionary.  Each pair of
 | 
						|
consecutive C values adds one item to the dictionary, serving as key
 | 
						|
and value, respectively.
 | 
						|
 | 
						|
\end{description}
 | 
						|
 | 
						|
If there is an error in the format string, the
 | 
						|
\code{PyExc_SystemError} exception is raised and \code{NULL} returned.
 | 
						|
 | 
						|
Examples (to the left the call, to the right the resulting Python value):
 | 
						|
 | 
						|
\begin{verbatim}
    Py_BuildValue("")                        None
    Py_BuildValue("i", 123)                  123
    Py_BuildValue("iii", 123, 456, 789)      (123, 456, 789)
    Py_BuildValue("s", "hello")              'hello'
    Py_BuildValue("ss", "hello", "world")    ('hello', 'world')
    Py_BuildValue("s#", "hello", 4)          'hell'
    Py_BuildValue("()")                      ()
    Py_BuildValue("(i)", 123)                (123,)
    Py_BuildValue("(ii)", 123, 456)          (123, 456)
    Py_BuildValue("(i,i)", 123, 456)         (123, 456)
    Py_BuildValue("[i,i]", 123, 456)         [123, 456]
    Py_BuildValue("{s:i,s:i}",
                  "abc", 123, "def", 456)    {'abc': 123, 'def': 456}
    Py_BuildValue("((ii)(ii)) (ii)",
                  1, 2, 3, 4, 5, 6)          (((1, 2), (3, 4)), (5, 6))
\end{verbatim}
 | 
						|
 | 
						|
 | 
						|
\section{Reference Counts}
 | 
						|
 | 
						|
\subsection{Introduction}
 | 
						|
 | 
						|
In languages like C or \Cpp{}, the programmer is responsible for
 | 
						|
dynamic allocation and deallocation of memory on the heap.  In C, this
 | 
						|
is done using the functions \code{malloc()} and \code{free()}.  In
 | 
						|
\Cpp{}, the operators \code{new} and \code{delete} are used with
 | 
						|
essentially the same meaning; they are typically implemented using
 | 
						|
\code{malloc()} and \code{free()}, so we'll restrict the following
 | 
						|
discussion to the latter.
 | 
						|
 | 
						|
Every block of memory allocated with \code{malloc()} should eventually
 | 
						|
be returned to the pool of available memory by exactly one call to
 | 
						|
\code{free()}.  It is important to call \code{free()} at the right
 | 
						|
time.  If a block's address is forgotten but \code{free()} is not
 | 
						|
called for it, the memory it occupies cannot be reused until the
 | 
						|
program terminates.  This is called a \dfn{memory leak}.  On the other
 | 
						|
hand, if a program calls \code{free()} for a block and then continues
 | 
						|
to use the block, it creates a conflict with re-use of the block
 | 
						|
through another \code{malloc()} call.  This is called \dfn{using freed
memory}.  It has the same bad consequences as referencing uninitialized
data: core dumps, wrong results, mysterious crashes.
 | 
						|
 | 
						|
Common causes of memory leaks are unusual paths through the code.  For
 | 
						|
instance, a function may allocate a block of memory, do some
 | 
						|
calculation, and then free the block again.  Now a change in the
 | 
						|
requirements for the function may add a test to the calculation that
 | 
						|
detects an error condition and can return prematurely from the
 | 
						|
function.  It's easy to forget to free the allocated memory block when
 | 
						|
taking this premature exit, especially when it is added later to the
 | 
						|
code.  Such leaks, once introduced, often go undetected for a long
 | 
						|
time: the error exit is taken only in a small fraction of all calls,
 | 
						|
and most modern machines have plenty of virtual memory, so the leak
 | 
						|
only becomes apparent in a long-running process that uses the leaking
 | 
						|
function frequently.  Therefore, it's important to prevent leaks from
 | 
						|
happening by having a coding convention or strategy that minimizes
 | 
						|
this kind of error.
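
One such convention, shown here only as a sketch in plain C, is to
route every exit taken after the allocation through a single cleanup
label.  The function \code{checked_length()} and its error condition
are hypothetical; the point is that an error exit added later cannot
skip the call to \code{free()}.

\begin{verbatim}
#include <stdlib.h>
#include <string.h>

/* Hypothetical function: returns the length of text, or -1 on error.
   Every path taken after the allocation reaches free(). */
int
checked_length(const char *text)
{
    int result = -1;
    char *copy;

    if (text == NULL)
        return -1;                  /* nothing allocated yet */
    copy = (char *) malloc(strlen(text) + 1);
    if (copy == NULL)
        return -1;                  /* still nothing to free */
    strcpy(copy, text);

    if (copy[0] == '\0')
        goto done;                  /* error exit added later */
    result = (int) strlen(copy);

done:
    free(copy);                     /* the single place that frees */
    return result;
}
\end{verbatim}

For example, \code{checked_length("spam")} returns \code{4}, while an
empty string takes the error exit and returns \code{-1} without
leaking the copy.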
 | 
						|
 | 
						|
Since Python makes heavy use of \code{malloc()} and \code{free()}, it
 | 
						|
needs a strategy to avoid memory leaks as well as the use of freed
 | 
						|
memory.  The chosen method is called \dfn{reference counting}.  The
 | 
						|
principle is simple: every object contains a counter, which is
 | 
						|
incremented when a reference to the object is stored somewhere, and
 | 
						|
which is decremented when a reference to it is deleted.  When the
 | 
						|
counter reaches zero, the last reference to the object has been
 | 
						|
deleted and the object is freed.
 | 
						|
 | 
						|
An alternative strategy is called \dfn{automatic garbage collection}.
 | 
						|
(Sometimes, reference counting is also referred to as a garbage
 | 
						|
collection strategy, hence my use of ``automatic'' to distinguish the
 | 
						|
two.)  The big advantage of automatic garbage collection is that the
 | 
						|
user doesn't need to call \code{free()} explicitly.  (Another claimed
 | 
						|
advantage is an improvement in speed or memory usage --- this is no
 | 
						|
hard fact however.)  The disadvantage is that for C, there is no
 | 
						|
truly portable automatic garbage collector, while reference counting
 | 
						|
can be implemented portably (as long as the functions \code{malloc()}
 | 
						|
and \code{free()} are available --- which the C Standard guarantees).
 | 
						|
Maybe some day a sufficiently portable automatic garbage collector
 | 
						|
will be available for C.  Until then, we'll have to live with
 | 
						|
reference counts.
 | 
						|
 | 
						|
\subsection{Reference Counting in Python}
 | 
						|
 | 
						|
There are two macros, \code{Py_INCREF(x)} and \code{Py_DECREF(x)},
 | 
						|
which handle the incrementing and decrementing of the reference count.
 | 
						|
\code{Py_DECREF()} also frees the object when the count reaches zero.
 | 
						|
For flexibility, it doesn't call \code{free()} directly --- rather, it
 | 
						|
makes a call through a function pointer in the object's \dfn{type
 | 
						|
object}.  For this purpose (and others), every object also contains a
 | 
						|
pointer to its type object.
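
The mechanism can be imitated in a few lines of plain C.  The sketch
below is a toy model, not Python's actual implementation: all names
starting with \code{toy_} or \code{TOY_} are invented here.  It shows a
counter in every object and a decrement macro that frees the object
through a function pointer found in its type object.

\begin{verbatim}
#include <stdlib.h>

struct toy_object;

/* Stand-in for a type object: holds the deallocation function. */
typedef struct {
    void (*dealloc)(struct toy_object *);
} toy_type;

typedef struct toy_object {
    int refcnt;             /* number of owned references */
    toy_type *type;         /* every object points to its type */
} toy_object;

static int toy_freed = 0;   /* counts deallocations, for the demo */

static void
toy_dealloc(toy_object *op)
{
    toy_freed++;
    free(op);
}

static toy_type toy_type_object = { toy_dealloc };

#define TOY_INCREF(op) ((op)->refcnt++)
#define TOY_DECREF(op) \
    do { if (--(op)->refcnt == 0) (op)->type->dealloc(op); } while (0)

/* Create an object; the caller owns the only reference. */
toy_object *
toy_new(void)
{
    toy_object *op = (toy_object *) malloc(sizeof(toy_object));

    if (op != NULL) {
        op->refcnt = 1;
        op->type = &toy_type_object;
    }
    return op;
}
\end{verbatim}

After \code{toy_new()}, one \code{TOY_INCREF()} and two
\code{TOY_DECREF()} calls, the count has dropped to zero and
\code{toy_dealloc()} has run exactly once.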
 | 
						|
 | 
						|
The big question now remains: when to use \code{Py_INCREF(x)} and
 | 
						|
\code{Py_DECREF(x)}?  Let's first introduce some terms.  Nobody
 | 
						|
``owns'' an object; however, you can \dfn{own a reference} to an
 | 
						|
object.  An object's reference count is now defined as the number of
 | 
						|
owned references to it.  The owner of a reference is responsible for
 | 
						|
calling \code{Py_DECREF()} when the reference is no longer needed.
 | 
						|
Ownership of a reference can be transferred.  There are three ways to
 | 
						|
dispose of an owned reference: pass it on, store it, or call
 | 
						|
\code{Py_DECREF()}.  Forgetting to dispose of an owned reference creates
 | 
						|
a memory leak.
 | 
						|
 | 
						|
It is also possible to \dfn{borrow}\footnote{The metaphor of
 | 
						|
``borrowing'' a reference is not completely correct: the owner still
 | 
						|
has a copy of the reference.} a reference to an object.  The borrower
 | 
						|
of a reference should not call \code{Py_DECREF()}.  The borrower must
 | 
						|
not hold on to the object longer than the owner from which it was
 | 
						|
borrowed.  Using a borrowed reference after the owner has disposed of
 | 
						|
it risks using freed memory and should be avoided
 | 
						|
completely.\footnote{Checking that the reference count is at least 1
 | 
						|
\strong{does not work} --- the reference count itself could be in
 | 
						|
freed memory and may thus be reused for another object!}
 | 
						|
 | 
						|
The advantage of borrowing over owning a reference is that you don't
 | 
						|
need to take care of disposing of the reference on all possible paths
 | 
						|
through the code --- in other words, with a borrowed reference you
 | 
						|
don't run the risk of leaking when a premature exit is taken.  The
 | 
						|
disadvantage of borrowing over owning is that there are some subtle
 | 
						|
situations where in seemingly correct code a borrowed reference can be
 | 
						|
used after the owner from which it was borrowed has in fact disposed
 | 
						|
of it.
 | 
						|
 | 
						|
A borrowed reference can be changed into an owned reference by calling
 | 
						|
\code{Py_INCREF()}.  This does not affect the status of the owner from
 | 
						|
which the reference was borrowed --- it creates a new owned reference,
 | 
						|
and gives full owner responsibilities (i.e., the new owner must
 | 
						|
dispose of the reference properly, as well as the previous owner).
 | 
						|
 | 
						|
\subsection{Ownership Rules}
 | 
						|
 | 
						|
Whenever an object reference is passed into or out of a function, it
 | 
						|
is part of the function's interface specification whether ownership is
 | 
						|
transferred with the reference or not.
 | 
						|
 | 
						|
Most functions that return a reference to an object pass on ownership
 | 
						|
with the reference.  In particular, all functions whose purpose is
 | 
						|
to create a new object, e.g.\ \code{PyInt_FromLong()} and
 | 
						|
\code{Py_BuildValue()}, pass ownership to the receiver.  Even if in
 | 
						|
fact, in some cases, you don't receive a reference to a brand new
 | 
						|
object, you still receive ownership of the reference.  For instance,
 | 
						|
\code{PyInt_FromLong()} maintains a cache of popular values and can
 | 
						|
return a reference to a cached item.
 | 
						|
 | 
						|
Many functions that extract objects from other objects also transfer
 | 
						|
ownership with the reference, for instance
 | 
						|
\code{PyObject_GetAttrString()}.  The picture is less clear here,
however, since a few common routines are exceptions:
 | 
						|
\code{PyTuple_GetItem()}, \code{PyList_GetItem()} and
 | 
						|
\code{PyDict_GetItem()} (and \code{PyDict_GetItemString()}) all return
 | 
						|
references that you borrow from the tuple, list or dictionary.
 | 
						|
 | 
						|
The function \code{PyImport_AddModule()} also returns a borrowed
 | 
						|
reference, even though it may actually create the object it returns:
 | 
						|
this is possible because an owned reference to the object is stored in
 | 
						|
\code{sys.modules}.
 | 
						|
 | 
						|
When you pass an object reference into another function, in general,
 | 
						|
the function borrows the reference from you --- if it needs to store
 | 
						|
it, it will use \code{Py_INCREF()} to become an independent owner.
 | 
						|
There are exactly two important exceptions to this rule:
 | 
						|
\code{PyTuple_SetItem()} and \code{PyList_SetItem()}.  These functions
 | 
						|
take over ownership of the item passed to them --- even if they fail!
 | 
						|
(Note that \code{PyDict_SetItem()} and friends don't take over
 | 
						|
ownership --- they are ``normal''.)
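
The difference between the two kinds of function can be sketched as
follows.  The function \code{make_both()} is hypothetical.  It uses
\code{Py_XDECREF()}, a variant of \code{Py_DECREF()} that accepts
\code{NULL} pointers.

\begin{verbatim}
#include "Python.h"

/* Returns a new (tuple, dictionary) pair, or NULL with an
   exception set. */
static PyObject *
make_both(void)
{
    PyObject *tuple = NULL, *dict = NULL, *value, *result = NULL;
    int status;

    tuple = PyTuple_New(1);
    dict = PyDict_New();
    if (tuple == NULL || dict == NULL)
        goto done;

    value = Py_BuildValue("i", 42);     /* we own this reference */
    if (value == NULL)
        goto done;
    /* PyTuple_SetItem() takes over the reference, even if it fails,
       so value must not be passed to Py_DECREF() afterwards. */
    if (PyTuple_SetItem(tuple, 0, value) != 0)
        goto done;

    value = Py_BuildValue("i", 42);     /* a second owned reference */
    if (value == NULL)
        goto done;
    /* PyDict_SetItemString() is "normal": the dictionary makes its
       own reference, so we still have to dispose of ours. */
    status = PyDict_SetItemString(dict, "answer", value);
    Py_DECREF(value);
    if (status != 0)
        goto done;

    /* "O" increments the reference counts; ours are released below. */
    result = Py_BuildValue("(OO)", tuple, dict);

done:
    Py_XDECREF(tuple);
    Py_XDECREF(dict);
    return result;
}
\end{verbatim}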
 | 
						|
 | 
						|
When a C function is called from Python, it borrows references to its
 | 
						|
arguments from the caller.  The caller owns a reference to the object,
 | 
						|
so the borrowed reference's lifetime is guaranteed until the function
 | 
						|
returns.  Only when such a borrowed reference must be stored or passed
on must it be turned into an owned reference by calling
 | 
						|
\code{Py_INCREF()}.
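
A typical case is a function that remembers one of its arguments for
later use.  In this sketch the names \code{saved_callback} and
\code{set_callback()} are assumptions chosen for the example.

\begin{verbatim}
#include "Python.h"

static PyObject *saved_callback = NULL;   /* owned reference, or NULL */

/* Possible Python call: set_callback(func) */
static PyObject *
set_callback(PyObject *self, PyObject *args)
{
    PyObject *callback;         /* borrowed from the argument tuple */
    PyObject *previous;

    if (!PyArg_ParseTuple(args, "O:set_callback", &callback))
        return NULL;
    Py_INCREF(callback);        /* we keep it, so we must own it */
    previous = saved_callback;
    saved_callback = callback;
    Py_XDECREF(previous);       /* dispose of the previous reference */

    Py_INCREF(Py_None);         /* the return value must be owned too */
    return Py_None;
}
\end{verbatim}

The new reference is stored before the old one is released, so the
global variable never points at an object that may already have been
freed.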
 | 
						|
 | 
						|
The object reference returned from a C function that is called from
 | 
						|
Python must be an owned reference --- ownership is transferred from the
 | 
						|
function to its caller.
 | 
						|
 | 
						|
\subsection{Thin Ice}
 | 
						|
 | 
						|
There are a few situations where seemingly harmless use of a borrowed
 | 
						|
reference can lead to problems.  These all have to do with implicit
 | 
						|
invocations of the interpreter, which can cause the owner of a
 | 
						|
reference to dispose of it.
 | 
						|
 | 
						|
The first and most important case to know about is using
 | 
						|
\code{Py_DECREF()} on an unrelated object while borrowing a reference
 | 
						|
to a list item.  For instance:
 | 
						|
 | 
						|
\begin{verbatim}
void
bug(PyObject *list) {
    PyObject *item = PyList_GetItem(list, 0);
    PyList_SetItem(list, 1, PyInt_FromLong(0L));
    PyObject_Print(item, stdout, 0); /* BUG! */
}
\end{verbatim}
 | 
						|
 | 
						|
This function first borrows a reference to \code{list[0]}, then
 | 
						|
replaces \code{list[1]} with the value \code{0}, and finally prints
 | 
						|
the borrowed reference.  Looks harmless, right?  But it's not!
 | 
						|
 | 
						|
Let's follow the control flow into \code{PyList_SetItem()}.  The list
 | 
						|
owns references to all its items, so when item 1 is replaced, it has
 | 
						|
to dispose of the original item 1.  Now let's suppose the original
 | 
						|
item 1 was an instance of a user-defined class, and let's further
 | 
						|
suppose that the class defined a \code{__del__()} method.  If this
 | 
						|
class instance has a reference count of 1, disposing of it will call
 | 
						|
its \code{__del__()} method.
 | 
						|
 | 
						|
Since it is written in Python, the \code{__del__()} method can execute
 | 
						|
arbitrary Python code.  Could it perhaps do something to invalidate
 | 
						|
the reference to \code{item} in \code{bug()}?  You bet!  Assuming that
 | 
						|
the list passed into \code{bug()} is accessible to the
 | 
						|
\code{__del__()} method, it could execute a statement to the effect of
 | 
						|
\code{del list[0]}, and assuming this was the last reference to that
 | 
						|
object, it would free the memory associated with it, thereby
 | 
						|
invalidating \code{item}.
 | 
						|
 | 
						|
The solution, once you know the source of the problem, is easy:
 | 
						|
temporarily increment the reference count.  The correct version of the
 | 
						|
function reads:
 | 
						|
 | 
						|
\begin{verbatim}
void
no_bug(PyObject *list) {
    PyObject *item = PyList_GetItem(list, 0);
    Py_INCREF(item);
    PyList_SetItem(list, 1, PyInt_FromLong(0L));
    PyObject_Print(item, stdout, 0);
    Py_DECREF(item);
}
\end{verbatim}
 | 
						|
 | 
						|
This is a true story.  An older version of Python contained variants
 | 
						|
of this bug and someone spent a considerable amount of time in a C
 | 
						|
debugger to figure out why his \code{__del__()} methods would fail...
 | 
						|
 | 
						|
The second case of problems with a borrowed reference is a variant
 | 
						|
involving threads.  Normally, multiple threads in the Python
 | 
						|
interpreter can't get in each other's way, because there is a global
 | 
						|
lock protecting Python's entire object space.  However, it is possible
 | 
						|
to temporarily release this lock using the macro
 | 
						|
\code{Py_BEGIN_ALLOW_THREADS}, and to re-acquire it using
 | 
						|
\code{Py_END_ALLOW_THREADS}.  This is common around blocking I/O
 | 
						|
calls, to let other threads use the CPU while waiting for the I/O to
 | 
						|
complete.  Obviously, the following function has the same problem as
 | 
						|
the previous one:
 | 
						|
 | 
						|
\begin{verbatim}
void
bug(PyObject *list) {
    PyObject *item = PyList_GetItem(list, 0);
    Py_BEGIN_ALLOW_THREADS
    ...some blocking I/O call...
    Py_END_ALLOW_THREADS
    PyObject_Print(item, stdout, 0); /* BUG! */
}
\end{verbatim}
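
The remedy is the same as before: own a reference for as long as the
lock is released.  In the following sketch the function name
\code{no_bug_threads()} is invented, and \code{getchar()} merely
stands in for whatever blocking I/O call the real code would make.

\begin{verbatim}
#include "Python.h"
#include <stdio.h>

static void
no_bug_threads(PyObject *list)
{
    PyObject *item = PyList_GetItem(list, 0);

    if (item == NULL)
        return;                 /* empty list; an exception is set */
    Py_INCREF(item);            /* own the item while the lock is gone */
    Py_BEGIN_ALLOW_THREADS
    (void) getchar();           /* stand-in for a blocking I/O call */
    Py_END_ALLOW_THREADS
    PyObject_Print(item, stdout, 0);
    Py_DECREF(item);
}
\end{verbatim}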
 | 
						|
 | 
						|
\subsection{NULL Pointers}
 | 
						|
 | 
						|
In general, functions that take object references as arguments don't
 | 
						|
expect you to pass them \code{NULL} pointers, and will dump core (or
 | 
						|
cause later core dumps) if you do so.  Functions that return object
 | 
						|
references generally return \code{NULL} only to indicate that an
 | 
						|
exception occurred.  The reason for not testing for \code{NULL}
 | 
						|
arguments is that functions often pass the objects they receive on to
 | 
						|
other functions --- if each function were to test for \code{NULL},
 | 
						|
there would be a lot of redundant tests and the code would run slower.
 | 
						|
 | 
						|
It is better to test for \code{NULL} only at the ``source'', i.e.\
 | 
						|
when a pointer that may be \code{NULL} is received, e.g.\ from
 | 
						|
\code{malloc()} or from a function that may raise an exception.
 | 
						|
 | 
						|
The macros \code{Py_INCREF()} and \code{Py_DECREF()}
 | 
						|
don't check for \code{NULL} pointers --- however, their variants
 | 
						|
\code{Py_XINCREF()} and \code{Py_XDECREF()} do.
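
Both habits, testing for \code{NULL} at the source and using the
\samp{X} variants during cleanup, fit together as in the sketch
below.  The function \code{describe()} is a hypothetical example.

\begin{verbatim}
#include "Python.h"

/* Possible Python call: describe(obj)
   Returns the pair (obj.__name__, obj.__doc__). */
static PyObject *
describe(PyObject *self, PyObject *args)
{
    PyObject *object;
    PyObject *name = NULL, *doc = NULL, *result = NULL;

    if (!PyArg_ParseTuple(args, "O:describe", &object))
        return NULL;

    /* Test for NULL where the pointer is received... */
    name = PyObject_GetAttrString(object, "__name__");
    if (name == NULL)
        goto done;
    doc = PyObject_GetAttrString(object, "__doc__");
    if (doc == NULL)
        goto done;
    /* ...so that later calls can rely on non-NULL arguments. */
    result = Py_BuildValue("(OO)", name, doc);

done:
    /* name or doc may still be NULL here, hence the X variants. */
    Py_XDECREF(name);
    Py_XDECREF(doc);
    return result;
}
\end{verbatim}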
 | 
						|
 | 
						|
The macros for checking for a particular object type
 | 
						|
(\code{Py\var{type}_Check()}) don't check for \code{NULL} pointers ---
 | 
						|
again, there is much code that calls several of these in a row to test
 | 
						|
an object against various different expected types, and this would
 | 
						|
generate redundant tests.  There are no variants with \code{NULL}
 | 
						|
checking.
 | 
						|
 | 
						|
The C function calling mechanism guarantees that the argument list
 | 
						|
passed to C functions (\code{args} in the examples) is never
 | 
						|
\code{NULL} --- in fact it guarantees that it is always a tuple.%
 | 
						|
\footnote{These guarantees don't hold when you use the ``old'' style
 | 
						|
calling convention --- this is still found in much existing code.}
 | 
						|
 | 
						|
It is a severe error to ever let a \code{NULL} pointer ``escape'' to
 | 
						|
the Python user.  
 | 
						|
 | 
						|
 | 
						|
\section{Writing Extensions in \Cpp{}}
 | 
						|
 | 
						|
It is possible to write extension modules in \Cpp{}.  Some restrictions
 | 
						|
apply.  If the main program (the Python interpreter) is compiled and
 | 
						|
linked by the C compiler, global or static objects with constructors
 | 
						|
cannot be used.  This is not a problem if the main program is linked
 | 
						|
by the \Cpp{} compiler.  All functions that will be called directly or
 | 
						|
indirectly (i.e.\ via function pointers) by the Python interpreter will
 | 
						|
have to be declared using \code{extern "C"}; this applies to all
 | 
						|
``methods'' as well as to the module's initialization function.
 | 
						|
It is unnecessary to enclose the Python header files in
 | 
						|
\code{extern "C" \{...\}} --- they use this form already if the symbol
 | 
						|
\samp{__cplusplus} is defined (all recent \Cpp{} compilers define this
 | 
						|
symbol).
 | 
						|
 | 
						|
\chapter{Embedding Python in Another Application}
 | 
						|
 | 
						|
Embedding Python is similar to extending it, but not quite.  The
 | 
						|
difference is that when you extend Python, the main program of the
 | 
						|
application is still the Python interpreter, while if you embed
 | 
						|
Python, the main program may have nothing to do with Python ---
 | 
						|
instead, some parts of the application occasionally call the Python
 | 
						|
interpreter to run some Python code.
 | 
						|
 | 
						|
So if you are embedding Python, you are providing your own main
 | 
						|
program.  One of the things this main program has to do is initialize
 | 
						|
the Python interpreter.  At the very least, you have to call the
 | 
						|
function \code{Py_Initialize()}.  There are optional calls to pass command
 | 
						|
line arguments to Python.  Then later you can call the interpreter
 | 
						|
from any part of the application.
 | 
						|
 | 
						|
There are several different ways to call the interpreter: you can pass
 | 
						|
a string containing Python statements to \code{PyRun_SimpleString()},
 | 
						|
or you can pass a stdio file pointer and a file name (for
 | 
						|
identification in error messages only) to \code{PyRun_SimpleFile()}.  You
 | 
						|
can also call the lower-level operations described in the previous
 | 
						|
chapters to construct and use Python objects.
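
A minimal main program along these lines might look as follows.  This
is a sketch only: the Python statements passed to the interpreter are
an arbitrary example, and no command line arguments are passed on.

\begin{verbatim}
#include "Python.h"

int
main(int argc, char **argv)
{
    int status;

    Py_Initialize();    /* must precede any other use of the interpreter */
    status = PyRun_SimpleString(
        "import sys\n"
        "sys.stdout.write('hello from Python\\n')\n");
    /* PyRun_SimpleString() returns 0 on success and -1 if the
       statements raised an exception. */
    return status == 0 ? 0 : 1;
}
\end{verbatim}

The program has to be compiled against the Python header files and
linked with the Python library; the details depend on the
installation.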
 | 
						|
 | 
						|
A simple demo of embedding Python can be found in the directory
 | 
						|
\file{Demo/embed}.
 | 
						|
 | 
						|
 | 
						|
\section{Embedding Python in \Cpp{}}
 | 
						|
 | 
						|
It is also possible to embed Python in a \Cpp{} program.  Precisely how
this is done depends on the details of the \Cpp{} system used.  In
general you will need to write the main program in \Cpp{} and use the
\Cpp{} compiler to compile and link your program.  There is no need to
recompile Python itself using \Cpp{}.
 | 
						|
 | 
						|
 | 
						|
\chapter{Dynamic Loading}

On most modern systems it is possible to configure Python to support
dynamic loading of extension modules implemented in C.  When shared
libraries are used, dynamic loading is configured automatically;
otherwise you have to select it as a build option (see below).  Once
configured, dynamic loading is trivial to use: when a Python program
executes \code{import spam}, the search for modules tries to find a
file \file{spammodule.o} (\file{spammodule.so} when using shared
libraries) in the module search path, and if one is found, it is
loaded into the executing binary and executed.  Once loaded, the
module acts just like a built-in extension module.

The advantages of dynamic loading are twofold: the ``core'' Python
binary gets smaller, and users can extend Python with their own
modules implemented in C without having to build and maintain their
own copy of the Python interpreter.  There are also disadvantages:
dynamic loading isn't available on all systems (this just means that
on some systems you have to use static loading), and dynamically
loading a module that was compiled for a different version of Python
(e.g. with a different representation of objects) may dump core.

\section{Configuring and Building the Interpreter for Dynamic Loading}

There are three styles of dynamic loading: one using shared libraries,
one using SGI IRIX 4 dynamic loading, and one using GNU dynamic
loading.

\subsection{Shared Libraries}

The following systems support dynamic loading using shared libraries:
SunOS 4; Solaris 2; SGI IRIX 5 (but not SGI IRIX 4!); and probably all
systems derived from SVR4, or at least those SVR4 derivatives that
support shared libraries (are there any that don't?).

You don't need to do anything to configure dynamic loading on these
systems: the \file{configure} script detects the presence of the
\file{<dlfcn.h>} header file and automatically configures dynamic
loading.

\subsection{SGI IRIX 4 Dynamic Loading}

Only SGI IRIX 4 supports dynamic loading of modules using SGI dynamic
loading.  (SGI IRIX 5 might also support it but it is inferior to
using shared libraries so there is no reason to; a small test didn't
work right away so I gave up trying to support it.)

Before you build Python, you first need to fetch and build the \code{dl}
package written by Jack Jansen.  This is available by anonymous ftp
from host \file{ftp.cwi.nl}, directory \file{pub/dynload}, file
\file{dl-1.6.tar.Z}.  (The version number may change.)  Follow the
instructions in the package's \file{README} file to build it.

Once you have built \code{dl}, you can configure Python to use it.  To
this end, you run the \file{configure} script with the option
\code{--with-dl=\var{directory}} where \var{directory} is the absolute
pathname of the \code{dl} directory.

Now build and install Python as you normally would (see the
\file{README} file in the toplevel Python directory).

\subsection{GNU Dynamic Loading}

GNU dynamic loading supports (according to its \file{README} file) the
following hardware and software combinations: VAX (Ultrix), Sun 3
(SunOS 3.4 and 4.0), Sparc (SunOS 4.0), Sequent Symmetry (Dynix), and
Atari ST.  There is no reason to use it on a Sparc; I haven't seen a
Sun 3 for years so I don't know if these have shared libraries or not.

You need to fetch and build two packages.  One is GNU DLD 3.2.3,
available by anonymous ftp from host \file{ftp.cwi.nl}, directory
\file{pub/dynload}, file \file{dld-3.2.3.tar.Z}.  (As far as I know,
no further development on GNU DLD is being done.)  The other is an
emulation of Jack Jansen's \code{dl} package that I wrote on top of
GNU DLD 3.2.3.  This is available from the same host and directory,
file \file{dl-dld-1.1.tar.Z}.  (The version number may change, but I
doubt it will.)  Follow the instructions in each package's \file{README}
file to configure and build them.

Now configure Python.  Run the \file{configure} script with the option
\code{--with-dl-dld=\var{dl-directory},\var{dld-directory}} where
\var{dl-directory} is the absolute pathname of the directory where you
have built the \file{dl-dld} package, and \var{dld-directory} is that
of the GNU DLD package.  The Python interpreter you build hereafter
will support GNU dynamic loading.

\section{Building a Dynamically Loadable Module}

Since there are three styles of dynamic loading, there are also three
groups of instructions for building a dynamically loadable module.
Instructions common to all three styles are given first.  Assuming
your module is called \code{spam}, the source filename must be
\file{spammodule.c}, so the object name is \file{spammodule.o}.  The
module must be written as a normal Python extension module (as
described earlier).

Note that in all cases you will have to create your own Makefile that
compiles your module file(s).  This Makefile will have to pass two
\samp{-I} arguments to the C compiler which will make it find the
Python header files.  If the Make variable \var{PYTHONTOP} points to
the toplevel Python directory, your \var{CFLAGS} Make variable should
contain the options \samp{-I\$(PYTHONTOP) -I\$(PYTHONTOP)/Include}.
(Most header files are in the \file{Include} subdirectory, but the
\file{config.h} header lives in the toplevel directory.)
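
As a sketch, a minimal Makefile for the \code{spam} module might look
as follows.  The value given to \var{PYTHONTOP} is only a placeholder
that you must replace with the location of your own toplevel Python
directory, and the sketch assumes that your \code{make} has the usual
built-in rule that compiles a \samp{.c} file into a \samp{.o} file
using \samp{\$(CC) \$(CFLAGS) -c}.

\begin{verbatim}
# PYTHONTOP is a placeholder; point it at your toplevel Python directory.
PYTHONTOP=      /usr/local/src/Python
CFLAGS=         -I$(PYTHONTOP) -I$(PYTHONTOP)/Include

# No explicit command is given here: the built-in rule of make is
# assumed to compile spammodule.c with $(CC) $(CFLAGS) -c.
spammodule.o:   spammodule.c
\end{verbatim}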
\subsection{Shared Libraries}

You must link the \samp{.o} file to produce a shared library.  This is
done using a special invocation of the \UNIX{} loader/linker, {\em
ld}(1).  Unfortunately the invocation differs slightly per system.

On SunOS 4, use
\begin{verbatim}
    ld spammodule.o -o spammodule.so
\end{verbatim}

On Solaris 2, use
\begin{verbatim}
    ld -G spammodule.o -o spammodule.so
\end{verbatim}

On SGI IRIX 5, use
\begin{verbatim}
    ld -shared spammodule.o -o spammodule.so
\end{verbatim}

On other systems, consult the manual page for \code{ld}(1) to find what
flags, if any, must be used.

If your extension module uses system libraries that haven't already
been linked with Python (e.g. a windowing system), these must be
passed to the \code{ld} command as \samp{-l} options after the
\samp{.o} file.

The resulting file \file{spammodule.so} must be copied into a directory
along the Python module search path.

\subsection{SGI IRIX 4 Dynamic Loading}

{\bf IMPORTANT:} You must compile your extension module with the
additional C flag \samp{-G0} (or \samp{-G 0}).  This instructs the
assembler to generate position-independent code.

You don't need to link the resulting \file{spammodule.o} file; just
copy it into a directory along the Python module search path.

The first time your extension is loaded, it takes some extra time and
a few messages may be printed.  This creates a file
\file{spammodule.ld} which is an image that can be loaded quickly into
the Python interpreter process.  When a new Python interpreter is
installed, the \code{dl} package detects this and rebuilds
\file{spammodule.ld}.  The file \file{spammodule.ld} is placed in the
directory where \file{spammodule.o} was found, unless this directory is
unwritable; in that case it is placed in a temporary
directory.\footnote{Check the manual page of the \code{dl} package for
details.}

If your extension module uses additional system libraries, you must
create a file \file{spammodule.libs} in the same directory as the
\file{spammodule.o}.  This file should contain one or more lines with
whitespace-separated options that will be passed to the linker;
normally only \samp{-l} options or absolute pathnames of libraries
(\samp{.a} files) should be used.

\subsection{GNU Dynamic Loading}

Just copy \file{spammodule.o} into a directory along the Python module
search path.

If your extension module uses additional system libraries, you must
create a file \file{spammodule.libs} in the same directory as the
\file{spammodule.o}.  This file should contain one or more lines with
whitespace-separated absolute pathnames of libraries (\samp{.a}
files).  No \samp{-l} options can be used.

\input{extref}

\input{ext.ind}

\end{document}