mirror of
				https://github.com/python/cpython.git
				synced 2025-10-31 13:41:24 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			974 lines
		
	
	
	
		
			34 KiB
		
	
	
	
		
			TeX
		
	
	
	
	
	
			
		
		
	
	
			974 lines
		
	
	
	
		
			34 KiB
		
	
	
	
		
			TeX
		
	
	
	
	
	
| \chapter{Defining New Types
 | |
|         \label{defining-new-types}}
 | |
| \sectionauthor{Michael Hudson}{mwh@python.net}
 | |
| \sectionauthor{Dave Kuhlman}{dkuhlman@rexx.com}
 | |
| 
 | |
| As mentioned in the last chapter, Python allows the writer of an
 | |
| extension module to define new types that can be manipulated from
 | |
| Python code, much like strings and lists in core Python.
 | |
| 
 | |
| This is not hard; the code for all extension types follows a pattern,
 | |
| but there are some details that you need to understand before you can
 | |
| get started.
 | |
| 
 | |
| \section{The Basics
 | |
|     \label{dnt-basics}}
 | |
| 
 | |
| The Python runtime sees all Python objects as variables of type
 | |
| \ctype{PyObject*}.  A \ctype{PyObject} is not a very magnificent
 | |
| object - it just contains the refcount and a pointer to the object's
 | |
| ``type object''.  This is where the action is; the type object
 | |
| determines which (C) functions get called when, for instance, an
 | |
| attribute gets looked up on an object or it is multiplied by another
 | |
| object.  These C functions are called ``type methods'' to distinguish
 | |
| them from things like \code{[].append} (which we call ``object
 | |
| methods'').
 | |
| 
 | |
| So, if you want to define a new object type, you need to create a new
 | |
| type object.
 | |
| 
 | |
| This sort of thing can only be explained by example, so here's a
 | |
| minimal, but complete, module that defines a new type:
 | |
| 
 | |
| \verbatiminput{noddy.c}
 | |
| 
 | |
| Now that's quite a bit to take in at once, but hopefully bits will
 | |
| seem familiar from the last chapter.
 | |
| 
 | |
| The first bit that will be new is:
 | |
| 
 | |
| \begin{verbatim}
 | |
| static PyTypeObject noddy_NoddyType;
 | |
| \end{verbatim}
 | |
| 
 | |
| This names the type object that will be defining further down in the
 | |
| file.  It can't be defined here because its definition has to refer to
 | |
| functions that have no yet been defined, but we need to be able to
 | |
| refer to it, hence the declaration.
 | |
| 
 | |
| \begin{verbatim}
 | |
| typedef struct {
 | |
|     PyObject_HEAD
 | |
| } noddy_NoddyObject;
 | |
| \end{verbatim}
 | |
| 
 | |
| This is what a Noddy object will contain.  In this case nothing more
 | |
| than every Python object contains - a refcount and a pointer to a type
 | |
| object.  These are the fields the \code{PyObject_HEAD} macro brings
 | |
| in.  The reason for the macro is to standardize the layout and to
 | |
| enable special debugging fields in debug builds.  Note that there is
 | |
| no semicolon after the \code{PyObject_HEAD} macro; one is included in
 | |
| the macro definition.  Be wary of adding one by accident; it's easy to
 | |
| do from habit, and your compiler might not complain, but someone
 | |
| else's probably will!  (On Windows, MSVC is known to call this an
 | |
| error and refuse to produce compiled code.)
 | |
| 
 | |
| For contrast, let's take a look at the corresponding definition for
 | |
| standard Python integers:
 | |
| 
 | |
| \begin{verbatim}
 | |
| typedef struct {
 | |
|     PyObject_HEAD
 | |
|     long ob_ival;
 | |
| } PyIntObject;
 | |
| \end{verbatim}
 | |
| 
 | |
| Next up is:
 | |
| 
 | |
| \begin{verbatim}
 | |
| static PyObject*
 | |
| noddy_new_noddy(PyObject* self, PyObject* args)
 | |
| {
 | |
|     noddy_NoddyObject* noddy;
 | |
| 
 | |
|     if (!PyArg_ParseTuple(args,":new_noddy")) 
 | |
|         return NULL;
 | |
| 
 | |
|     noddy = PyObject_New(noddy_NoddyObject, &noddy_NoddyType);
 | |
| 
 | |
|     return (PyObject*)noddy;
 | |
| }
 | |
| \end{verbatim}
 | |
| 
 | |
| This is in fact just a regular module function, as described in the
 | |
| last chapter.  The reason it gets special mention is that this is
 | |
| where we create our Noddy object.  Defining \ctype{PyTypeObject}
 | |
| structures is all very well, but if there's no way to actually
 | |
| \emph{create} one of the wretched things it is not going to do anyone
 | |
| much good.
 | |
| 
 | |
| Almost always, you create objects with a call of the form:
 | |
| 
 | |
| \begin{verbatim}
 | |
| PyObject_New(<type>, &<type object>);
 | |
| \end{verbatim}
 | |
| 
 | |
| This allocates the memory and then initializes the object (sets
 | |
| the reference count to one, makes the \member{ob_type} pointer point at
 | |
| the right place and maybe some other stuff, depending on build options).
 | |
| You \emph{can} do these steps separately if you have some reason to
 | |
| --- but at this level we don't bother.
 | |
| 
 | |
| Note that \cfunction{PyObject_New()} is a polymorphic macro rather
 | |
| than a real function.  The first parameter is the name of the C
 | |
| structure that represents an object of our new type, and the return
 | |
| value is a pointer to that type.  This would be
 | |
| \ctype{noddy_NoddyObject} in our example:
 | |
| 
 | |
| \begin{verbatim}
 | |
|     noddy_NoddyObject *my_noddy;
 | |
| 
 | |
|     my_noddy = PyObject_New(noddy_NoddyObject, &noddy_NoddyType);
 | |
| \end{verbatim}
 | |
| 
 | |
| We cast the return value to a \ctype{PyObject*} because that's what
 | |
| the Python runtime expects.  This is safe because of guarantees about
 | |
| the layout of structures in the C standard, and is a fairly common C
 | |
| programming trick.  One could declare \cfunction{noddy_new_noddy} to
 | |
| return a \ctype{noddy_NoddyObject*} and then put a cast in the
 | |
| definition of \cdata{noddy_methods} further down the file --- it
 | |
| doesn't make much difference.
 | |
| 
 | |
| Now a Noddy object doesn't do very much and so doesn't need to
 | |
| implement many type methods.  One you can't avoid is handling
 | |
| deallocation, so we find
 | |
| 
 | |
| \begin{verbatim}
 | |
| static void
 | |
| noddy_noddy_dealloc(PyObject* self)
 | |
| {
 | |
|     PyObject_Del(self);
 | |
| }
 | |
| \end{verbatim}
 | |
| 
 | |
| This is so short as to be self explanatory.  This function will be
 | |
| called when the reference count on a Noddy object reaches \code{0} (or
 | |
| it is found as part of an unreachable cycle by the cyclic garbage
 | |
| collector).  \cfunction{PyObject_Del()} is what you call when you want
 | |
| an object to go away.  If a Noddy object held references to other
 | |
| Python objects, one would decref them here.
 | |
| 
 | |
| Moving on, we come to the crunch --- the type object.
 | |
| 
 | |
| \begin{verbatim}
 | |
| static PyTypeObject noddy_NoddyType = {
 | |
|     PyObject_HEAD_INIT(NULL)
 | |
|     0,                          /* ob_size */
 | |
|     "Noddy",                    /* tp_name */
 | |
|     sizeof(noddy_NoddyObject),  /* tp_basicsize */
 | |
|     0,                          /* tp_itemsize */
 | |
|     noddy_noddy_dealloc,        /* tp_dealloc */
 | |
|     0,                          /* tp_print */
 | |
|     0,                          /* tp_getattr */
 | |
|     0,                          /* tp_setattr */
 | |
|     0,                          /* tp_compare */
 | |
|     0,                          /* tp_repr */
 | |
|     0,                          /* tp_as_number */
 | |
|     0,                          /* tp_as_sequence */
 | |
|     0,                          /* tp_as_mapping */
 | |
|     0,                          /* tp_hash */
 | |
| };
 | |
| \end{verbatim}
 | |
| 
 | |
| Now if you go and look up the definition of \ctype{PyTypeObject} in
 | |
| \file{object.h} you'll see that it has many, many more fields that the
 | |
| definition above.  The remaining fields will be filled with zeros by
 | |
| the C compiler, and it's common practice to not specify them
 | |
| explicitly unless you need them.  
 | |
| 
 | |
| This is so important that we're going to pick the top of it apart still
 | |
| further:
 | |
| 
 | |
| \begin{verbatim}
 | |
|     PyObject_HEAD_INIT(NULL)
 | |
| \end{verbatim}
 | |
| 
 | |
| This line is a bit of a wart; what we'd like to write is:
 | |
| 
 | |
| \begin{verbatim}
 | |
|     PyObject_HEAD_INIT(&PyType_Type)
 | |
| \end{verbatim}
 | |
| 
 | |
| as the type of a type object is ``type'', but this isn't strictly
 | |
| conforming C and some compilers complain.  So instead we fill in the
 | |
| \member{ob_type} field of \cdata{noddy_NoddyType} at the earliest
 | |
| oppourtunity --- in \cfunction{initnoddy()}.
 | |
| 
 | |
| \begin{verbatim}
 | |
|     0,                          /* ob_size */
 | |
| \end{verbatim}
 | |
| 
 | |
| The \member{ob_size} field of the header is not used; it's presence in
 | |
| the type structure is a historical artifact that is maintained for
 | |
| binary compatibility with extension modules compiled for older
 | |
| versions of Python.  Always set this field to zero.
 | |
| 
 | |
| \begin{verbatim}
 | |
|     "Noddy",                    /* tp_name */
 | |
| \end{verbatim}
 | |
| 
 | |
| The name of our type.  This will appear in the default textual
 | |
| representation of our objects and in some error messages, for example:
 | |
| 
 | |
| \begin{verbatim}
 | |
| >>> "" + noddy.new_noddy()
 | |
| Traceback (most recent call last):
 | |
|   File "<stdin>", line 1, in ?
 | |
| TypeError: cannot add type "Noddy" to string
 | |
| \end{verbatim}
 | |
| 
 | |
| \begin{verbatim}
 | |
|     sizeof(noddy_NoddyObject),  /* tp_basicsize */
 | |
| \end{verbatim}
 | |
| 
 | |
| This is so that Python knows how much memory to allocate when you call
 | |
| \cfunction{PyObject_New}.
 | |
| 
 | |
| \begin{verbatim}
 | |
|     0,                          /* tp_itemsize */
 | |
| \end{verbatim}
 | |
| 
 | |
| This has to do with variable length objects like lists and strings.
 | |
| Ignore this for now.
 | |
| 
 | |
| Now we get into the type methods, the things that make your objects
 | |
| different from the others.  Of course, the Noddy object doesn't
 | |
| implement many of these, but as mentioned above you have to implement
 | |
| the deallocation function.
 | |
| 
 | |
| \begin{verbatim}
 | |
|     noddy_noddy_dealloc,        /* tp_dealloc */
 | |
| \end{verbatim}
 | |
| 
 | |
| From here, all the type methods are \NULL, so we'll go over them later
 | |
| --- that's for the next section!
 | |
| 
 | |
| Everything else in the file should be familiar, except for this line
 | |
| in \cfunction{initnoddy}:
 | |
| 
 | |
| \begin{verbatim}
 | |
|     noddy_NoddyType.ob_type = &PyType_Type;
 | |
| \end{verbatim}
 | |
| 
 | |
| This was alluded to above --- the \cdata{noddy_NoddyType} object should
 | |
| have type ``type'', but \code{\&PyType_Type} is not constant and so
 | |
| can't be used in its initializer.  To work around this, we patch it up
 | |
| in the module initialization.
 | |
| 
 | |
| That's it!  All that remains is to build it; put the above code in a
 | |
| file called \file{noddymodule.c} and
 | |
| 
 | |
| \begin{verbatim}
 | |
| from distutils.core import setup, Extension
 | |
| setup(name="noddy", version="1.0",
 | |
|       ext_modules=[Extension("noddy", ["noddymodule.c"])])
 | |
| \end{verbatim}
 | |
| 
 | |
| in a file called \file{setup.py}; then typing
 | |
| 
 | |
| \begin{verbatim}
 | |
| $ python setup.py build
 | |
| \end{verbatim} %$ <-- bow to font-lock  ;-(
 | |
| 
 | |
| at a shell should produce a file \file{noddy.so} in a subdirectory;
 | |
| move to that directory and fire up Python --- you should be able to
 | |
| \code{import noddy} and play around with Noddy objects.
 | |
| 
 | |
| That wasn't so hard, was it?
 | |
| 
 | |
| 
 | |
| \section{Type Methods
 | |
|          \label{dnt-type-methods}}
 | |
| 
 | |
| This section aims to give a quick fly-by on the various type methods
 | |
| you can implement and what they do.
 | |
| 
 | |
| Here is the definition of \ctype{PyTypeObject}, with some fields only
 | |
| used in debug builds omitted:
 | |
| 
 | |
| \verbatiminput{typestruct.h}
 | |
| 
 | |
| Now that's a \emph{lot} of methods.  Don't worry too much though - if
 | |
| you have a type you want to define, the chances are very good that you
 | |
| will only implement a handful of these.
 | |
| 
 | |
| As you probably expect by now, we're going to go over this and give
 | |
| more information about the various handlers.  We won't go in the order
 | |
| they are defined in the structure, because there is a lot of
 | |
| historical baggage that impacts the ordering of the fields; be sure
 | |
| your type initializaion keeps the fields in the right order!  It's
 | |
| often easiest to find an example that includes all the fields you need
 | |
| (even if they're initialized to \code{0}) and then change the values
 | |
| to suit your new type.
 | |
| 
 | |
| \begin{verbatim}
 | |
|     char *tp_name; /* For printing */
 | |
| \end{verbatim}
 | |
| 
 | |
| The name of the type - as mentioned in the last section, this will
 | |
| appear in various places, almost entirely for diagnostic purposes.
 | |
| Try to choose something that will be helpful in such a situation!
 | |
| 
 | |
| \begin{verbatim}
 | |
|     int tp_basicsize, tp_itemsize; /* For allocation */
 | |
| \end{verbatim}
 | |
| 
 | |
| These fields tell the runtime how much memory to allocate when new
 | |
| objects of this typed are created.  Python has some builtin support
 | |
| for variable length structures (think: strings, lists) which is where
 | |
| the \member{tp_itemsize} field comes in.  This will be dealt with
 | |
| later.
 | |
| 
 | |
| \begin{verbatim}
 | |
|     char *tp_doc;
 | |
| \end{verbatim}
 | |
| 
 | |
| Here you can put a string (or its address) that you want returned when
 | |
| the Python script references \code{obj.__doc__} to retrieve the
 | |
| docstring.
 | |
|    
 | |
| Now we come to the basic type methods---the ones most extension types
 | |
| will implement.
 | |
| 
 | |
| 
 | |
| \subsection{Finalization and De-allocation}
 | |
| 
 | |
| \index{object!deallocation}
 | |
| \index{deallocation, object}
 | |
| \index{object!finalization}
 | |
| \index{finalization, of objects}
 | |
| 
 | |
| \begin{verbatim}
 | |
|     destructor tp_dealloc;
 | |
| \end{verbatim}
 | |
| 
 | |
| This function is called when the reference count of the instance of
 | |
| your type is reduced to zero and the Python interpreter wants to
 | |
| reclaim it.  If your type has memory to free or other clean-up to
 | |
| perform, put it here.  The object itself needs to be freed here as
 | |
| well.  Here is an example of this function:
 | |
| 
 | |
| \begin{verbatim}
 | |
| static void
 | |
| newdatatype_dealloc(newdatatypeobject * obj)
 | |
| {
 | |
|     free(obj->obj_UnderlyingDatatypePtr);
 | |
|     PyObject_DEL(obj);
 | |
| }
 | |
| \end{verbatim}
 | |
| 
 | |
| One important requirement of the deallocator function is that it
 | |
| leaves any pending exceptions alone.  This is important since
 | |
| deallocators are frequently called as the interpreter unwinds the
 | |
| Python stack; when the stack is unwound due to an exception (rather
 | |
| than normal returns), nothing is done to protect the deallocators from
 | |
| seeing that an exception has already been set.  Any actions which a
 | |
| deallocator performs which may cause additional Python code to be
 | |
| executed may detect that an exception has been set.  This can lead to
 | |
| misleading errors from the interpreter.  The proper way to protect
 | |
| against this is to save a pending exception before performing the
 | |
| unsafe action, and restoring it when done.  This can be done using the
 | |
| \cfunction{PyErr_Fetch()}\ttindex{PyErr_Fetch()} and
 | |
| \cfunction{PyErr_Restore()}\ttindex{PyErr_Restore()} functions:
 | |
| 
 | |
| \begin{verbatim}
 | |
| static void
 | |
| my_dealloc(PyObject *obj)
 | |
| {
 | |
|     MyObject *self = (MyObject *) obj;
 | |
|     PyObject *cbresult;
 | |
| 
 | |
|     if (self->my_callback != NULL) {
 | |
|         PyObject *err_type, *err_value, *err_traceback;
 | |
|         int have_error = PyErr_Occurred() ? 1 : 0;
 | |
| 
 | |
|         if (have_error)
 | |
|             PyErr_Fetch(&err_type, &err_value, &err_traceback);
 | |
| 
 | |
|         cbresult = PyObject_CallObject(self->my_callback, NULL);
 | |
|         if (cbresult == NULL)
 | |
|             PyErr_WriteUnraisable();
 | |
|         else
 | |
|             Py_DECREF(cbresult);
 | |
| 
 | |
|         if (have_error)
 | |
|             PyErr_Restore(err_type, err_value, err_traceback);
 | |
| 
 | |
|         Py_DECREF(self->my_callback);
 | |
|     }
 | |
|     PyObject_DEL(obj);
 | |
| }
 | |
| \end{verbatim}
 | |
| 
 | |
| 
 | |
| \subsection{Object Presentation}
 | |
| 
 | |
| In Python, there are three ways to generate a textual representation
 | |
| of an object: the \function{repr()}\bifuncindex{repr} function (or
 | |
| equivalent backtick syntax), the \function{str()}\bifuncindex{str}
 | |
| function, and the \keyword{print} statement.  For most objects, the
 | |
| \keyword{print} statement is equivalent to the \function{str()}
 | |
| function, but it is possible to special-case printing to a
 | |
| \ctype{FILE*} if necessary; this should only be done if efficiency is
 | |
| identified as a problem and profiling suggests that creating a
 | |
| temporary string object to be written to a file is too expensive.
 | |
| 
 | |
| These handlers are all optional, and most types at most need to
 | |
| implement the \member{tp_str} and \member{tp_repr} handlers.
 | |
| 
 | |
| \begin{verbatim}
 | |
|     reprfunc tp_repr;
 | |
|     reprfunc tp_str;
 | |
|     printfunc tp_print;
 | |
| \end{verbatim}
 | |
| 
 | |
| The \member{tp_repr} handler should return a string object containing
 | |
| a representation of the instance for which it is called.  Here is a
 | |
| simple example:
 | |
| 
 | |
| \begin{verbatim}
 | |
| static PyObject *
 | |
| newdatatype_repr(newdatatypeobject * obj)
 | |
| {
 | |
|     return PyString_FromFormat("Repr-ified_newdatatype{{size:\%d}}",
 | |
|                                obj->obj_UnderlyingDatatypePtr->size);
 | |
| }
 | |
| \end{verbatim}
 | |
| 
 | |
| If no \member{tp_repr} handler is specified, the interpreter will
 | |
| supply a representation that uses the type's \member{tp_name} and a
 | |
| uniquely-identifying value for the object.
 | |
| 
 | |
| The \member{tp_str} handler is to \function{str()} what the
 | |
| \member{tp_repr} handler described above is to \function{repr()}; that
 | |
| is, it is called when Python code calls \function{str()} on an
 | |
| instance of your object.  It's implementation is very similar to the
 | |
| \member{tp_repr} function, but the resulting string is intended for
 | |
| human consumption.  If \member{tp_str} is not specified, the
 | |
| \member{tp_repr} handler is used instead.
 | |
| 
 | |
| Here is a simple example:
 | |
| 
 | |
| \begin{verbatim}
 | |
| static PyObject *
 | |
| newdatatype_str(newdatatypeobject * obj)
 | |
| {
 | |
|     return PyString_FromFormat("Stringified_newdatatype{{size:\%d}}",
 | |
|                                obj->obj_UnderlyingDatatypePtr->size);
 | |
| }
 | |
| \end{verbatim}
 | |
| 
 | |
| The print function will be called whenever Python needs to "print" an
 | |
| instance of the type.  For example, if 'node' is an instance of type
 | |
| TreeNode, then the print function is called when Python code calls:
 | |
| 
 | |
| \begin{verbatim}
 | |
| print node
 | |
| \end{verbatim}
 | |
| 
 | |
| There is a flags argument and one flag, \constant{Py_PRINT_RAW}, and
 | |
| it suggests that you print without string quotes and possibly without
 | |
| interpreting escape sequences.
 | |
| 
 | |
| The print function receives a file object as an argument. You will
 | |
| likely want to write to that file object.
 | |
| 
 | |
| Here is a sampe print function:
 | |
| 
 | |
| \begin{verbatim}
 | |
| static int
 | |
| newdatatype_print(newdatatypeobject *obj, FILE *fp, int flags)
 | |
| {
 | |
|     if (flags & Py_PRINT_RAW) {
 | |
|         fprintf(fp, "<{newdatatype object--size: %d}>",
 | |
|                 obj->obj_UnderlyingDatatypePtr->size);
 | |
|     }
 | |
|     else {
 | |
|         fprintf(fp, "\"<{newdatatype object--size: %d}>\"",
 | |
|                 obj->obj_UnderlyingDatatypePtr->size);
 | |
|     }
 | |
|     return 0;
 | |
| }
 | |
| \end{verbatim}
 | |
| 
 | |
| 
 | |
| \subsection{Attribute Management}
 | |
| 
 | |
| For every object which can support attributes, the corresponding type
 | |
| must provide the functions that control how the attributes are
 | |
| resolved.  There needs to be a function which can retrieve attributes
 | |
| (if any are defined), and another to set attributes (if setting
 | |
| attributes is allowed).  Removing an attribute is a special case, for
 | |
| which the new value passed to the handler is \NULL.
 | |
| 
 | |
| Python supports two pairs of attribute handlers; a type that supports
 | |
| attributes only needs to implement the functions for one pair.  The
 | |
| difference is that one pair takes the name of the attribute as a
 | |
| \ctype{char*}, while the other accepts a \ctype{PyObject*}.  Each type
 | |
| can use whichever pair makes more sense for the implementation's
 | |
| convenience.
 | |
| 
 | |
| \begin{verbatim}
 | |
|     getattrfunc  tp_getattr;        /* char * version */
 | |
|     setattrfunc  tp_setattr;
 | |
|     /* ... */
 | |
|     getattrofunc tp_getattrofunc;   /* PyObject * version */
 | |
|     setattrofunc tp_setattrofunc;
 | |
| \end{verbatim}
 | |
| 
 | |
| If accessing attributes of an object is always a simple operation
 | |
| (this will be explained shortly), there are generic implementations
 | |
| which can be used to provide the \ctype{PyObject*} version of the
 | |
| attribute management functions.  The actual need for type-specific
 | |
| attribute handlers almost completely disappeared starting with Python
 | |
| 2.2, though there are many examples which have not been updated to use
 | |
| some of the new generic mechanism that is available.
 | |
| 
 | |
| 
 | |
| \subsubsection{Generic Attribute Management}
 | |
| 
 | |
| \versionadded{2.2}
 | |
| 
 | |
| Most extension types only use \emph{simple} attributes.  So, what
 | |
| makes the attributes simple?  There are only a couple of conditions
 | |
| that must be met:
 | |
| 
 | |
| \begin{enumerate}
 | |
|   \item   The name of the attributes must be known when
 | |
|           \cfunction{PyType_Ready()} is called.
 | |
| 
 | |
|   \item   No special processing is need to record that an attribute
 | |
|           was looked up or set, nor do actions need to be taken based
 | |
|           on the value.
 | |
| \end{enumerate}
 | |
| 
 | |
| Note that this list does not place any restrictions on the values of
 | |
| the attributes, when the values are computed, or how relevant data is
 | |
| stored.
 | |
| 
 | |
| When \cfunction{PyType_Ready()} is called, it uses three tables
 | |
| referenced by the type object to create \emph{descriptors} which are
 | |
| placed in the dictionary of the type object.  Each descriptor controls
 | |
| access to one attribute of the instance object.  Each of the tables is
 | |
| optional; if all three are \NULL, instances of the type will only have
 | |
| attributes that are inherited from their base type, and should leave
 | |
| the \member{tp_getattro} and \member{tp_setattro} fields \NULL{} as
 | |
| well, allowing the base type to handle attributes.
 | |
| 
 | |
| The tables are declared as three fields of the type object:
 | |
| 
 | |
| \begin{verbatim}
 | |
|     struct PyMethodDef *tp_methods;
 | |
|     struct PyMemberDef *tp_members;
 | |
|     struct PyGetSetDef *tp_getset;
 | |
| \end{verbatim}
 | |
| 
 | |
| If \member{tp_methods} is not \NULL, it must refer to an array of
 | |
| \ctype{PyMethodDef} structures.  Each entry in the table is an
 | |
| instance of this structure:
 | |
| 
 | |
| \begin{verbatim}
 | |
| typedef struct PyMethodDef {
 | |
|     char        *ml_name;       /* method name */
 | |
|     PyCFunction  ml_meth;       /* implementation function */
 | |
|     int	         ml_flags;      /* flags */
 | |
|     char        *ml_doc;        /* docstring */
 | |
| } PyMethodDef;
 | |
| \end{verbatim}
 | |
| 
 | |
| One entry should be defined for each method provided by the type; no
 | |
| entries are needed for methods inherited from a base type.  One
 | |
| additional entry is needed at the end; it is a sentinel that marks the
 | |
| end of the array.  The \member{ml_name} field of the sentinel must be
 | |
| \NULL.
 | |
| 
 | |
| XXX Need to refer to some unified discussion of the structure fields,
 | |
| shared with the next section.
 | |
| 
 | |
| The second table is used to define attributes which map directly to
 | |
| data stored in the instance.  A variety of primitive C types are
 | |
| supported, and access may be read-only or read-write.  The structures
 | |
| in the table are defined as:
 | |
| 
 | |
| \begin{verbatim}
 | |
| typedef struct PyMemberDef {
 | |
|     char *name;
 | |
|     int   type;
 | |
|     int   offset;
 | |
|     int   flags;
 | |
|     char *doc;
 | |
| } PyMemberDef;
 | |
| \end{verbatim}
 | |
| 
 | |
| For each entry in the table, a descriptor will be constructed and
 | |
| added to the type which will be able to extract a value from the
 | |
| instance structure.  The \member{type} field should contain one of the
 | |
| type codes defined in the \file{structmember.h} header; the value will
 | |
| be used to determine how to convert Python values to and from C
 | |
| values.  The \member{flags} field is used to store flags which control
 | |
| how the attribute can be accessed.
 | |
| 
 | |
| XXX Need to move some of this to a shared section!
 | |
| 
 | |
| The following flag constants are defined in \file{structmember.h};
 | |
| they may be combined using bitwise-OR.
 | |
| 
 | |
| \begin{tableii}{l|l}{constant}{Constant}{Meaning}
 | |
|   \lineii{READONLY \ttindex{READONLY}}
 | |
|          {Never writable.}
 | |
|   \lineii{RO \ttindex{RO}}
 | |
|          {Shorthand for \constant{READONLY}.}
 | |
|   \lineii{READ_RESTRICTED \ttindex{READ_RESTRICTED}}
 | |
|          {Not readable in restricted mode.}
 | |
|   \lineii{WRITE_RESTRICTED \ttindex{WRITE_RESTRICTED}}
 | |
|          {Not writable in restricted mode.}
 | |
|   \lineii{RESTRICTED \ttindex{RESTRICTED}}
 | |
|          {Not readable or writable in restricted mode.}
 | |
| \end{tableii}
 | |
| 
 | |
| An interesting advantage of using the \member{tp_members} table to
 | |
| build descriptors that are used at runtime is that any attribute
 | |
| defined this way can have an associated docstring simply by providing
 | |
| the text in the table.  An application can use the introspection API
 | |
| to retrieve the descriptor from the class object, and get the
 | |
| docstring using its \member{__doc__} attribute.
 | |
| 
 | |
| As with the \member{tp_methods} table, a sentinel entry with a
 | |
| \member{name} value of \NULL{} is required.  
 | |
| 
 | |
| 
 | |
| % XXX Descriptors need to be explained in more detail somewhere, but
 | |
| % not here.
 | |
| %
 | |
| % Descriptor objects have two handler functions which correspond to
 | |
| % the \member{tp_getattro} and \member{tp_setattro} handlers.  The
 | |
| % \method{__get__()} handler is a function which is passed the
 | |
| % descriptor, instance, and type objects, and returns the value of the
 | |
| % attribute, or it returns \NULL{} and sets an exception.  The
 | |
| % \method{__set__()} handler is passed the descriptor, instance, type,
 | |
| % and new value;
 | |
| 
 | |
| 
 | |
| \subsubsection{Type-specific Attribute Management}
 | |
| 
 | |
| For simplicity, only the \ctype{char*} version will be demonstrated
 | |
| here; the type of the name parameter is the only difference between
 | |
| the \ctype{char*} and \ctype{PyObject*} flavors of the interface.
 | |
| This example effectively does the same thing as the generic example
 | |
| above, but does not use the generic support added in Python 2.2.  The
 | |
| value in showing this is two-fold: it demonstrates how basic attribute
 | |
| management can be done in a way that is portable to older versions of
 | |
| Python, and explains how the handler functions are called, so that if
 | |
| you do need to extend their functionality, you'll understand what
 | |
| needs to be done.
 | |
| 
 | |
| The \member{tp_getattr} handler is called when the object requires an
 | |
| attribute look-up.  It is called in the same situations where the
 | |
| \method{__getattr__()} method of a class would be called.
 | |
| 
 | |
| A likely way to handle this is (1) to implement a set of functions
 | |
| (such as \cfunction{newdatatype_getSize()} and
 | |
| \cfunction{newdatatype_setSize()} in the example below), (2) provide a
 | |
| method table listing these functions, and (3) provide a getattr
 | |
| function that returns the result of a lookup in that table.  The
 | |
| method table uses the same structure as the \member{tp_methods} field
 | |
| of the type object.
 | |
| 
 | |
| Here is an example:
 | |
| 
 | |
| \begin{verbatim}
 | |
| static PyMethodDef newdatatype_methods[] = {
 | |
|     {"getSize", (PyCFunction)newdatatype_getSize, METH_VARARGS,
 | |
|      "Return the current size."},
 | |
|     {"setSize", (PyCFunction)newdatatype_setSize, METH_VARARGS,
 | |
|      "Set the size."},
 | |
|     {NULL, NULL, 0, NULL}           /* sentinel */
 | |
| };
 | |
| 
 | |
| static PyObject *
 | |
| newdatatype_getattr(newdatatypeobject *obj, char *name)
 | |
| {
 | |
|     return Py_FindMethod(newdatatype_methods, (PyObject *)obj, name);
 | |
| }
 | |
| \end{verbatim}
 | |
| 
 | |
| The \member{tp_setattr} handler is called when the
 | |
| \method{__setattr__()} or \method{__delattr__()} method of a class
 | |
| instance would be called.  When an attribute should be deleted, the
 | |
| third parameter will be \NULL.  Here is an example that simply raises
 | |
| an exception; if this were really all you wanted, the
 | |
| \member{tp_setattr} handler should be set to \NULL.
 | |
|    
 | |
| \begin{verbatim}
 | |
| static int
 | |
| newdatatype_setattr(newdatatypeobject *obj, char *name, PyObject *v)
 | |
| {
 | |
|     (void)PyErr_Format(PyExc_RuntimeError, "Read-only attribute: \%s", name);
 | |
|     return -1;
 | |
| }
 | |
| \end{verbatim}
 | |
| 
 | |
| 
 | |
| \subsection{Object Comparison}
 | |
| 
 | |
| \begin{verbatim}
 | |
|     cmpfunc tp_compare;
 | |
| \end{verbatim}
 | |
| 
 | |
| The \member{tp_compare} handler is called when comparisons are needed
 | |
| are the object does not implement the specific rich comparison method
 | |
| which matches the requested comparison.  (It is always used if defined
 | |
| and the \cfunction{PyObject_Compare()} or \cfunction{PyObject_Cmp()}
 | |
| functions are used, or if \function{cmp()} is used from Python.)
 | |
| It is analogous to the \method{__cmp__()} method.  This function
 | |
| should return \code{-1} if \var{obj1} is less than
 | |
| \var{obj2}, \code{0} if they are equal, and \code{1} if
 | |
| \var{obj1} is greater than
 | |
| \var{obj2}.
 | |
| (It was previously allowed to return arbitrary negative or positive
 | |
| integers for less than and greater than, respectively; as of Python
 | |
| 2.2, this is no longer allowed.  In the future, other return values
 | |
| may be assigned a different meaning.)
 | |
| 
 | |
| A \member{tp_compare} handler may raise an exception.  In this case it
 | |
| should return a negative value.  The caller has to test for the
 | |
| exception using \cfunction{PyErr_Occurred()}.
 | |
| 
 | |
| 
 | |
| Here is a sample implementation:
 | |
| 
 | |
| \begin{verbatim}
 | |
| static int
 | |
| newdatatype_compare(newdatatypeobject * obj1, newdatatypeobject * obj2)
 | |
| {
 | |
|     long result;
 | |
|  
 | |
|     if (obj1->obj_UnderlyingDatatypePtr->size <
 | |
|         obj2->obj_UnderlyingDatatypePtr->size) {
 | |
|         result = -1;
 | |
|     }
 | |
|     else if (obj1->obj_UnderlyingDatatypePtr->size >
 | |
|              obj2->obj_UnderlyingDatatypePtr->size) {
 | |
|         result = 1;
 | |
|     }
 | |
|     else {
 | |
|         result = 0;
 | |
|     }
 | |
|     return result;
 | |
| }
 | |
| \end{verbatim}
 | |
| 
 | |
| 
 | |
| \subsection{Abstract Protocol Support}
 | |
| 
 | |
| Python supports a variety of \emph{abstract} `protocols;' the specific
 | |
| interfaces provided to use these interfaces are documented in the
 | |
| \citetitle[../api/api.html]{Python/C API Reference Manual} in the
 | |
| chapter ``\ulink{Abstract Objects Layer}{../api/abstract.html}.''
 | |
| 
 | |
| A number of these abstract interfaces were defined early in the
 | |
| development of the Python implementation.  In particular, the number,
 | |
| mapping, and sequence protocols have been part of Python since the
 | |
| beginning.  Other protocols have been added over time.  For protocols
 | |
| which depend on several handler routines from the type implementation,
 | |
| the older protocols have been defined as optional blocks of handlers
 | |
| referenced by the type object, while newer protocols have been added
 | |
| using additional slots in the main type object, with a flag bit being
 | |
| set to indicate that the slots are present.  (The flag bit does not
 | |
| indicate that the slot values are non-\NULL.)
 | |
| 
 | |
| \begin{verbatim}
 | |
|     PyNumberMethods   tp_as_number;
 | |
|     PySequenceMethods tp_as_sequence;
 | |
|     PyMappingMethods  tp_as_mapping;
 | |
| \end{verbatim}
 | |
| 
 | |
| If you wish your object to be able to act like a number, a sequence,
 | |
| or a mapping object, then you place the address of a structure that
 | |
| implements the C type \ctype{PyNumberMethods},
 | |
| \ctype{PySequenceMethods}, or \ctype{PyMappingMethods}, respectively.
 | |
| It is up to you to fill in this structure with appropriate values. You
 | |
| can find examples of the use of each of these in the \file{Objects}
 | |
| directory of the Python source distribution.
 | |
| 
 | |
| 
 | |
| \begin{verbatim}
 | |
|     hashfunc tp_hash;
 | |
| \end{verbatim}
 | |
| 
 | |
| This function, if you choose to provide it, should return a hash
 | |
| number for an instance of your datatype. Here is a moderately
 | |
| pointless example:
 | |
| 
 | |
| \begin{verbatim}
 | |
| static long
 | |
| newdatatype_hash(newdatatypeobject *obj)
 | |
| {
 | |
|     long result;
 | |
|     result = obj->obj_UnderlyingDatatypePtr->size;
 | |
|     result = result * 3;
 | |
|     return result;
 | |
| }
 | |
| \end{verbatim}
 | |
| 
 | |
| \begin{verbatim}
 | |
|     ternaryfunc tp_call;
 | |
| \end{verbatim}
 | |
| 
 | |
| This function is called when an instance of your datatype is "called",
 | |
| for example, if \code{obj1} is an instance of your datatype and the Python
 | |
| script contains \code{obj1('hello')}, the \member{tp_call} handler is
 | |
| invoked.
 | |
| 
 | |
| This function takes three arguments:
 | |
| 
 | |
| \begin{enumerate}
 | |
|   \item
 | |
|     \var{arg1} is the instance of the datatype which is the subject of
 | |
|     the call. If the call is \code{obj1('hello')}, then \var{arg1} is
 | |
|     \code{obj1}.
 | |
| 
 | |
|   \item
 | |
|     \var{arg2} is a tuple containing the arguments to the call.  You
 | |
|     can use \cfunction{PyArg_ParseTuple()} to extract the arguments.
 | |
| 
 | |
|   \item
 | |
|     \var{arg3} is a dictionary of keyword arguments that were passed.
 | |
|     If this is non-\NULL{} and you support keyword arguments, use
 | |
|     \cfunction{PyArg_ParseTupleAndKeywords()} to extract the
 | |
|     arguments.  If you do not want to support keyword arguments and
 | |
|     this is non-\NULL, raise a \exception{TypeError} with a message
 | |
|     saying that keyword arguments are not supported.
 | |
| \end{enumerate}
 | |
|        
 | |
| Here is a desultory example of the implementation of the call function.
 | |
| 
 | |
| \begin{verbatim}
 | |
| /* Implement the call function.
 | |
|  *    obj1 is the instance receiving the call.
 | |
|  *    obj2 is a tuple containing the arguments to the call, in this
 | |
|  *         case 3 strings.
 | |
|  */
 | |
| static PyObject *
 | |
| newdatatype_call(newdatatypeobject *obj, PyObject *args, PyObject *other)
 | |
| {
 | |
|     PyObject *result;
 | |
|     char *arg1;
 | |
|     char *arg2;
 | |
|     char *arg3;
 | |
| 
 | |
|     if (!PyArg_ParseTuple(args, "sss:call", &arg1, &arg2, &arg3)) {
 | |
|         return NULL;
 | |
|     }
 | |
|     result = PyString_FromFormat(
 | |
|         "Returning -- value: [\%d] arg1: [\%s] arg2: [\%s] arg3: [\%s]\n",
 | |
|         obj->obj_UnderlyingDatatypePtr->size,
 | |
|         arg1, arg2, arg3);
 | |
|     printf("\%s", PyString_AS_STRING(result));
 | |
|     return result;
 | |
| }
 | |
| \end{verbatim}
 | |
| 
 | |
| XXX some fields need to be added here...
 | |
| 
 | |
| 
 | |
| \begin{verbatim}
 | |
|     /* Added in release 2.2 */
 | |
|     /* Iterators */
 | |
|     getiterfunc tp_iter;
 | |
|     iternextfunc tp_iternext;
 | |
| \end{verbatim}
 | |
| 
 | |
| These functions provide support for the iterator protocol.  Any object
 | |
| which wishes to support iteration over it's contents (which may be
 | |
| generated during iteration) must implement the \code{tp_iter}
 | |
| handler.  Objects which are returned by a \code{tp_iter} handler must
 | |
| implement both the \code{tp_iter} and \code{tp_iternext} handlers.
 | |
| Both handlers take exactly one parameter, the instance for which they
 | |
| are being called, and return a new reference.  In the case of an
 | |
| error, they should set an exception and return \NULL.
 | |
| 
 | |
| For an object which represents an iterable collection, the
 | |
| \code{tp_iter} handler must return an iterator object.  The iterator
 | |
| object is responsible for maintaining the state of the iteration.  For
 | |
| collections which can support multiple iterators which do not
 | |
| interfere with each other (as lists and tuples do), a new iterator
 | |
| should be created and returned.  Objects which can only be iterated
 | |
| over once (usually due to side effects of iteration) should implement
 | |
| this handler by returning a new reference to themselves, and should
 | |
| also implement the \code{tp_iternext} handler.  File objects are an
 | |
| example of such an iterator.
 | |
| 
 | |
| Iterator objects should implement both handlers.  The \code{tp_iter}
 | |
| handler should return a new reference to the iterator (this is the
 | |
| same as the \code{tp_iter} handler for objects which can only be
 | |
| iterated over destructively).  The \code{tp_iternext} handler should
 | |
| return a new reference to the next object in the iteration if there is
 | |
| one.  If the iteration has reached the end, it may return \NULL{}
 | |
| without setting an exception or it may set \exception{StopIteration};
 | |
| avoiding the exception can yield slightly better performance.  If an
 | |
| actual error occurs, it should set an exception and return \NULL.
 | |
| 
 | |
| 
 | |
| \subsection{Supporting the Cycle Collector
 | |
|             \label{example-cycle-support}}
 | |
| 
 | |
| This example shows only enough of the implementation of an extension
 | |
| type to show how the garbage collector support needs to be added.  It
 | |
| shows the definition of the object structure, the
 | |
| \member{tp_traverse}, \member{tp_clear} and \member{tp_dealloc}
 | |
| implementations, the type structure, and a constructor --- the module
 | |
| initialization needed to export the constructor to Python is not shown
 | |
| as there are no special considerations there for the collector.  To
 | |
| make this interesting, assume that the module exposes ways for the
 | |
| \member{container} field of the object to be modified.  Note that
 | |
| since no checks are made on the type of the object used to initialize
 | |
| \member{container}, we have to assume that it may be a container.
 | |
| 
 | |
| \verbatiminput{cycle-gc.c}
 | |
| 
 | |
| Full details on the APIs related to the cycle detector are in
 | |
| \ulink{Supporting Cyclic Garbarge
 | |
| Collection}{../api/supporting-cycle-detection.html} in the
 | |
| \citetitle[../api/api.html]{Python/C API Reference Manual}.
 | |
| 
 | |
| 
 | |
| \subsection{More Suggestions}
 | |
| 
 | |
| Remember that you can omit most of these functions, in which case you
 | |
| provide \code{0} as a value.
 | |
| 
 | |
| In the \file{Objects} directory of the Python source distribution,
 | |
| there is a file \file{xxobject.c}, which is intended to be used as a
 | |
| template for the implementation of new types.  One useful strategy
 | |
| for implementing a new type is to copy and rename this file, then
 | |
| read the instructions at the top of it.
 | |
| 
 | |
| There are type definitions for each of the functions you must
 | |
| provide.  They are in \file{object.h} in the Python include
 | |
| directory that comes with the source distribution of Python.
 | |
| 
 | |
| In order to learn how to implement any specific method for your new
 | |
| datatype, do the following: Download and unpack the Python source
 | |
| distribution.  Go the the \file{Objects} directory, then search the
 | |
| C source files for \code{tp_} plus the function you want (for
 | |
| example, \code{tp_print} or \code{tp_compare}).  You will find
 | |
| examples of the function you want to implement.
 | |
| 
 | |
| When you need to verify that the type of an object is indeed the
 | |
| object you are implementing and if you use xxobject.c as an starting
 | |
| template for your implementation, then there is a macro defined for
 | |
| this purpose. The macro definition will look something like this:
 | |
| 
 | |
| \begin{verbatim}
 | |
| #define is_newdatatypeobject(v)  ((v)->ob_type == &Newdatatypetype)
 | |
| \end{verbatim}
 | |
| 
 | |
| And, a sample of its use might be something like the following:
 | |
| 
 | |
| \begin{verbatim}
 | |
|     if (!is_newdatatypeobject(objp1) {
 | |
|         PyErr_SetString(PyExc_TypeError, "arg #1 not a newdatatype");
 | |
|         return NULL;
 | |
|     }
 | |
| \end{verbatim}
 | 
