| \documentstyle[11pt]{article}
 | |
| \newcommand{\Cpp}{C\protect\raisebox{.18ex}{++}}
 | |
| 
 | |
| \title{
 | |
| Interactively Testing Remote Servers Using the Python Programming Language
 | |
| }
 | |
| 
 | |
| \author{
 | |
| 	Guido van Rossum \\
 | |
| 	Dept. AA, CWI, P.O. Box 94079 \\
 | |
| 	1090 GB Amsterdam, The Netherlands \\
 | |
| 	E-mail: {\tt guido@cwi.nl}
 | |
| \and
 | |
| 	Jelke de Boer \\
 | |
| 	HIO Enschede; P.O.Box 1326 \\
 | |
| 	7500 BH  Enschede, The Netherlands
 | |
| }
 | |
| 
 | |
| \begin{document}
 | |
| 
 | |
| \maketitle
 | |
| 
 | |
| \begin{abstract}
 | |
| This paper describes how two tools that were developed quite
 | |
| independently gained in power by a well-designed connection between
 | |
| them.  The tools are Python, an interpreted prototyping language, and
 | |
| AIL, a Remote Procedure Call stub generator.  The context is Amoeba, a
 | |
| well-known distributed operating system developed jointly by the Free
 | |
| University and CWI in Amsterdam.
 | |
| 
 | |
| As a consequence of their integration, both tools have profited:
 | |
| Python gained usability when used with Amoeba --- for which it was not
 | |
| specifically developed --- and AIL users now have a powerful
 | |
| interactive tool to test servers and to experiment with new
 | |
| client/server interfaces.%
 | |
| \footnote{
 | |
| An earlier version of this paper was presented at the Spring 1991
 | |
| EurOpen Conference in Troms{\o} under the title ``Linking a Stub
 | |
| Generator (AIL) to a Prototyping Language (Python).''
 | |
| }
 | |
| \end{abstract}
 | |
| 
 | |
| \section{Introduction}
 | |
| 
 | |
| Remote Procedure Call (RPC) interfaces, used in distributed systems
 | |
| like Amoeba
 | |
| \cite{Amoeba:IEEE,Amoeba:CACM},
 | |
| have a much more concrete character than local procedure call
 | |
| interfaces in traditional systems.  Because clients and servers may
 | |
| run on different machines, with possibly different word size, byte
 | |
| order, etc., much care is needed to describe interfaces exactly and to
 | |
| implement them in such a way that they continue to work when a client
 | |
| or server is moved to a different machine.  Since machines may fail
 | |
| independently, error handling must also be treated more carefully.
 | |
| 
 | |
| A common approach to such problems is to use a {\em stub generator}.
 | |
| This is a program that takes an interface description and transforms
 | |
| it into functions that must be compiled and linked with client and
 | |
| server applications.  These functions are called by the application
 | |
| code to take care of details of interfacing to the system's RPC layer,
 | |
| to implement transformations between data representations of different
 | |
| machines, to check for errors, etc.  They are called `stubs' because
 | |
| they don't actually perform the action that they are called for but
 | |
| only relay the parameters to the server
 | |
| \cite{RPC}.
 | |
| 
 | |
| Amoeba's stub generator is called AIL, which stands for Amoeba
 | |
| Interface Language
 | |
| \cite{AIL}.
 | |
| The first version of AIL generated only C functions, but an explicit
 | |
| goal of AIL's design was {\em retargetability}: it should be possible
 | |
| to add back-ends that generate stubs for different languages from the
 | |
| same interface descriptions.  Moreover, the stubs generated by
 | |
| different back-ends must be {\em interoperable}: a client written in
 | |
| Modula-3, say, should be able to use a server written in C, and vice
 | |
| versa.
 | |
| 
 | |
| This interoperability is the key to the success of the marriage
 | |
| between AIL and Python.  Python is a versatile interpreted language
 | |
| developed by the first author.  Originally intended as an alternative
 | |
| for the kind of odd jobs that are traditionally solved by a mixture of
 | |
| shell scripts, manually given shell commands, and an occasional ad hoc
 | |
| C program, Python has evolved into a general interactive prototyping
 | |
| language.  It has been applied to a wide range of problems, from
 | |
| replacements for large shell scripts to fancy graphics demos and
 | |
| multimedia applications.
 | |
| 
 | |
| One of Python's strengths is the ability for the user to type in some
 | |
| code and immediately run it: no compilation or linking is necessary.
 | |
| Interactive performance is further enhanced by Python's concise, clear
 | |
| syntax, its very-high-level data types, and its lack of declarations
 | |
| (which is compensated for by run-time type checking).  All this makes
 | |
| programming in Python feel like a leisure trip compared to the hard
 | |
| work involved in writing and debugging even a smallish C program.
 | |
| 
 | |
| It should be clear by now that Python will be the ideal tool to test
 | |
| servers and their interfaces.  Especially during the development of a
 | |
| complex server, one often needs to generate test requests on an ad hoc
 | |
| basis, to answer questions like ``what happens if request X arrives
 | |
| when the server is in state Y,'' to test the behavior of the server
 | |
| with requests that touch its limitations, to check server responses to
 | |
| all sorts of wrong requests, etc.  Python's ability to immediately
 | |
| execute `improvised' code makes it a much better tool for this
 | |
| situation than C.
 | |
| 
 | |
| The link to AIL extends Python with the necessary functionality to
 | |
| connect to arbitrary servers, making the server testbed sketched above
 | |
| a reality.  Python's high-level data types, general programming
 | |
| features, and system interface ensure that it has all the power and
 | |
| flexibility needed for the job.
 | |
| 
 | |
| One could go even further than this.  Current distributed operating
 | |
| systems, based on client-server interaction, all lack a good command
 | |
| language or `shell' to give adequate access to available services.
 | |
| Python has considerable potential for becoming such a shell.
 | |
| 
 | |
| \subsection{Overview of this Paper}
 | |
| 
 | |
| The rest of this paper contains three major sections and a conclusion.
 | |
| First an overview of the Python programming language is given.  Next
 | |
| comes a short description of AIL, together with some relevant details
 | |
| about Amoeba.  Finally, the design and construction of the link
 | |
| between Python and AIL is described in much detail.  The conclusion
 | |
| looks back at the work and points out weaknesses and strengths of
 | |
| Python and AIL that were discovered in the process.
 | |
| 
 | |
| \section{An Overview of Python}
 | |
| 
 | |
| Python%
 | |
| \footnote{
 | |
| Named after the funny TV show, not the nasty reptile.
 | |
| }
 | |
| owes much to ABC
 | |
| \cite{ABC},
 | |
| a language developed at CWI as a programming language for non-expert
 | |
| computer users.  Python borrows freely from ABC's syntax and data
 | |
| types, but adds modules, exceptions and classes, extensibility, and
 | |
| the ability to call system functions.  The concepts of modules,
 | |
| exceptions and (to some extent) classes are influenced strongly by
 | |
| their occurrence in Modula-3
 | |
| \cite{Modula-3}.
 | |
| 
 | |
| Although Python resembles ABC in many ways, there is a clear
 | |
| difference in application domain.  ABC is intended to be the only
 | |
| programming language for those who use a computer as a tool, but
 | |
| occasionally need to write a program.  For this reason, ABC is not
 | |
| just a programming language but also a programming environment, which
 | |
| comes with an integrated syntax-directed editor and some source
 | |
| manipulation commands.  Python, on the other hand, aims to be a tool
 | |
| for professional (system) programmers, for whom having a choice of
 | |
| languages with different feature sets makes it possible to choose `the
 | |
| right tool for the job.'  The features added to Python make it more
 | |
| useful than ABC in an environment where access to system functions
 | |
| (such as file and directory manipulations) is common.  They also
 | |
| support the building of larger systems and libraries.  The Python
 | |
| implementation offers little in the way of a programming environment,
 | |
| but is designed to integrate seamlessly with existing programming
 | |
| environments (e.g. UNIX and Emacs).
 | |
| 
 | |
| Perhaps the best introduction to Python is a short example.  The
 | |
| following is a complete Python program to list the contents of a UNIX
 | |
| directory.
 | |
| \begin{verbatim}
 | |
| import sys, posix
 | |
| 
 | |
| def ls(dirname):    # Print sorted directory contents
 | |
|     names = posix.listdir(dirname)
 | |
|     names.sort()
 | |
|     for name in names:
 | |
|         if name[0] != '.': print name
 | |
| 
 | |
| ls(sys.argv[1])
 | |
| \end{verbatim}
 | |
| The largest part of this program, in the middle starting with {\tt
 | |
| def}, is a function definition.  It defines a function named {\tt ls}
 | |
| with a single parameter called {\tt dirname}.  (Comments in Python
 | |
| start with `\#' and extend to the end of the line.)  The function body
 | |
| is indented: Python uses indentation for statement grouping instead of
 | |
| braces or begin/end keywords.  This is shorter to type and avoids
 | |
| frustrating mismatches between the perception of grouping by the user
 | |
| and the parser.  Python accepts one statement per line; long
 | |
| statements may be broken in pieces using the standard backslash
 | |
| convention.  If the body of a compound statement is a single, simple
 | |
| statement, it may be placed on the same line as the head.
 | |
| 
 | |
| The first statement of the function body calls the function {\tt
 | |
| listdir} defined in the module {\tt posix}.  This function returns a
 | |
| list of strings representing the contents of the directory name passed
 | |
| as a string argument, here the argument {\tt dirname}.  If {\tt
 | |
| dirname} were not a valid directory name, or perhaps not even a
 | |
| string, {\tt listdir} would raise an exception and the next statement
 | |
| would never be reached.  (Exceptions can be caught in Python; see
 | |
| later.)  Assuming {\tt listdir} returns normally, its result is
 | |
| assigned to the local variable {\tt names}.
 | |
| 
 | |
| The second statement calls the method {\tt sort} of the variable {\tt
 | |
| names}.  This method is defined for all lists in Python and does the
 | |
| obvious thing: the elements of the list are reordered according to
 | |
| their natural ordering relationship.  Since in our example the list
 | |
| contains strings, they are sorted in ascending ASCII order.
 | |
| 
 | |
| The last two lines of the function contain a loop that prints all
 | |
| elements of the list whose first character isn't a period.  In each
 | |
| iteration, the {\tt for} statement assigns an element of the list to
 | |
| the local variable {\tt name}.  The {\tt print} statement is intended
 | |
| for simple-minded output; more elaborate formatting is possible with
 | |
| Python's string handling functions.
 | |
| 
 | |
| The other two parts of the program are easily explained.  The first
 | |
| line is an {\tt import} statement that tells the interpreter to import
 | |
| the modules {\tt sys} and {\tt posix}.  As it happens these are both
 | |
| built into the interpreter.  Importing a module (built-in or
 | |
| otherwise) only makes the module name available in the current scope;
 | |
| functions and data defined in the module are accessed through the dot
 | |
| notation as in {\tt posix.listdir}.  The scope rules of Python are
 | |
| such that the imported module name {\tt posix} is also available in
 | |
| the function {\tt ls} (this will be discussed in more detail later).
 | |
| 
 | |
| Finally, the last line of the program calls the {\tt ls} function with
 | |
| a definite argument.  It must be last since Python objects must be
 | |
| defined before they can be used; in particular, the function {\tt ls}
 | |
| must be defined before it can be called.  The argument to {\tt ls} is
 | |
| {\tt sys.argv[1]}, which happens to be the Python equivalent of {\tt
 | |
| \$1} in a shell script or {\tt argv[1]} in a C program's {\tt main}
 | |
| function.
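
For readers who want to try the example with a current interpreter, the
following is a minimal sketch of the same program in Python 3.  It rests
on assumptions that are not part of the original listing: the {\tt os}
module is used in place of {\tt posix}, {\tt print} is written as a
function call, and the helper name {\tt visible\_names} is purely
illustrative.

```python
import os


def visible_names(dirname):
    """Return the sorted entries of dirname that do not start with a period."""
    names = os.listdir(dirname)
    names.sort()
    return [name for name in names if name[0] != '.']


def ls(dirname):
    # Print sorted directory contents, one entry per line
    for name in visible_names(dirname):
        print(name)
```

Calling {\tt ls(sys.argv[1])} after importing {\tt sys} would reproduce
the command-line behavior of the original script.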
 | |
| 
 | |
| \subsection{Python Data Types}
 | |
| 
 | |
| (This and the following subsections describe Python in quite a lot of
 | |
| detail.  If you are more interested in AIL, Amoeba and how they are
 | |
| linked with Python, you can skip to section 3 now.)
 | |
| 
 | |
| Python's syntax may not have big surprises (which is exactly as it
 | |
| should be), but its data types are quite different from what is found
 | |
| in languages like C, Ada or Modula-3.  All data types in Python, even
 | |
| integers, are `objects'.  All objects participate in a common garbage
 | |
| collection scheme (currently implemented using reference counting).
 | |
| Assignment is cheap, independent of object size and type: only a
 | |
| pointer to the assigned object is stored in the assigned-to variable.
 | |
| No type checking is performed on assignment; only specific operations
 | |
| like addition test for particular operand types.
 | |
| 
 | |
| The basic object types in Python are numbers, strings, tuples, lists
 | |
| and dictionaries.  Some other object types are open files, functions,
 | |
| modules, classes, and class instances; even types themselves are
 | |
| represented as objects.  Extension modules written in C can define
 | |
| additional object types; examples are objects representing windows and
 | |
| Amoeba capabilities.  Finally, the implementation itself makes heavy
 | |
| use of objects, and defines some private object types that aren't
 | |
| normally visible to the user.  There is no explicit pointer type in
 | |
| Python.
 | |
| 
 | |
| {\em Numbers}, both integers and floating point, are pretty
 | |
| straightforward.  The notation for numeric literals is the same as in
 | |
| C, including octal and hexadecimal integers; precision is the same as
 | |
| {\tt long} or {\tt double} in C\@.  A third numeric type, `long
 | |
| integer', written with an `L' suffix, can be used for arbitrary
 | |
| precision calculations.  All arithmetic, shifting and masking
 | |
| operations from C are supported.
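
A few of these operations can be tried directly.  The sketch below uses
Python 3 syntax, which is an assumption on our part: current versions
spell octal literals with a {\tt 0o} prefix rather than C's leading
zero, and have merged the `long integer' type into the ordinary integer,
so no `L' suffix is written.

```python
# Hexadecimal and octal literals (Python 3 spelling of octal: 0o...)
mask = 0xFF
perms = 0o755

# Shifting and masking as in C
low_byte = 0x1234 & mask        # keep the lowest 8 bits
shifted = 1 << 10               # 2 to the power 10

# Arbitrary precision: the result is exact, with no overflow
big = 2 ** 100

print(low_byte, shifted, perms, big)
```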
 | |
| 
 | |
| {\em Strings} are `primitive' objects just like numbers.  String
 | |
| literals are written between single quotes, using similar escape
 | |
| sequences as in C\@.  Operations are built into the language to
 | |
| concatenate and to replicate strings, to extract substrings, etc.
 | |
| There is no limit to the length of the strings created by a program.
 | |
| There is no separate character data type; strings of length one do
 | |
| nicely.
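
The built-in string operations mentioned above might look as follows.
This is a small sketch in Python 3 syntax; the variable names are
illustrative only.

```python
greeting = 'Hello' + ', ' + 'world'   # concatenation
rule = '-' * 20                       # replication
first = greeting[0]                   # a "character" is a string of length one
word = greeting[7:12]                 # substring by slicing

print(greeting, rule, first, word)
```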
 | |
| 
 | |
| {\em Tuples} are a way to `pack' small amounts of heterogeneous data
 | |
| together and carry them around as a unit.  Unlike structure members in
 | |
| C, tuple items are nameless.  Packing and unpacking assignments allow
 | |
| access to the items, for example:
 | |
| \begin{verbatim}
 | |
| x = 'Hi', (1, 2), 'World'   # x is a 3-item tuple,
 | |
|                             # its middle item is (1, 2)
 | |
| p, q, r = x                 # unpack x into p, q and r
 | |
| a, b = q                    # unpack q into a and b
 | |
| \end{verbatim}
 | |
| A combination of packing and unpacking assignment can be used as
 | |
| parallel assignment, and is the idiom for permutations, e.g.:
 | |
| \begin{verbatim}
 | |
| p, q = q, p                 # swap without temporary
 | |
| a, b, c = b, c, a           # cyclic permutation
 | |
| \end{verbatim}
 | |
| Tuples are also used for function argument lists if there is more than
 | |
| one argument.  A tuple object, once created, cannot be modified; but
 | |
| it is easy enough to unpack it and create a new, modified tuple from
 | |
| the unpacked items and assign this to the variable that held the
 | |
| original tuple object (which will then be garbage-collected).
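
The last point can be sketched as follows, in Python 3 syntax (an
assumption; the names {\tt point}, {\tt x}, {\tt y} and {\tt color} are
illustrative only).

```python
point = (1, 2, 'red')

# Tuples cannot be changed in place, so unpack the items...
x, y, color = point

# ...and bind the same variable to a newly built tuple
point = (x, y + 10, color)

print(point)
```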
 | |
| 
 | |
| {\em Lists} are array-like objects.  List items may be arbitrary
 | |
| objects and can be accessed and changed using standard subscription
 | |
| notation.  Lists support item insertion and deletion, and can
 | |
| therefore be used as queues, stacks etc.; there is no limit to their
 | |
| size.
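
As a rough illustration of lists used as a stack and as a queue, consider
the sketch below.  It is written for Python 3 and assumes the list
methods {\tt append}, {\tt pop} and {\tt insert} of current versions,
which the text above does not name individually.

```python
stack = []
stack.append('a')          # push
stack.append('b')
top = stack.pop()          # pop removes and returns the last item

queue = ['first', 'second']
queue.append('third')      # enqueue at the end
head = queue[0]            # look at the front item
del queue[0]               # dequeue by deleting the first item

queue.insert(1, 'urgent')  # insert before index 1
print(top, stack, head, queue)
```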
 | |
| 
 | |
| Strings, tuples and lists together are {\em sequence} types.  These
 | |
| share a common notation for generic operations on sequences such as
 | |
| subscription, concatenation, slicing (taking subsequences) and
 | |
| membership tests.  As in C, subscripts start at 0.
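
Because the notation is shared, one function can operate on any of the
three sequence types.  The following Python 3 sketch uses a hypothetical
helper named {\tt describe} to show this.

```python
def describe(seq):
    """Apply the same generic operations to any sequence."""
    first = seq[0]              # subscription, starting at 0
    middle = seq[1:3]           # slicing
    extended = seq + seq[:1]    # concatenation
    present = first in seq      # membership test
    return first, middle, extended, present


print(describe('abcd'))
print(describe((1, 2, 3, 4)))
print(describe([1, 2, 3, 4]))
```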
 | |
| 
 | |
| {\em Dictionaries} are `mappings' from one domain to another.  The
 | |
| basic operations on dictionaries are item insertion, extraction and
 | |
| deletion, using subscript notation with the key as subscript.  (The
 | |
| current implementation allows only strings in the key domain, but a
 | |
| future version of the language may remove this restriction.)
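
The three basic dictionary operations can be sketched like this, in
Python 3 syntax.  The keys and values are made up for the example.

```python
extension = {}
extension['alice'] = 4127      # insertion
extension['bob'] = 4098
number = extension['alice']    # extraction
del extension['bob']           # deletion

print(number, extension)
```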
 | |
| 
 | |
| \subsection{Statements}
 | |
| 
 | |
| Python has various kinds of simple statements, such as assignments
 | |
| and {\tt print} statements, and several kinds of compound statements,
 | |
| like {\tt if} and {\tt for} statements.  Formally, function
 | |
| definitions and {\tt import} statements are also statements, and there
 | |
| are no restrictions on the ordering of statements or their nesting:
 | |
| {\tt import} may be used inside a function, functions may be defined
 | |
| conditionally using an {\tt if} statement, etc.  The effect of a
 | |
| declaration-like statement takes place only when it is executed.
 | |
| 
 | |
| All statements except assignments and expression statements begin with
 | |
| a keyword: this makes the language easy to parse.  An overview of the
 | |
| most common statement forms in Python follows.
 | |
| 
 | |
| An {\em assignment} has the general form
 | |
| \vspace{\itemsep}
 | |
| 
 | |
| \noindent
 | |
| {\em variable $=$ variable $= ... =$ variable $=$ expression}
 | |
| \vspace{\itemsep}
 | |
| 
 | |
| It assigns the value of the expression to all listed variables.  (As
 | |
| shown in the section on tuples, variables and expressions can in fact
 | |
| be comma-separated lists.)  The assignment operator is not an
 | |
| expression operator; there are no horrible things in Python like
 | |
| \begin{verbatim}
 | |
| while (p = p->next) { ... }
 | |
| \end{verbatim}
 | |
| Expression syntax is mostly straightforward and will not be explained
 | |
| in detail here.
 | |
| 
 | |
| An {\em expression statement} is just an expression on a line by
 | |
| itself.  This writes the value of the expression to standard output,
 | |
| in a suitably unambiguous way, unless it is a `procedure call' (a
 | |
| function call that returns no value).  Writing the value is useful
 | |
| when Python is used in `calculator mode', and reminds the programmer
 | |
| not to ignore function results.
 | |
| 
 | |
| The {\tt if} statement allows conditional execution.  It has optional
 | |
| {\tt elif} and {\tt else} parts; a construct like {\tt
 | |
| if...elif...elif...elif...else} can be used to compensate for the
 | |
| absence of a {\em switch} or {\em case} statement.
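
A chain of this kind might look as follows.  The function
{\tt describe\_status} and its status codes are hypothetical and serve
only to show the shape of the construct.

```python
def describe_status(code):
    # Each test plays the role of one 'case' label
    if code == 0:
        return 'ok'
    elif code == 1:
        return 'warning'
    elif code == 2:
        return 'error'
    else:
        return 'unknown'    # the 'default' branch
```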
 | |
| 
 | |
| Looping is done with {\tt while} and {\tt for} statements.  The latter
 | |
| (demonstrated in the `ls' example earlier) iterates over the elements
 | |
| of a `sequence' (see the discussion of data types above).  It is
 | |
| possible to terminate a loop with a {\tt break} statement or to start
 | |
| the next iteration with {\tt continue}.  Both looping statements have
 | |
| an optional {\tt else} clause which is executed after the loop is
 | |
| terminated normally, but skipped when it is terminated by {\tt break}.
 | |
| This can be handy for searches, to handle the case that the item is
 | |
| not found.
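
The search idiom can be sketched as follows in Python 3 syntax; the
function name {\tt find\_first\_negative} is an illustrative assumption.

```python
def find_first_negative(numbers):
    for n in numbers:
        if n < 0:
            result = n
            break               # found: the else clause is skipped
    else:
        result = None           # loop ran to completion: nothing found
    return result


print(find_first_negative([3, -1, -5]))
print(find_first_negative([1, 2]))
```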
 | |
| 
 | |
| Python's {\em exception} mechanism is modelled after that of Modula-3.
 | |
| Exceptions are raised by the interpreter when an illegal operation is
 | |
| tried.  It is also possible to explicitly raise an exception with the
 | |
| {\tt raise} statement:
 | |
| \vspace{\itemsep}
 | |
| 
 | |
| \noindent
 | |
| {\tt raise {\em expression}, {\em expression}}
 | |
| \vspace{\itemsep}
 | |
| 
 | |
| The first expression identifies which exception should be raised;
 | |
| there are several built-in exceptions and the user may define
 | |
| additional ones.  The second, optional expression is passed to the
 | |
| handler, e.g. as a detailed error message.
 | |
| 
 | |
| Exceptions may be handled (caught) with the {\tt try} statement, which
 | |
| has the following general form:
 | |
| \vspace{\itemsep}
 | |
| 
 | |
| \noindent
 | |
| {\tt
 | |
| \begin{tabular}{l}
 | |
| try: {\em block} \\
 | |
| except {\em expression}, {\em variable}: {\em block} \\
 | |
| except {\em expression}, {\em variable}: {\em block} \\
 | |
| ... \\
 | |
| except: {\em block}
 | |
| \end{tabular}
 | |
| }
 | |
| \vspace{\itemsep}
 | |
| 
 | |
| When an exception is raised during execution of the first block, a
 | |
| search for an exception handler starts.  The first {\tt except} clause
 | |
| whose {\em expression} matches the exception is executed.  The
 | |
| expression may specify a list of exceptions to match against.  A
 | |
| handler without an expression serves as a `catch-all'.  If there is no
 | |
| match, the search for a handler continues with outer {\tt try}
 | |
| statements; if no match is found on the entire invocation stack, an
 | |
| error message and stack trace are printed, and the program is
 | |
| terminated (interactively, the interpreter returns to its main loop).
 | |
| 
 | |
| Note that the form of the {\tt except} clauses encourages a style of
 | |
| programming whereby only selected exceptions are caught, passing
 | |
| unanticipated exceptions on to the caller and ultimately to the user.
 | |
| This is preferable to a simpler `catch-all' error handling
 | |
| mechanism, where a simplistic handler intended to catch a single type
 | |
| of error like `file not found' can easily mask genuine programming
 | |
| errors --- especially in a language like Python which relies strongly
 | |
| on run-time checking and allows the catching of almost any type of
 | |
| error.
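
The following sketch illustrates this style.  It is written in Python 3,
whose syntax differs from the general form shown above: the variable is
introduced with {\tt as} instead of a comma.  The function {\tt lookup}
and the table {\tt prices} are hypothetical.

```python
def lookup(table, key):
    try:
        return table[key]
    except KeyError as err:
        # Only the anticipated error is handled here
        return 'no entry for %s' % err


prices = {'spam': 2}
print(lookup(prices, 'spam'))
print(lookup(prices, 'eggs'))

# An unanticipated error (a list is not a valid dictionary key) is not
# masked by the handler above and propagates to the caller
try:
    lookup(prices, ['eggs'])
except TypeError as err:
    print('caller saw:', err)
```

Had {\tt lookup} used a bare {\tt except:} clause, the second kind of
mistake would have been reported as a missing entry.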
 | |
| 
 | |
| Other common statement forms, which we have already encountered, are
 | |
| function definitions, {\tt import} statements and {\tt print}
 | |
| statements.  There is also a {\tt del} statement to delete one or more
 | |
| variables, a {\tt return} statement to return from a function, and a
 | |
| {\tt global} statement to allow assignments to global variables.
 | |
| Finally, the {\tt pass} statement is a no-op.
 | |
| 
 | |
| \subsection{Execution Model}
 | |
| 
 | |
| A Python program is executed by a stack-based interpreter.
 | |
| 
 | |
| When a function is called, a new `execution environment' for it is
 | |
| pushed onto the stack.  An execution environment contains (among other
 | |
| data) pointers to two `symbol tables' that are used to hold variables:
 | |
| the local and the global symbol table.  The local symbol table
 | |
| contains local variables of the current function invocation (including
 | |
| the function arguments); the global symbol table contains variables
 | |
| defined in the module containing the current function.
 | |
| 
 | |
| The `global' symbol table is thus only global with respect to the
 | |
| current function.  There are no system-wide global variables; using
 | |
| the {\tt import} statement it is easy enough to reference variables
 | |
| that are defined in other modules.  A system-wide read-only symbol
 | |
| table is used for built-in functions and constants though.
 | |
| 
 | |
| On assignment to a variable, by default an entry for it is made in the
 | |
| local symbol table of the current execution environment.  The {\tt
 | |
| global} command can override this (it is not enough that a global
 | |
| variable by the same name already exists).  When a variable's value is
 | |
| needed, it is looked up first in the local symbol table, then in the
 | |
| global one, and finally in the symbol table containing built-in
 | |
| functions and constants.
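
These rules can be observed with a small experiment.  The sketch below
is in Python 3 syntax, and all names in it are illustrative.

```python
counter = 0


def set_local():
    counter = 100        # new entry in the local symbol table only
    return counter


def bump_global():
    global counter       # assignments now go to the global symbol table
    counter = counter + 1
    return counter


def lookup_order():
    # counter is found in the global table, len in the built-in table
    return counter, len('abc')


local_result = set_local()
after_local = counter        # still 0: the global was not touched
global_result = bump_global()
final = lookup_order()
print(local_result, after_local, global_result, final)
```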
 | |
| 
 | |
| The term `variable' in this context refers to any name: functions and
 | |
| imported modules are looked up in exactly the same way.
 | |
| 
 | |
| Names defined in a module's symbol table survive until the end of the
 | |
| program.  This approximates the semantics of file-static global
 | |
| variables in C or module variables in Modula-3.  A module is
 | |
| initialized the first time it is imported, by executing the text of
 | |
| the module as a parameterless function whose local and global symbol
 | |
| tables are the same, so names are defined in the module's symbol table.
 | |
| (Modules implemented in C have another way to define symbols.)
 | |
| 
 | |
| A Python main program is read from standard input or from a script
 | |
| file passed as an argument to the interpreter.  It is executed as if
 | |
| an anonymous module were imported.  Since {\tt import} statements are
 | |
| executed like all other statements, the initialization order of the
 | |
| modules used in a program is defined by the flow of control through
 | |
| the program.
 | |
| 
 | |
| The `attribute' notation {\em m.name}, where {\em m} is a module,
 | |
| accesses the symbol {\em name} in that module's symbol table.  It can
 | |
| be assigned to as well.  This is in fact a special case of the
 | |
| construct {\em x.name} where {\em x} denotes an arbitrary object; the
 | |
| type of {\em x} determines how this is to be interpreted, and what
 | |
| assignment to it means.
 | |
| 
 | |
| For instance, when {\tt a} is a list object, {\tt a.append} yields a
 | |
| built-in `method' object which, when called, appends an item to {\tt a}.
 | |
| (If {\tt a} and {\tt b} are distinct list objects, {\tt a.append} and
 | |
| {\tt b.append} are distinguishable method objects.)  Normally, in
 | |
| statements like {\tt a.append(x)}, the method object {\tt a.append} is
 | |
| called and then discarded, but this is a matter of convention.
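
That a method object need not be called immediately can be shown by
storing it first.  This is a short Python 3 sketch; the names
{\tt add\_to\_a} and {\tt add\_to\_b} are illustrative.

```python
a = []
b = []

add_to_a = a.append      # a method object bound to the list a
add_to_b = b.append      # a different method object, bound to b

add_to_a(1)
add_to_a(2)
add_to_b('x')

print(a, b)
```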
 | |
| 
 | |
| List attributes are read-only --- the user cannot define new list
 | |
| methods.  Some objects, like numbers and strings, have no attributes
 | |
| at all.  Like all type checking in Python, the meaning of an attribute
 | |
| is determined at run-time --- when the parser sees {\em x.name}, it
 | |
| has no idea of the type of {\em x}.  Note that {\em x} here does not
 | |
| have to be a variable --- it can be an arbitrary (perhaps
 | |
| parenthesized) expression.
 | |
| 
 | |
| Given the flexibility of the attribute notation, one is tempted to use
 | |
| methods to replace all standard operations.  Yet, Python has kept a
 | |
| small repertoire of built-in functions like {\tt len()} and {\tt
 | |
| abs()}.  The reason is that in some cases the function notation is
 | |
| more familiar than the method notation; just like programs would
 | |
| become less readable if all infix operators were replaced by function
 | |
| calls, they would become less readable if all function calls had to be
 | |
| replaced by method calls (and vice versa!).
 | |
| 
 | |
| The choice whether to make something a built-in function or a method
 | |
| is a matter of taste.  For arithmetic and string operations, function
 | |
| notation is preferred, since frequently the argument to such an
 | |
| operation is an expression using infix notation, as in {\tt abs(a+b)};
 | |
| this definitely looks better than {\tt (a+b).abs()}.  The choice
 | |
| between making something a built-in function or a function defined in a
 | |
| built-in module (requiring {\tt import}) is similarly guided by
 | |
| intuition; all in all, only functions needed by `general' programming
 | |
| techniques are built-in functions.
 | |
| 
 | |
| \subsection{Classes}
 | |
| 
 | |
| Python has a class mechanism distinct from the object-orientation
 | |
| already explained.  A class in Python is not much more than a
 | |
| collection of methods and a way to create class instances.  Class
 | |
| methods are ordinary functions whose first parameter is the class
 | |
| instance; they are called using the method notation.
 | |
| 
 | |
| For instance, a class can be defined as follows:
 | |
| \begin{verbatim}
 | |
| class Foo:
 | |
|    def meth1(self, arg): ...
 | |
|    def meth2(self): ...
 | |
| \end{verbatim}
 | |
| A class instance is created by
 | |
| {\tt x = Foo()}
 | |
| and its methods can be called thus:
 | |
| \begin{verbatim}
 | |
| x.meth1('Hi There!')
 | |
| x.meth2()
 | |
| \end{verbatim}
 | |
| The functions used as methods are also available as attributes of the
 | |
| class object, and the above method calls could also have been written
 | |
| as follows:
 | |
| \begin{verbatim}
 | |
| Foo.meth1(x, 'Hi There!')
 | |
| Foo.meth2(x)
 | |
| \end{verbatim}
 | |
| Class methods can store instance data by assigning to instance data
 | |
| attributes, e.g.:
 | |
| \begin{verbatim}
 | |
| self.size = 100
 | |
| self.title = 'Dear John'
 | |
| \end{verbatim}
 | |
| Data attributes do not have to be declared; as with local variables,
 | |
| they spring into existence when assigned to.  It is a matter of
 | |
| discretion to avoid name conflicts with method names.  This facility
 | |
| is also available to class users; instances of a method-less class can
 | |
| be used as records with named fields.
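
A record built this way might look as follows in Python 3 syntax.  The
class name {\tt Record} is an illustrative assumption; the field names
are borrowed from the fragment above.

```python
class Record:
    pass                     # no methods at all


r = Record()
r.size = 100                 # data attributes spring into existence
r.title = 'Dear John'
print(r.size, r.title)
```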
 | |
| 
 | |
| There is no built-in mechanism for instance initialization.  Classes
 | |
| by convention provide an {\tt init()} method which initializes the
 | |
| instance and then returns it, so the user can write
 | |
| \begin{verbatim}
 | |
| x = Foo().init('Dr. Strangelove')
 | |
| \end{verbatim}
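
A class following this convention could be sketched as below.  The
sketch runs under Python 3 but deliberately keeps the convention
described here; current Python versions instead call a method named
{\tt \_\_init\_\_} automatically when an instance is created.  The
attribute {\tt name} is an illustrative assumption.

```python
class Foo:
    def init(self, name):
        self.name = name
        return self          # returning the instance allows chaining


x = Foo().init('Dr. Strangelove')
print(x.name)
```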
 | |
| 
 | |
| Any user-defined class can be used as a base class to derive other
 | |
| classes.  However, built-in types like lists cannot be used as base
 | |
| classes.  (Incidentally, the same is true in \Cpp{} and Modula-3.)  A
 | |
| class may override any method of its base classes.  Instance methods
 | |
| are first searched in the method list of their class, and then,
 | |
| recursively, in the method lists of their base classes.  Initialization
 | |
| methods of derived classes should explicitly call the initialization
 | |
| methods of their base class.
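
Method overriding and the explicit call to the base class initializer
can be sketched as follows.  The sketch is in Python 3 syntax and keeps
the {\tt init()} convention of this paper; the class names {\tt Base}
and {\tt Derived} and their attributes are hypothetical.

```python
class Base:
    def init(self, name):
        self.name = name
        return self

    def describe(self):
        return 'base ' + self.name

    def shout(self):
        return self.name.upper()


class Derived(Base):
    def init(self, name, size):
        Base.init(self, name)    # explicit call to the base class initializer
        self.size = size
        return self

    def describe(self):          # overrides Base.describe
        return 'derived ' + self.name


d = Derived().init('window', 100)
print(d.describe(), d.shout(), d.size)
```

Here {\tt describe} is found in {\tt Derived} itself, while {\tt shout}
is found only after the search continues in {\tt Base}.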
 | |
| 
 | |
| A simple form of multiple inheritance is also supported: a class can
 | |
| have multiple base classes, but the language rules for resolving name
 | |
| conflicts are somewhat simplistic, and consequently the feature has so
 | |
| far found little usage.
 | |
| 
 | |
| \subsection{The Python Library}
 | |
| 
 | |
| Python comes with an extensive library, structured as a collection of
 | |
| modules.  A few modules are built into the interpreter: these
 | |
| generally provide access to system libraries implemented in C such as
 | |
| mathematical functions or operating system calls.  Two built-in
 | |
| modules provide access to internals of the interpreter and its
 | |
| environment.  Even abusing these internals will at most cause an
 | |
| exception in the Python program; the interpreter will not dump core
 | |
| because of errors in Python code.
 | |
| 
 | |
| Most modules however are written in Python and distributed with the
 | |
| interpreter; they provide general programming tools like string
 | |
| operations and random number generators, provide more convenient
 | |
| interfaces to some built-in modules, or provide specialized services
 | |
| like a {\em getopt}-style command line option processor for
 | |
| stand-alone scripts.
 | |
| 
 | |
| There are also some modules written in Python that dig deep in the
 | |
| internals of the interpreter; there is a module to browse the stack
 | |
| backtrace when an unhandled exception has occurred, one to disassemble
 | |
| the internal representation of Python code, and even an interactive
 | |
| source code debugger which can trace Python code, set breakpoints,
 | |
| etc.
 | |
| 
 | |
| \subsection{Extensibility}
 | |
| 
 | |
| It is easy to add new built-in modules written in C to the Python
 | |
| interpreter.  Extensions appear to the Python user as built-in
 | |
| modules.  Using a built-in module is no different from using a module
 | |
| written in Python, but obviously the author of a built-in module can
 | |
| do things that cannot be implemented purely in Python.
 | |
| 
 | |
| In particular, built-in modules can contain Python-callable functions
 | |
| that call functions from particular system libraries (`wrapper
 | |
| functions'), and they can define new object types.  In general, if a
 | |
| built-in module defines a new object type, it should also provide at
 | |
| least one function that creates such objects.  Attributes of such
 | |
| object types are also implemented in C; they can return data
 | |
| associated with the object or methods, implemented as C functions.
 | |
| 
 | |
| For instance, an extension was created for Amoeba: it provides wrapper
 | |
| functions for the basic Amoeba name server functions, and defines a
 | |
| `capability' object type, whose methods are file server operations.
 | |
| Another extension is a built-in module called {\tt posix}; it provides
 | |
| wrappers around most UNIX system calls.  Extension modules also
 | |
| provide access to two different windowing/graphics interfaces: STDWIN
 | |
| \cite{STDWIN}
 | |
| (which connects to X11 on UNIX and to the Mac Toolbox on the
 | |
| Macintosh), and the Graphics Library (GL) for Silicon Graphics
 | |
| machines.
 | |
| 
 | |
| Any function in an extension module is supposed to type-check its
 | |
| arguments; the interpreter contains a convenience function to
 | |
| facilitate extracting C values from arguments and type-checking them
 | |
| at the same time.  Returning values is also painless, using standard
 | |
| functions to create Python objects from C values.
 | |
| 
 | |
| On some systems extension modules may be dynamically loaded, thus
 | |
| avoiding the need to maintain a private copy of the Python interpreter
 | |
| in order to use a private extension.
 | |
| 
 | |
| \section{A Short Description of AIL and Amoeba}
 | |
| 
 | |
| An RPC stub generator takes an interface description as input.  The
 | |
| designer of a stub generator has at least two choices for the input
 | |
| language: use a suitably restricted version of the target language, or
 | |
| design a new language.  The first solution was chosen, for instance,
 | |
| by the designers of Flume, the stub generator for the Topaz
 | |
| distributed operating system built at DEC SRC
 | |
| \cite{Flume,Evolving}.
 | |
| 
 | |
| Flume's one and only target language is Modula-2+ (the predecessor of
 | |
| Modula-3).  Modula-2+, like Modula-N for any N, has an interface
 | |
| syntax that is well suited as a stub generator input language: an
 | |
| interface module declares the functions that are `exported' by a
 | |
| module implementation, with their parameter and return types, plus the
 | |
| types and constants used for the parameters.  Therefore, the input to
 | |
| Flume is simply a Modula-2+ interface module.  But even in this ideal
 | |
| situation, an RPC stub generator needs to know things about functions
 | |
| that are not stated explicitly in the interface module: for instance,
 | |
| the transfer direction of VAR parameters (IN, OUT or both) is not
 | |
| given.  Flume solves this and other problems by a mixture of
 | |
| directives hidden in comments and a convention for the names of
 | |
| objects.  Thus, one could say that the designers of Flume really
 | |
| created a new language, even though it looks remarkably like their
 | |
| target language.
 | |
| 
 | |
| \subsection{The AIL Input Language}
 | |
| 
 | |
| Amoeba uses C as its primary programming language.  C function
 | |
| declarations (at least in `Classic' C) don't specify the types of
 | |
| the parameters, let alone their transfer direction.  Using this as
 | |
| input for a stub generator would require almost all information for
 | |
| the stub generator to be hidden inside comments, which would require a
 | |
| rather contorted scanner.  Therefore we decided to design the input
 | |
| syntax for Amoeba's stub generator `from scratch'.  This gave us the
 | |
| liberty to invent proper syntax not only for the transfer direction of
 | |
| parameters, but also for variable-length arrays.
 | |
| 
 | |
| On the other hand we decided not to abuse our freedom, and borrowed as
 | |
| much from C as we could.  For instance, AIL runs its input through the
 | |
| C preprocessor, so we get macros, include files and conditional
 | |
| compilation for free.  AIL's type declaration syntax is a superset of
 | |
| C's, so the user can include C header files to use the types declared
 | |
| there as function parameter types --- which are declared using
 | |
| function prototypes as in \Cpp{} or Standard C\@.  It should be clear by
 | |
| now that AIL's lexical conventions are also identical to C's.  The
 | |
| same is true for its expression syntax.
 | |
| 
 | |
| Where does AIL differ from C, then?  Function declarations in AIL are
 | |
| grouped in {\em classes}.  Classes in AIL are mostly intended as a
 | |
| grouping mechanism: all functions implemented by a server are grouped
 | |
| together in a class.  Inheritance is used to form new groups by adding
 | |
| elements to existing groups; multiple inheritance is supported to join
 | |
| groups together.  Classes can also contain constant and type
 | |
| definitions, and one form of output that AIL can generate is a header
 | |
| file for use by C programmers who wish to use functions from a
 | |
| particular AIL class.
 | |
| 
 | |
| Let's have a look at some (unrealistically simple) class definitions:
 | |
| \begin{verbatim}
 | |
| #include <amoeba.h>     /* Defines `capability', etc. */
 | |
| 
 | |
| class standard_ops [1000 .. 1999] {
 | |
|     /* Operations supported by most interfaces */
 | |
|     std_info(*, out char buf[size:100], out int size);
 | |
|     std_destroy(*);
 | |
| };
 | |
| \end{verbatim}
 | |
| This defines a class called `standard\_ops' whose request codes are
 | |
| chosen by AIL from the range 1000--1999.  Request codes are small
 | |
| integers used to identify remote operations.  The author of the class
 | |
| must specify a range from which AIL chooses, and class authors must
 | |
| make sure they avoid conflicts, e.g. by using an `assigned number
 | |
| administration office'.  In the example, `std\_info' will be assigned
 | |
| request code 1000 and `std\_destroy' will get code 1001.  There is
 | |
| also an option to explicitly assign request codes, for compatibility
 | |
| with servers with manually written interfaces.
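As a rough sketch of the automatic assignment, the following Python fragment hands out codes from a range in declaration order.  The function name {\tt assign\_request\_codes} is hypothetical, and the declaration-order rule is an assumption inferred from the `std\_info'/`std\_destroy' example; it is not AIL's actual implementation.

```python
def assign_request_codes(first, last, operations):
    """Give each operation the next unused code in the range first..last."""
    if len(operations) > last - first + 1:
        raise ValueError("range too small for this class")
    # Assumption: codes are handed out in declaration order.
    return {name: first + i for i, name in enumerate(operations)}


standard_ops = assign_request_codes(1000, 1999, ["std_info", "std_destroy"])
print(standard_ops)
```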
 | |
| 
 | |
| The class `standard\_ops' defines two operations, `std\_info' and
 | |
| `std\_destroy'.  The first parameter of each operation is a star
 | |
| (`*'); this is a placeholder for a capability that must be passed when
 | |
| the operation is called.  The description of Amoeba below explains the
 | |
| meaning and usage of capabilities; for now, it is sufficient to know
 | |
| that a capability is a small structure that uniquely identifies an
 | |
| object and a server or service.
 | |
| 
 | |
| The standard operation `std\_info' has two output parameters: a
 | |
| variable-size character buffer (which will be filled with a short
 | |
| descriptive string of the object to which the operation is applied)
 | |
| and an integer giving the length of this string.  The standard
 | |
| operation `std\_destroy' has no further parameters --- it just
 | |
| destroys the object, if the caller has the right to do so.
 | |
| 
 | |
| The next class is called `tty':
 | |
| \begin{verbatim}
 | |
| class tty [2000 .. 2099] {
 | |
|     inherit standard_ops;
 | |
|     const TTY_MAXBUF = 1000;
 | |
|     tty_write(*, char buf[size:TTY_MAXBUF], int size);
 | |
|     tty_read(*, out char buf[size:TTY_MAXBUF], out int size);
 | |
| };
 | |
| \end{verbatim}
 | |
| The request codes for operations defined in this class lie in the
 | |
| range 2000--2099; inherited operations use the request codes already
 | |
| assigned to them.  The operations defined by this class are
 | |
| `tty\_read' and `tty\_write', which pass variable-sized data buffers
 | |
| between client and server.  Class `tty' inherits class
 | |
| `standard\_ops', so tty objects also support the operations
 | |
| `std\_info' and `std\_destroy'.
 | |
| 
 | |
| Only the {\em interface} for `std\_info' and `std\_destroy' is shared
 | |
| between tty objects and other objects whose interface inherits
 | |
| `standard\_ops'; the implementation may differ.  Even multiple
 | |
| implementations of the `tty' interface may exist, e.g. a driver for a
 | |
| console terminal and a terminal emulator in a window.  To expand on
 | |
| the latter example, consider:
 | |
| \begin{verbatim}
 | |
| class window [2100 .. 2199] {
 | |
|     inherit standard_ops;
 | |
|     win_create(*, int x, int y, int width, int height,
 | |
|                   out capability win_cap);
 | |
|     win_reconfigure(*, int x, int y, int width, int height);
 | |
| };
 | |
| 
 | |
| class tty_emulator [2200 .. 2299] {
 | |
|     inherit tty, window;
 | |
| };
 | |
| \end{verbatim}
 | |
| Here two new interface classes are defined.
 | |
| Class `window' could be used for creating and manipulating windows.
 | |
| Note that `win\_create' returns a capability for the new window.
 | |
| This request should probably be sent to a generic window
 | |
| server capability, or it might create a subwindow when applied to a
 | |
| window object.
 | |
| 
 | |
| Class `tty\_emulator' demonstrates the essence of multiple inheritance.
 | |
| It is presumably the interface to a window-based terminal emulator.
 | |
| Inheritance is transitive, so `tty\_emulator' also implicitly inherits
 | |
| `standard\_ops'.
 | |
| In fact, it inherits it twice: once via `tty' and once via `window'.
 | |
| Since AIL class inheritance only means interface sharing, not
 | |
| implementation sharing, inheriting the same class multiple times is
 | |
| never a problem and has the same effect as inheriting it once.
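Because only interfaces are shared, the set of operations of a class can be computed by a simple union over its base classes.  The sketch below models the four example classes as a Python dictionary; the table layout and the function name {\tt interface} are assumptions made for illustration.

```python
# Hypothetical in-memory model of the AIL classes defined above.
CLASSES = {
    "standard_ops": {"inherits": [], "ops": ["std_info", "std_destroy"]},
    "tty": {"inherits": ["standard_ops"], "ops": ["tty_write", "tty_read"]},
    "window": {"inherits": ["standard_ops"],
               "ops": ["win_create", "win_reconfigure"]},
    "tty_emulator": {"inherits": ["tty", "window"], "ops": []},
}


def interface(name, classes=CLASSES):
    """Return every operation visible in class `name`, without duplicates."""
    seen = []
    for base in classes[name]["inherits"]:
        for op in interface(base, classes):
            if op not in seen:       # inheriting a class twice adds nothing
                seen.append(op)
    for op in classes[name]["ops"]:
        if op not in seen:
            seen.append(op)
    return seen


print(interface("tty_emulator"))
```

Although `standard\_ops' is reached twice, `std\_info' and `std\_destroy' each appear only once in the result.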
 | |
| 
 | |
| Note that the power of AIL classes doesn't go as far as \Cpp{}.
 | |
| AIL classes cannot have data members, and there is
 | |
| no mechanism for a server that implements a derived class
 | |
| to inherit the implementation of the base
 | |
| class --- other than copying the source code.
 | |
| The syntax for class definitions and inheritance is also different.
 | |
| 
 | |
| \subsection{Amoeba}
 | |
| 
 | |
| The smell of `object-orientedness' that the use of classes in AIL
 | |
| creates matches nicely with Amoeba's object-oriented approach to
 | |
| RPC\@.  In Amoeba, almost all operating system entities (files,
 | |
| directories, processes, devices etc.) are implemented as {\em
 | |
| objects}.  Objects are managed by {\em services} and represented by
 | |
| {\em capabilities}.  A capability gives its holder access to the
 | |
| object it represents.  Capabilities are protected cryptographically
 | |
| against forgery and can thus be kept in user space.  A capability is a
 | |
| 128-bit binary string, subdivided as follows:
 | |
| 
 | |
| % XXX Need a better version of this picture!
 | |
| \begin{verbatim}
 | |
|         48             24          8           48       Bits
 | |
| +----------------+------------+--------+---------------+
 | |
| |    Service     |   Object   |  Perm. |     Check     |
 | |
| |      port      |   number   |  bits  |     word      |
 | |
| +----------------+------------+--------+---------------+
 | |
| \end{verbatim}
 | |
| 
 | |
| The service port is used by the RPC implementation in the Amoeba
 | |
| kernel to locate a server implementing the service that manages the
 | |
| object.  In many cases there is a one-to-one correspondence between
 | |
| servers and services (each service is implemented by exactly one
 | |
| server process), but some services are replicated.  For instance,
 | |
| Amoeba's directory service, which is crucial for gaining access to most
 | |
| other services, is implemented by two servers that listen on the same
 | |
| port and know about exactly the same objects.
 | |
| 
 | |
| The object number in the capability is used by the server receiving
 | |
| the request for identifying the object to which the operation applies.
 | |
| The permission bits specify which operations the holder of the capability
 | |
| may apply.  The last part of a capability is a 48-bit long `check
 | |
| word', which is used to prevent forgery.  The check word is computed
 | |
| by the server based upon the permission bits and a random key per object
 | |
| that it keeps secret.  If you change the permission bits you must compute
 | |
| the proper check word or else the server will refuse the capability.
 | |
| Due to the size of the check word and the nature of the cryptographic
 | |
| `one-way function' used to compute it, inverting this function is
 | |
| impractical, so forging capabilities is effectively impossible.%
 | |
| \footnote{
 | |
| As computers become faster, inverting the one-way function becomes
 | |
| less impractical.
 | |
| Therefore, a next version of Amoeba will have 64-bit check words.
 | |
| }
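The layout and the check-word test can be sketched in Python as follows.  Only the field widths come from the description above.  The function names are hypothetical, placing the service port in the most significant bits is an assumption, and a truncated SHA-256 digest is used purely as a stand-in for Amoeba's one-way function, which is not specified here.

```python
import hashlib

PORT_BITS, OBJECT_BITS, PERM_BITS, CHECK_BITS = 48, 24, 8, 48


def pack_capability(port, obj, perms, check):
    """Pack the four fields into one 128-bit integer, service port first."""
    for value, bits in ((port, PORT_BITS), (obj, OBJECT_BITS),
                        (perms, PERM_BITS), (check, CHECK_BITS)):
        if not 0 <= value < (1 << bits):
            raise ValueError("field does not fit in %d bits" % bits)
    cap = port
    cap = (cap << OBJECT_BITS) | obj
    cap = (cap << PERM_BITS) | perms
    cap = (cap << CHECK_BITS) | check
    return cap


def unpack_capability(cap):
    """Split a 128-bit integer back into (port, object, perms, check)."""
    check = cap & ((1 << CHECK_BITS) - 1)
    cap >>= CHECK_BITS
    perms = cap & ((1 << PERM_BITS) - 1)
    cap >>= PERM_BITS
    obj = cap & ((1 << OBJECT_BITS) - 1)
    port = cap >> OBJECT_BITS
    return port, obj, perms, check


def check_word(perms, secret_key):
    """Stand-in one-way function (NOT Amoeba's): SHA-256 cut to 48 bits."""
    digest = hashlib.sha256(bytes([perms]) + secret_key).digest()
    return int.from_bytes(digest[:6], "big")


def server_accepts(cap, secret_key):
    """Recompute the check word from the permission bits and compare."""
    _, _, perms, check = unpack_capability(cap)
    return check == check_word(perms, secret_key)


key = b"per-object secret"
cap = pack_capability(0x123456789ABC, 42, 0x0F, check_word(0x0F, key))
port, obj, perms, check = unpack_capability(cap)
# Changing the permission bits without recomputing the check word:
forged = pack_capability(port, obj, 0xFF, check)
print(server_accepts(cap, key), server_accepts(forged, key))
```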
 | |
| 
 | |
| A working Amoeba system is a collection of diverse servers, managing
 | |
| files, directories, processes, devices etc.  While most servers have
 | |
| their own interface, there are some requests that make sense for some
 | |
| or all object types.  For instance, the {\em std\_info()} request,
 | |
| which returns a short descriptive string, applies to all object types.
 | |
| Likewise, {\em std\_destroy()} applies to files, directories and
 | |
| processes, but not to devices.
 | |
| 
 | |
| Similarly, different file server implementations may want to offer the
 | |
| same interface for operations like {\em read()} and {\em write()} to
 | |
| their clients.  AIL's grouping of requests into classes is ideally
 | |
| suited to describe this kind of interface sharing, and a class
 | |
| hierarchy results which clearly shows the similarities between server
 | |
| interfaces (not necessarily their implementations!).
 | |
| 
 | |
| The base class of all classes defines the {\em std\_info()} request.
 | |
| Most server interfaces actually inherit a derived class that also
 | |
| defines {\em std\_destroy()}.  File servers inherit a class that
 | |
| defines the common operations on files, etc.
 | |
| 
 | |
| \subsection{How AIL Works}
 | |
| 
 | |
| The AIL stub generator functions in three phases:
 | |
| \begin{itemize}
 | |
| \item
 | |
| parsing,
 | |
| \item
 | |
| strategy determination,
 | |
| \item
 | |
| code generation.
 | |
| \end{itemize}
 | |
| 
 | |
| {\bf Phase one} parses the input and builds a symbol table containing
 | |
| everything it knows about the classes and other definitions found in
 | |
| the input.
 | |
| 
 | |
| {\bf Phase two} determines the strategy to use for each function
 | |
| declaration in turn and decides upon the request and reply message
 | |
| formats.  This is not a simple matter, because of various optimization
 | |
| attempts.  Amoeba's kernel interface for RPC requests takes a
 | |
| fixed-size header and one arbitrary-size buffer.  A large part of the
 | |
| header holds the capability of the object to which the request is
 | |
| directed, but there is some space left for a few integer parameters
 | |
| whose interpretation is left up to the server.  AIL tries to use these
 | |
| slots for simple integer parameters, for two reasons.
 | |
| 
 | |
| First, unlike the buffer, header fields are byte-swapped by the RPC
 | |
| layer in the kernel if necessary, so this saves a few byte-swapping
 | |
| instructions in the user code.  Second, and more important, a common
 | |
| form of request transfers a few integers and one large buffer to or
 | |
| from a server.  The {\em read()} and {\em write()} requests of most
 | |
| file servers have this form, for instance.  If it is possible to place
 | |
| all integer parameters in the header, the address of the buffer
 | |
| parameter can be passed directly to the kernel RPC layer.  While AIL
 | |
| is perfectly capable of handling requests that do not fit this format,
 | |
| the resulting code involves allocating a new buffer and copying all
 | |
| parameters into it.  It is a top priority to avoid this copying
 | |
| (`marshalling') if at all possible, in order to maintain Amoeba's
 | |
| famous RPC performance.
 | |
| 
 | |
| When AIL resorts to copying parameters into a buffer, it reorders them
 | |
| so that integers indicating the lengths of variable-size arrays are
 | |
| placed in the buffer before the arrays they describe, since otherwise
 | |
| decoding the request would be impossible.  It also adds occasional
 | |
| padding bytes to ensure integers are aligned properly in the buffer ---
 | |
| this can speed up (un)marshalling.
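The ordering and padding rules can be illustrated with Python's {\tt struct} module.  This is a sketch under stated assumptions: 32-bit little-endian lengths and 4-byte alignment are chosen for the example, and the helper names are hypothetical; AIL's real buffer layout may differ.

```python
import struct


def align(buf, boundary=4):
    """Append zero bytes so the next item starts on a multiple of boundary."""
    buf.extend(b"\0" * (-len(buf) % boundary))


def marshal_array(buf, data):
    """Write the length first, then the bytes of the array it describes."""
    align(buf)                                # keep the integer aligned
    buf.extend(struct.pack("<i", len(data)))  # length precedes the array
    buf.extend(data)


def unmarshal_array(buf, pos):
    """Read one length-prefixed array; return it and the next position."""
    pos += -pos % 4                           # skip any padding bytes
    (size,) = struct.unpack_from("<i", buf, pos)
    pos += 4
    return bytes(buf[pos:pos + size]), pos + size


buf = bytearray()
marshal_array(buf, b"hello")
marshal_array(buf, b"hi")
first, pos = unmarshal_array(buf, 0)
second, pos = unmarshal_array(buf, pos)
print(first, second, len(buf))
```

Decoding works only because each length is read before the array it describes; with the opposite order the receiver could not tell where the first array ends.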
 | |
| 
 | |
| {\bf Phase three} is the code generator, or back-end.  There are in
 | |
| fact many different back-ends that may be called in a single run to
 | |
| generate different types of output.  The most important output types
 | |
| are header files (for inclusion by the clients of an interface),
 | |
| client stubs, and `server main loop' code.  The latter decodes
 | |
| incoming requests in the server.  The generated code depends on the
 | |
| programming language requested, and there are separate back-ends for
 | |
| each supported language.
 | |
| 
 | |
| It is important that the strategy chosen by phase two is independent
 | |
| of the language requested for phase three --- otherwise the
 | |
| interoperability of servers and clients written in different languages
 | |
| would be compromised.
 | |
| 
 | |
| \section{Linking AIL to Python}
 | |
| 
 | |
| From the previous section it can be concluded that linking AIL to
 | |
| Python is a matter of writing a back-end for Python.  This is indeed
 | |
| what we did.
 | |
| 
 | |
| Considerable time went into the design of the back-end in order to
 | |
| make the resulting RPC interface for Python fit as smoothly as
 | |
| possible in Python's programming style.  For instance, the issues of
 | |
| parameter transfer, variable-size arrays, error handling, and call
 | |
| syntax were all solved in a manner that favors ease of use in Python
 | |
| rather than strict correspondence with the stubs generated for C,
 | |
| without compromising network-level compatibility.
 | |
| 
 | |
| \subsection{Mapping AIL Entities to Python}
 | |
| 
 | |
| For each programming language that AIL is to support, a mapping must
 | |
| be designed between the data types in AIL and those in that language.
 | |
| Other aspects of the programming languages, such as differences in
 | |
| function call semantics, must also be taken care of.
 | |
| 
 | |
| While the mapping for C is mostly straightforward, the mapping for
 | |
| Python requires a little thinking to get the best results for Python
 | |
| programmers.
 | |
| 
 | |
| \subsubsection{Parameter Transfer Direction}
 | |
| 
 | |
| Perhaps the simplest issue is that of parameter transfer direction.
 | |
| Parameters of functions declared in AIL are categorized as being of
 | |
| type {\tt in}, {\tt out} or {\tt in} {\tt out} (the same distinction
 | |
| as made in Ada).  Python only has call-by-value parameter semantics;
 | |
| functions can return multiple values as a tuple.  This means that,
 | |
| unlike the C back-end, the Python back-end cannot always generate
 | |
| Python functions with exactly the same parameter list as the AIL
 | |
| functions.
 | |
| 
 | |
| Instead, the Python parameter list consists of all {\tt in} and {\tt
 | |
| in} {\tt out} parameters, in the order in which they occur in the AIL
 | |
| parameter list; similarly, the Python function returns a tuple
 | |
| containing all {\tt in} {\tt out} and {\tt out} parameters.  In fact
 | |
| Python packs function parameters into a tuple as well, stressing the
 | |
| symmetry between parameters and return value.  For example, a stub
 | |
| with this AIL parameter list:
 | |
| \begin{verbatim}
 | |
| (*, in int p1, in out int p2, in int p3, out int p4)
 | |
| \end{verbatim}
 | |
| will have the following parameter list and return values in Python:
 | |
| \begin{verbatim}
 | |
| (p1, p2, p3)  ->  (p2, p4)
 | |
| \end{verbatim}
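The rule is mechanical enough to state as a few lines of Python.  In this sketch each AIL parameter is modelled as a (direction, name) pair; the function name {\tt python\_signature} is hypothetical.

```python
def python_signature(ail_params):
    """Split (direction, name) pairs into Python arguments and results."""
    args = [name for direction, name in ail_params
            if direction in ("in", "in out")]
    results = [name for direction, name in ail_params
               if direction in ("in out", "out")]
    return args, results


params = [("in", "p1"), ("in out", "p2"), ("in", "p3"), ("out", "p4")]
args, results = python_signature(params)
print(args, "->", results)
```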
 | |
| 
 | |
| \subsubsection{Variable-size Entities}
 | |
| 
 | |
| The support for variable-size objects in AIL is strongly guided by the
 | |
| limitations of C in this matter.  Basically, AIL allows what is
 | |
| feasible in C: functions may have variable-size arrays as parameters
 | |
| (either input or output), provided their length is passed separately.
 | |
| In practice this is narrowed to the following rule: for each
 | |
| variable-size array parameter, there must be an integer parameter
 | |
| giving its length.  (An exception for null-terminated strings is
 | |
| planned but not yet realized.)
 | |
| 
 | |
| Variable-size arrays in AIL or C correspond to {\em sequences} in
 | |
| Python: lists, tuples or strings.  These are much easier to use than
 | |
| their C counterparts.  Given a sequence object in Python, it is always
 | |
| possible to determine its size: the built-in function {\tt len()}
 | |
| returns it.  It would be annoying to require the caller of an RPC stub
 | |
| with a variable-size parameter to also pass a parameter that
 | |
| explicitly gives its size.  Therefore we eliminate all parameters from
 | |
| the Python parameter list whose value is used as the size of a
 | |
| variable-size array.  Such parameters are easily found: the array
 | |
| bound expression contains the name of the parameter giving its size.
 | |
| This requires the stub code to work harder (it has to recover the
 | |
| value for size parameters from the corresponding sequence parameter),
 | |
| but at least part of this work would otherwise be needed as well, to
 | |
| check that the given and actual sizes match.
 | |
| 
 | |
| Because of the symmetry in Python between the parameter list and the
 | |
| return value of a function, the same elimination is performed on
 | |
| return values containing variable-size arrays: integers returned
 | |
| solely to tell the client the size of a returned array are not
 | |
| returned explicitly to the caller in Python.
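The elimination of size parameters can be sketched in the same style.  Each parameter is modelled here as a (direction, name, size name) triple, where the size name is the identifier found in the array bound, or {\tt None} for non-array parameters.  The function name {\tt drop\_size\_params} is hypothetical, and unmarked AIL parameters are assumed to be inputs.

```python
def drop_size_params(params):
    """Remove parameters that only give the size of an array parameter."""
    size_names = {size for _, _, size in params if size is not None}
    kept = [(d, n) for d, n, _ in params if n not in size_names]
    args = [n for d, n in kept if d in ("in", "in out")]
    results = [n for d, n in kept if d in ("in out", "out")]
    return args, results


# tty_write(*, char buf[size:TTY_MAXBUF], int size)
write_args, write_results = drop_size_params(
    [("in", "buf", "size"), ("in", "size", None)])

# tty_read(*, out char buf[size:TTY_MAXBUF], out int size)
read_args, read_results = drop_size_params(
    [("out", "buf", "size"), ("out", "size", None)])

print(write_args, write_results, read_args, read_results)
```

Under these assumptions the `tty\_write' stub takes only the buffer, and the `tty\_read' stub returns only the buffer.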
 | |
| 
 | |
| \subsubsection{Error Handling}
 | |
| 
 | |
| Another point where Python is really better than C is the issue of
 | |
| error handling.  It is a fact of life that everything involving RPC
 | |
| may fail, for a variety of reasons outside the user's control: the
 | |
| network may be disconnected, the server may be down, etc.  Clients
 | |
| must be prepared to handle such failures and recover from them, or at
 | |
| least print an error message and die.  In C this means that every
 | |
| function returns an error status that must be checked by the caller,
 | |
| causing programs to be cluttered with error checks --- or worse,
 | |
| programs that ignore errors and carry on working with garbage data.
 | |
| 
 | |
| In Python, errors are generally indicated by exceptions, which can be
 | |
| handled out of line from the main flow of control if necessary, and
 | |
| cause immediate program termination (with a stack trace) if ignored.
 | |
| To profit from this feature, all RPC errors that may be encountered by
 | |
| AIL-generated stubs in Python are turned into exceptions.  An extra
 | |
| value passed together with the exception is used to relay the error
 | |
| code returned by the server to the handler.  Since in general RPC
 | |
| failures are rare, Python test programs can usually ignore exceptions
 | |
| --- making the program simpler --- without the risk of occasional
 | |
| errors going undetected.  (I still remember the embarrassment of a
 | |
| hundredfold speed improvement reported, long, long ago, about a new
 | |
| version of a certain program, which later had to be attributed to a
 | |
| benchmark that silently dumped core...)
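A minimal sketch of this convention in present-day Python is shown below.  The exception class {\tt RPCError} and the function {\tt call\_stub} are invented stand-ins; the names used by the real generated stubs are not given here.

```python
class RPCError(Exception):
    """Hypothetical exception type that carries the server's error code."""

    def __init__(self, code):
        super().__init__("RPC failed with error code %d" % code)
        self.code = code


def call_stub(status, result):
    """Stand-in for a generated stub: raise on a non-zero status."""
    if status != 0:
        raise RPCError(status)
    return result


try:
    call_stub(-3, None)
except RPCError as exc:
    caught = exc.code      # the handler can inspect the relayed error code
print(caught)
```

A test program that omits the {\tt try} statement simply stops with a traceback at the failing call.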
 | |
| 
 | |
| \subsubsection{Function Call Syntax}
 | |
| 
 | |
| Amoeba RPC operations always need a capability parameter (this is what
 | |
| the `*' in the AIL function templates stands for); the service is
 | |
| identified by the port field of the capability.  In C, the capability
 | |
| must always be the first parameter of the stub function, but in Python
 | |
| we can do better.
 | |
| 
 | |
| A Python capability is an opaque object type in its own right, which
 | |
| is used, for instance, as parameter to and return value from Amoeba's
 | |
| name server functions.  Python objects can have methods, so it is
 | |
| convenient to make all AIL-generated stubs methods of capabilities
 | |
| instead of just functions.  Therefore, instead of writing
 | |
| \begin{verbatim}
 | |
| some_stub(cap, other_parameters)
 | |
| \end{verbatim}
 | |
| as in C, Python programmers can write
 | |
| \begin{verbatim}
 | |
| cap.some_stub(other_parameters)
 | |
| \end{verbatim}
 | |
| This is better because it reduces name conflicts: in Python, no
 | |
| confusion is possible between a stub and a local or global variable or
 | |
| user-defined function with the same name.
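One way to picture stubs as methods is the following sketch, in which a made-up {\tt Capability} class looks up stubs by attribute name.  The real capability type is implemented in C; the class, its stub table and the {\tt std\_info} function below are assumptions for illustration only.

```python
class Capability:
    """Hypothetical stand-in for the opaque capability type."""

    def __init__(self, port, stubs):
        self.port = port
        self._stubs = stubs          # stub name -> function(cap, *args)

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        try:
            stub = self._stubs[name]
        except KeyError:
            raise AttributeError(name) from None
        return lambda *args: stub(self, *args)


def std_info(cap):
    """Pretend stub: a real one would perform an RPC."""
    return "capability on port %s" % cap.port


cap = Capability("tty0", {"std_info": std_info})
std_info_text = cap.std_info()       # the capability is passed implicitly
print(std_info_text)
```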
 | |
| 
 | |
| \subsubsection{Example}
 | |
| 
 | |
| All the preceding principles can be seen at work in the following
 | |
| example.  Suppose a function is declared in AIL as follows:
 | |
| \begin{verbatim}
 | |
| some_stub(*, in char buf[size:1000], in int size,
 | |
|              out int n_done, out int status);
 | |
| \end{verbatim}
 | |
| In C it might be called by the following code (including declarations,
 | |
| for clarity, but not initializations):
 | |
| \begin{verbatim}
 | |
| int err, n_done, status;
 | |
| capability cap;
 | |
| char buf[500];
 | |
| ...
 | |
| err = some_stub(&cap, buf, sizeof buf, &n_done, &status);
 | |
| if (err != 0) return err;
 | |
| printf("%d done; status = %d\n", n_done, status);
 | |
| \end{verbatim}
 | |
| Equivalent code in Python might be the following:
 | |
| \begin{verbatim}
 | |
| cap = ...
 | |
| buf = ...
 | |
| n_done, status = cap.some_stub(buf)
 | |
| print n_done, 'done;', 'status =', status
 | |
| \end{verbatim}
 | |
| No explicit error check is required in Python: if the RPC fails, an
 | |
| exception is raised so the {\tt print} statement is never reached.
 | |
| 
 | |
| \subsection{The Implementation}
 | |
| 
 | |
| More or less orthogonal to the issue of how to map AIL operations to
 | |
| the Python language is the question of how they should be implemented.
 | |
| 
 | |
| In principle it would be possible to use the same strategy that is
 | |
| used for C: add an interface to Amoeba's low-level RPC primitives to
 | |
| Python and generate Python code to marshal parameters into and out of
 | |
| a buffer.  However, Python's high-level data types are not well suited
 | |
| for marshalling: byte-level operations are clumsy and expensive, with
 | |
| the result that marshalling a single byte of data can take several
 | |
| Python statements.  This would mean that a large amount of code would
 | |
| be needed to implement a stub, which would cost a lot of time to parse
 | |
| and take up a lot of space in `compiled' form (as parse tree or pseudo
 | |
| code).  Execution of the marshalling code would be sluggish as well.
 | |
| 
 | |
| We therefore chose an alternate approach, writing the marshalling in
 | |
| C, which is efficient at such byte-level operations.  While it is easy
 | |
| enough to generate C code that can be linked with the Python
 | |
| interpreter, it would obviously not stimulate the use of Python for
 | |
| server testing if each change to an interface required relinking the
 | |
| interpreter (dynamic loading of C code is not yet available on
 | |
| Amoeba).  This is circumvented by the following solution: the
 | |
| marshalling is handled by a simple {\em virtual machine}, and AIL
 | |
| generates instructions for this machine.  An interpreter for the
 | |
| machine is linked into the Python interpreter and reads its
 | |
| instructions from a file written by AIL.
 | |
| 
 | |
| The machine language for our virtual machine is dubbed {\em Stubcode}.
 | |
| Stubcode is a super-specialized language.  There are two sets of
 | |
| about a dozen instructions each: one set marshals Python objects
 | |
| representing parameters into a buffer, the other set (similar but not
 | |
| quite symmetric) unmarshals results from a buffer into Python objects.
 | |
| The Stubcode interpreter uses a stack to hold Python intermediate
 | |
| results.  Other state elements are an Amoeba header and buffer, a
 | |
| pointer indicating the current position in the buffer, and of course a
 | |
| program counter.  Besides (un)marshalling, the virtual machine must
 | |
| also implement type checking, and raise a Python exception when a
 | |
| parameter does not have the expected type.
 | |
| 
 | |
| The Stubcode interpreter marshals Python data types very efficiently,
 | |
| since each instruction can marshal a large amount of data.  For
 | |
| instance, a whole Python string is marshalled by a single Stubcode
 | |
| instruction, which (after some checking) executes the most efficient
 | |
| byte-copying loop possible --- it calls {\tt memcpy()}.
 | |
| 
 | |
| 
 | |
| Construction details of the Stubcode interpreter are straightforward.
 | |
| Most complications are caused by the peculiarities of AIL's strategy
 | |
| module and Python's type system.  By far the most complex single
 | |
| instruction is the `loop' instruction, which is used to marshal
 | |
| arrays.
 | |
| 
 | |
| As an example, here is the complete Stubcode program (with spaces and
 | |
| comments added for clarity) generated for the function {\tt
 | |
| some\_stub()} of the example above.  The stack contains pointers to
 | |
| Python objects, and its initial content is the parameter to the
 | |
| function, the string {\tt buf}.  The final stack contents will be the
 | |
| function return value, the tuple {\tt (n\_done, status)}.  The name
 | |
| {\tt header} refers to the fixed size Amoeba RPC header structure.
 | |
| \vspace{1em}
 | |
| 
 | |
| {\tt
 | |
| \begin{tabular}{l l l}
 | |
| BufSize     & 1000            & {\em Allocate RPC buffer of 1000 bytes}    \\
 | |
| Dup         & 1               & {\em Duplicate stack top}                  \\
 | |
| StringS     &                 & {\em Replace stack top by its string size} \\
 | |
| PutI        & h\_extra int32  & {\em Store top element in }header.h\_extra \\
 | |
| TStringSlt  & 1000            & {\em Assert string size less than 1000}    \\
 | |
| PutVS       &                 & {\em Marshal variable-size string}         \\
 | |
|             &                 &                                            \\
 | |
| Trans       & 1234            & {\em Execute the RPC (request code 1234)}  \\
 | |
|             &                 &                                            \\
 | |
| GetI        & h\_extra int32  & {\em Push integer from} header.h\_extra    \\
 | |
| GetI        & h\_size int32   & {\em Push integer from} header.h\_size     \\
 | |
| Pack        & 2               & {\em Pack top 2 elements into a tuple}     \\
 | |
| \end{tabular}
 | |
| }
 | |
| \vspace{1em}
 | |
| 
 | |
| As much work as possible is done by the Python back-end in AIL, rather
 | |
| than in the Stubcode interpreter, to make the latter both simple and
 | |
| fast.  For instance, the decision to eliminate an array size parameter
 | |
| from the Python parameter list is taken by AIL, and Stubcode
 | |
| instructions are generated to recover the size from the actual
 | |
| parameter and to marshal it properly.  Similarly, there is a special
 | |
| alignment instruction (not used in the example) to meet alignment
 | |
| requirements.
 | |
| 
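As an illustration of these two points, the sketch below marshals a
Python list whose size is recovered from the list itself rather than
passed as a separate parameter, and pads the buffer to an alignment
boundary before writing the elements.  The wire format (a 16-bit count
followed by big-endian 32-bit integers aligned on four bytes) and the
names {\tt align} and {\tt marshal\_int\_array} are assumptions made for
this example only; they are not taken from AIL or Amoeba.

```python
import struct


def align(buf, boundary):
    """Pad `buf` in place with zero bytes up to a multiple of `boundary`."""
    padding = -len(buf) % boundary
    buf.extend(b"\0" * padding)


def marshal_int_array(buf, values):
    """Append a count and the int32 elements of `values` to `buf`.

    The caller passes only the list; the size that a C stub would take
    as a separate parameter is recovered with len().
    """
    buf.extend(struct.pack(">h", len(values)))  # hypothetical 16-bit count
    align(buf, 4)                               # elements start on 4 bytes
    for value in values:                        # the job of a `loop' step
        buf.extend(struct.pack(">i", value))


buf = bytearray()
marshal_int_array(buf, [1, 2])
print(len(buf))  # prints 12: 2 bytes count, 2 bytes padding, 8 bytes data
```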
 | |
| Communication between AIL and the Stubcode interpreter is via the file
 | |
| system.  For each stub function, AIL creates a file in its output
 | |
| directory, named after the stub with a specific suffix.  This file
 | |
| contains a machine-readable version of the Stubcode program for the
 | |
| stub.  The Python user can specify a search path containing
 | |
| directories which the interpreter searches for a Stubcode file the
 | |
| first time the definition for a particular stub is needed.
 | |
| 
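The lookup just described can be sketched in Python as follows.  This
is a minimal illustration, assuming a hypothetical file suffix
{\tt .sc} and hypothetical names ({\tt load\_stubcode}, the module-level
cache); the actual suffix and the interpreter's internal data
structures are not specified here.

```python
import os

STUBCODE_SUFFIX = ".sc"   # hypothetical; the real suffix is not given
_stub_cache = {}          # stub name -> contents of its Stubcode file


def load_stubcode(name, search_path):
    """Return the Stubcode for `name`, searching directories on first use."""
    if name in _stub_cache:
        return _stub_cache[name]      # later calls skip the file system
    for directory in search_path:
        candidate = os.path.join(directory, name + STUBCODE_SUFFIX)
        if os.path.isfile(candidate):
            with open(candidate, "rb") as f:
                _stub_cache[name] = f.read()
            return _stub_cache[name]
    raise LookupError("no Stubcode file found for stub " + repr(name))
```

Directories are tried in order, so an earlier entry on the path shadows
a later one, and a stub that is never called costs no file access.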
 | |
| The transformations on the parameter list and data types needed to map
 | |
| AIL data types to Python data types make it necessary to help the
 | |
| Python programmer a bit in figuring out the parameters to a call.
 | |
| Although in most cases the rules are simple enough, it is sometimes
 | |
| hard to figure out exactly what the parameters and return values of a
 | |
| particular stub are.  There are two sources of help in this case:
 | |
| first, the exception contains enough information so that the user can
 | |
| figure out what type was expected; second, AIL's Python back-end
 | |
| optionally generates a human-readable `interface specification' file.
 | |
| 
 | |
| \section{Conclusion}
 | |
| 
 | |
| We have succeeded in creating a useful extension to Python that
 | |
| enables Amoeba server writers to test and experiment with their servers
 | |
| in a much more interactive manner.  We hope that this facility will
 | |
| add to the popularity of AIL amongst Amoeba programmers.
 | |
| 
 | |
| Python's extensibility was proven convincingly by the exercise
 | |
| (performed by the second author) of adding the Stubcode interpreter to
 | |
| Python.  Standard data abstraction techniques are used to insulate
 | |
| extension modules from details of the rest of the Python interpreter.
 | |
| In the case of the Stubcode interpreter this worked well enough that
 | |
| it survived a major overhaul of the main Python interpreter virtually
 | |
| unchanged.
 | |
| 
 | |
| On the other hand, adding a new back-end to AIL turned out to be quite
 | |
| a bit of work.  One problem, specific to Python, was to be expected:
 | |
| Python's variable-size data types differ considerably from the
 | |
| C-derived data model that AIL favors.  Two additional problems we
 | |
| encountered were the complexity of the interface between AIL's second
 | |
| and third phases, and a number of remaining bugs in the second phase
 | |
| that surfaced when the implementation of the Python back-end was
 | |
| tested.  The bugs have been tracked down and fixed, but nothing
 | |
| has been done about the complexity of the interface.
 | |
| 
 | |
| \subsection{Future Plans}
 | |
| 
 | |
| AIL's C back-end generates server main loop code as well as client
 | |
| stubs.  The Python back-end currently only generates client stubs, so
 | |
| it is not yet possible to write servers in Python.  While it is
 | |
| clearly more important to be able to use Python as a client than as a
 | |
| server, the ability to write server prototypes in Python would be a
 | |
| valuable addition: it would allow server designers to experiment with
 | |
| interfaces at a much earlier stage of the design, with a much smaller
 | |
| programming effort.  This makes it possible to concentrate on concepts
 | |
| first, before worrying about efficient implementation.
 | |
| 
 | |
| The unmarshalling done in the server is almost symmetric with the
 | |
| marshalling in the client, and vice versa, so relatively small
 | |
| extensions to the Stubcode virtual machine will allow its use in a
 | |
| server main loop.  We hope to find the time to add this feature to a
 | |
| future version of Python.
 | |
| 
 | |
| \section{Availability}
 | |
| 
 | |
| The Python source distribution is available to Internet users by
 | |
| anonymous ftp to site {\tt ftp.cwi.nl} [IP address 192.16.184.180]
 | |
| from directory {\tt /pub}, file name {\tt python*.tar.Z} (where the
 | |
| {\tt *} stands for a version number).  This is a compressed UNIX tar
 | |
| file containing the C source and \LaTeX{} documentation for the Python
 | |
| interpreter.  It includes the Python library modules and the {\em
 | |
| Stubcode} interpreter, as well as many example Python programs.  Total
 | |
| disk space occupied by the distribution is about 3~MB; compilation
 | |
| requires 1--3~MB depending on the configuration built, the compile
 | |
| options, etc.
 | |
| 
 | |
| \bibliographystyle{plain}
 | |
| 
 | |
| \bibliography{quabib}
 | |
| 
 | |
| \end{document}
 | 
