mirror of
				https://github.com/python/cpython.git
				synced 2025-10-26 19:24:34 +00:00 
			
		
		
		
	
		
			
				
	
	
		
			302 lines
		
	
	
	
		
			11 KiB
		
	
	
	
		
			TeX
		
	
	
	
	
	
			
		
		
	
	
			302 lines
		
	
	
	
		
			11 KiB
		
	
	
	
		
			TeX
		
	
	
	
	
	
| % Complete documentation on the extended LaTeX markup used for Python
 | |
| % documentation is available in ``Documenting Python'', which is part
 | |
| % of the standard documentation for Python.  It may be found online
 | |
| % at:
 | |
| %
 | |
| %     http://www.python.org/doc/current/doc/doc.html
 | |
| 
 | |
| \documentclass{howto}
 | |
| 
 | |
| \title{Python compiler package}
 | |
| 
 | |
| \author{Jeremy Hylton}
 | |
| 
 | |
| % Please at least include a long-lived email address;
 | |
| % the rest is at your discretion.
 | |
| \authoraddress{
 | |
|         PythonLabs \\
 | |
|         Zope Corporation \\
 | |
|         Email: \email{jeremy@zope.com}
 | |
| }
 | |
| 
 | |
| \date{August 15, 2001}           % update before release!
 | |
|                                 % Use an explicit date so that reformatting
 | |
|                                 % doesn't cause a new date to be used.  Setting
 | |
|                                 % the date to \today can be used during draft
 | |
|                                 % stages to make it easier to handle versions.
 | |
| 
 | |
| \release{2.2}                   % release version; this is used to define the
 | |
|                                 % \version macro
 | |
| 
 | |
| \makeindex                      % tell \index to actually write the .idx file
 | |
| \makemodindex                   % If this contains a lot of module sections.
 | |
| 
 | |
| 
 | |
| \begin{document}
 | |
| 
 | |
| \maketitle
 | |
| 
 | |
| \begin{abstract}
 | |
| 
 | |
| \noindent
 | |
| The Python compiler package is a tool for analyzing Python source code
 | |
| and generating Python bytecode.  The compiler contains libraries to
 | |
| generate an abstract syntax tree from Python source code and to
 | |
| generate Python bytecode from the tree.
 | |
| 
 | |
| \end{abstract}
 | |
| 
 | |
| \tableofcontents
 | |
| 
 | |
| 
 | |
| \section{Introduction\label{Introduction}}
 | |
| 
 | |
| XXX Need basic intro
 | |
| 
 | |
| XXX what are the major advantages...  the abstract syntax is much
 | |
| closer to the python source...
 | |
| 
 | |
| 
 | |
| \section{The basic interface}
 | |
| 
 | |
| \declaremodule{}{compiler}
 | |
| 
 | |
| The top-level of the package defines four functions.
 | |
| 
 | |
| \begin{funcdesc}{parse}{buf}
 | |
| Returns an abstract syntax tree for the Python source code in \var{buf}.
 | |
| The function raises SyntaxError if there is an error in the source
 | |
| code.  The return value is a \class{compiler.ast.Module} instance that
 | |
| contains the tree.  
 | |
| \end{funcdesc}
 | |
| 
 | |
| \begin{funcdesc}{parseFile}{path}
 | |
| Return an abstract syntax tree for the Python source code in the file
 | |
| specified by \var{path}.  It is equivalent to
 | |
| \code{parse(open(\var{path}).read())}.
 | |
| \end{funcdesc}
 | |
| 
 | |
| \begin{funcdesc}{walk}{ast, visitor\optional{, verbose}}
 | |
| Do a pre-order walk over the abstract syntax tree \var{ast}.  Call the
 | |
| appropriate method on the \var{visitor} instance for each node
 | |
| encountered.
 | |
| \end{funcdesc}
 | |
| 
 | |
| \begin{funcdesc}{compile}{path}
 | |
| Compile the file \var{path} and generate the corresponding \file{.pyc}
 | |
| file.
 | |
| \end{funcdesc}
 | |
| 
 | |
| The \module{compiler} package contains the following modules:
 | |
| \refmodule[compiler.ast]{ast}, \module{consts}, \module{future},
 | |
| \module{misc}, \module{pyassem}, \module{pycodegen}, \module{symbols},
 | |
| \module{transformer}, and \refmodule[compiler.visitor]{visitor}.
 | |
| 
 | |
| 
 | |
| \section{Limitations}
 | |
| 
 | |
| There are some problems with the error checking of the compiler
 | |
| package.  The interpreter detects syntax errors in two distinct
 | |
| phases.  One set of errors is detected by the interpreter's parser,
 | |
| the other set by the compiler.  The compiler package relies on the
 | |
| interpreter's parser, so it get the first phases of error checking for
 | |
| free.  It implements the second phase itself, and that implement is
 | |
| incomplete.  For example, the compiler package does not raise an error
 | |
| if a name appears more than once in an argument list: 
 | |
| \code{def f(x, x): ...}
 | |
| 
 | |
| 
 | |
| \section{Python Abstract Syntax}
 | |
| 
 | |
| The \module{compiler.ast} module defines an abstract syntax for
 | |
| Python.  In the abstract syntax tree, each node represents a syntactic
 | |
| construct.  The root of the tree is \class{Module} object.
 | |
| 
 | |
| The abstract syntax offers a higher level interface to parsed Python
 | |
| source code.  The \ulink{\module{parser}}
 | |
| {http://www.python.org/doc/current/lib/module-parser.html}
 | |
| module and the compiler written in C for the Python interpreter use a
 | |
| concrete syntax tree.  The concrete syntax is tied closely to the
 | |
| grammar description used for the Python parser.  Instead of a single
 | |
| node for a construct, there are often several levels of nested nodes
 | |
| that are introduced by Python's precedence rules.
 | |
| 
 | |
| The abstract syntax tree is created by the
 | |
| \module{compiler.transformer} module.  The transformer relies on the
 | |
| builtin Python parser to generate a concrete syntax tree.  It
 | |
| generates an abstract syntax tree from the concrete tree.  
 | |
| 
 | |
| The \module{transformer} module was created by Greg
 | |
| Stein\index{Stein, Greg} and Bill Tutt\index{Tutt, Bill} for an
 | |
| experimental Python-to-C compiler.  The current version contains a
 | |
| number of modifications and improvements, but the basic form of the
 | |
| abstract syntax and of the transformer are due to Stein and Tutt.
 | |
| 
 | |
| 
 | |
| \section{AST Nodes}
 | |
| 
 | |
| \declaremodule{}{compiler.ast}
 | |
| 
 | |
| The \module{compiler.ast} module is generated from a text file that
 | |
| describes each node type and its elements.  Each node type is
 | |
| represented as a class that inherits from the abstract base class
 | |
| \class{compiler.ast.Node} and defines a set of named attributes for
 | |
| child nodes.
 | |
| 
 | |
| \begin{classdesc}{Node}{}
 | |
|   
 | |
|   The \class{Node} instances are created automatically by the parser
 | |
|   generator.  The recommended interface for specific \class{Node}
 | |
|   instances is to use the public attributes to access child nodes.  A
 | |
|   public attribute may be bound to a single node or to a sequence of
 | |
|   nodes, depending on the \class{Node} type.  For example, the
 | |
|   \member{bases} attribute of the \class{Class} node, is bound to a
 | |
|   list of base class nodes, and the \member{doc} attribute is bound to
 | |
|   a single node.
 | |
|   
 | |
|   Each \class{Node} instance has a \member{lineno} attribute which may
 | |
|   be \code{None}.  XXX Not sure what the rules are for which nodes
 | |
|   will have a useful lineno.
 | |
| \end{classdesc}
 | |
| 
 | |
| All \class{Node} objects offer the following methods:
 | |
| 
 | |
| \begin{methoddesc}{getChildren}{}
 | |
|   Returns a flattened list of the child nodes and objects in the
 | |
|   order they occur.  Specifically, the order of the nodes is the
 | |
|   order in which they appear in the Python grammar.  Not all of the
 | |
|   children are \class{Node} instances.  The names of functions and
 | |
|   classes, for example, are plain strings.
 | |
| \end{methoddesc}
 | |
| 
 | |
| \begin{methoddesc}{getChildNodes}{}
 | |
|   Returns a flattened list of the child nodes in the order they
 | |
|   occur.  This method is like \method{getChildren()}, except that it
 | |
|   only returns those children that are \class{Node} instances.
 | |
| \end{methoddesc}
 | |
| 
 | |
| Two examples illustrate the general structure of \class{Node}
 | |
| classes.  The \keyword{while} statement is defined by the following
 | |
| grammar production: 
 | |
| 
 | |
| \begin{verbatim}
 | |
| while_stmt:     "while" expression ":" suite
 | |
|                ["else" ":" suite]
 | |
| \end{verbatim}
 | |
| 
 | |
| The \class{While} node has three attributes: \member{test},
 | |
| \member{body}, and \member{else_}.  (If the natural name for an
 | |
| attribute is also a Python reserved word, it can't be used as an
 | |
| attribute name.  An underscore is appended to the word to make it a
 | |
| legal identifier, hence \member{else_} instead of \keyword{else}.)
 | |
| 
 | |
| The \keyword{if} statement is more complicated because it can include
 | |
| several tests.  
 | |
| 
 | |
| \begin{verbatim}
 | |
| if_stmt: 'if' test ':' suite ('elif' test ':' suite)* ['else' ':' suite]
 | |
| \end{verbatim}
 | |
| 
 | |
| The \class{If} node only defines two attributes: \member{tests} and
 | |
| \member{else_}.  The \member{tests} attribute is a sequence of test
 | |
| expression, consequent body pairs.  There is one pair for each
 | |
| \keyword{if}/\keyword{elif} clause.  The first element of the pair is
 | |
| the test expression.  The second elements is a \class{Stmt} node that
 | |
| contains the code to execute if the test is true.
 | |
| 
 | |
| The \method{getChildren()} method of \class{If} returns a flat list of
 | |
| child nodes.  If there are three \keyword{if}/\keyword{elif} clauses
 | |
| and no \keyword{else} clause, then \method{getChildren()} will return
 | |
| a list of six elements: the first test expression, the first
 | |
| \class{Stmt}, the second text expression, etc.
 | |
| 
 | |
| The following table lists each of the \class{Node} subclasses defined
 | |
| in \module{compiler.ast} and each of the public attributes available
 | |
| on their instances.  The values of most of the attributes are
 | |
| themselves \class{Node} instances or sequences of instances.  When the
 | |
| value is something other than an instance, the type is noted in the
 | |
| comment.  The attributes are listed in the order in which they are
 | |
| returned by \method{getChildren()} and \method{getChildNodes()}.
 | |
| 
 | |
| \input{asttable}
 | |
| 
 | |
| 
 | |
| \section{Assignment nodes}
 | |
| 
 | |
| There is a collection of nodes used to represent assignments.  Each
 | |
| assignment statement in the source code becomes a single
 | |
| \class{Assign} node in the AST.  The \member{nodes} attribute is a
 | |
| list that contains a node for each assignment target.  This is
 | |
| necessary because assignment can be chained, e.g. \code{a = b = 2}.
 | |
| Each \class{Node} in the list will be one of the following classes: 
 | |
| \class{AssAttr}, \class{AssList}, \class{AssName}, or
 | |
| \class{AssTuple}. 
 | |
| 
 | |
| XXX Explain what the AssXXX nodes are for.  Mention \code{a.b.c = 2}
 | |
| as an example.  Explain what the flags are for.
 | |
| 
 | |
| 
 | |
| \section{Using Visitors to Walk ASTs}
 | |
| 
 | |
| \declaremodule{}{compiler.visitor}
 | |
| 
 | |
| The visitor pattern is ...  The \refmodule{compiler} package uses a
 | |
| variant on the visitor pattern that takes advantage of Python's
 | |
| introspection features to elminiate the need for much of the visitor's
 | |
| infrastructure.
 | |
| 
 | |
| The classes being visited do not need to be programmed to accept
 | |
| visitors.  The visitor need only define visit methods for classes it
 | |
| is specifically interested in; a default visit method can handle the
 | |
| rest. 
 | |
| 
 | |
| XXX The magic \method{visit()} method for visitors.
 | |
| 
 | |
| \begin{funcdesc}{walk}{tree, visitor\optional{, verbose}}
 | |
| \end{funcdesc}
 | |
| 
 | |
| \begin{classdesc}{ASTVisitor}{}
 | |
| 
 | |
| The \class{ASTVisitor} is responsible for walking over the tree in the
 | |
| correct order.  A walk begins with a call to \method{preorder()}.  For
 | |
| each node, it checks the \var{visitor} argument to \method{preorder()}
 | |
| for a method named `visitNodeType,' where NodeType is the name of the
 | |
| node's class, e.g. for a \class{While} node a \method{visitWhile()}
 | |
| would be called.  If the method exists, it is called with the node as
 | |
| its first argument.
 | |
| 
 | |
| The visitor method for a particular node type can control how child
 | |
| nodes are visited during the walk.  The \class{ASTVisitor} modifies
 | |
| the visitor argument by adding a visit method to the visitor; this
 | |
| method can be used to visit a particular child node.  If no visitor is
 | |
| found for a particular node type, the \method{default()} method is
 | |
| called. 
 | |
| \end{classdesc}
 | |
| 
 | |
| \class{ASTVisitor} objects have the following methods:
 | |
| 
 | |
| XXX describe extra arguments
 | |
| 
 | |
| \begin{methoddesc}{default}{node\optional{, \moreargs}}
 | |
| \end{methoddesc}
 | |
| 
 | |
| \begin{methoddesc}{dispatch}{node\optional{, \moreargs}}
 | |
| \end{methoddesc}
 | |
| 
 | |
| \begin{methoddesc}{preorder}{tree, visitor}
 | |
| \end{methoddesc}
 | |
| 
 | |
| 
 | |
| \section{Bytecode Generation}
 | |
| 
 | |
| The code generator is a visitor that emits bytecodes.  Each visit method
 | |
| can call the \method{emit()} method to emit a new bytecode.  The basic
 | |
| code generator is specialized for modules, classes, and functions.  An
 | |
| assembler converts that emitted instructions to the low-level bytecode
 | |
| format.  It handles things like generator of constant lists of code
 | |
| objects and calculation of jump offsets.
 | |
| 
 | |
| 
 | |
| \input{compiler.ind}                    % Index
 | |
| 
 | |
| \end{document}
 | 
