\section{\module{parser} ---
         Access Python parse trees}
					
						
% Copyright 1995 Virginia Polytechnic Institute and State University
% and Fred L. Drake, Jr.  This copyright notice must be distributed on
% all copies, but this document otherwise may be distributed as part
% of the Python distribution.  No fee may be charged for this document
% in any representation, either on paper or electronically.  This
% restriction does not affect other elements in a distributed package
% in any way.
					
						
\declaremodule{builtin}{parser}
\modulesynopsis{Access parse trees for Python source code.}
\moduleauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
\sectionauthor{Fred L. Drake, Jr.}{fdrake@acm.org}
					
						
\index{parsing!Python source code}
					
						
The \module{parser} module provides an interface to Python's internal
parser and byte-code compiler.  The primary purpose for this interface
is to allow Python code to edit the parse tree of a Python expression
and create executable code from it.  This is better than trying
to parse and modify an arbitrary Python code fragment as a string
because parsing is performed in a manner identical to the code
forming the application.  It is also faster.
					
						
A few points about this module are important for making use of the
data structures it creates.  This is not a tutorial
on editing the parse trees for Python code, but some examples of using
the \module{parser} module are presented.
					
						
Most importantly, a good understanding of the Python grammar processed
by the internal parser is required.  For full information on the
language syntax, refer to the \citetitle[../ref/ref.html]{Python
Language Reference}.  The parser itself is created from a grammar
specification defined in the file \file{Grammar/Grammar} in the
standard Python distribution.  The parse trees stored in the AST
objects created by this module are the actual output from the internal
parser when created by the \function{expr()} or \function{suite()}
functions, described below.  The AST objects created by
\function{sequence2ast()} faithfully simulate those structures.  Be
aware that the values of the sequences which are considered
``correct'' will vary from one version of Python to another as the
formal grammar for the language is revised.  However, transporting
code from one Python version to another as source text will always
allow correct parse trees to be created in the target version; the
only restriction is that an older version of the interpreter will not
support more recent language constructs.  The parse trees are not
typically compatible from one version to another, whereas source code
has always been forward-compatible.
					
						
Each element of the sequences returned by \function{ast2list()} or
\function{ast2tuple()} has a simple form.  Sequences representing
non-terminal elements in the grammar always have a length greater than
one.  The first element is an integer which identifies a production in
the grammar.  These integers are given symbolic names in the C header
file \file{Include/graminit.h} and the Python module
\refmodule{symbol}.  Each additional element of the sequence represents
a component of the production as recognized in the input string: these
are always sequences which have the same form as the parent.  An
important aspect of this structure is that
keywords used to identify the parent node type, such as the keyword
\keyword{if} in an \constant{if_stmt}, are included in the node tree without
any special treatment.  For example, the \keyword{if} keyword is
represented by the tuple \code{(1, 'if')}, where \code{1} is the
numeric value associated with all \constant{NAME} tokens, including
variable and function names defined by the user.  In an alternate form
returned when line number information is requested, the same token
might be represented as \code{(1, 'if', 12)}, where the \code{12}
represents the line number at which the terminal symbol was found.
					
						
Terminal elements are represented in much the same way, but without
any child elements and with the addition of the source text which was
identified.  The example of the \keyword{if} keyword above is
representative.  The various types of terminal symbols are defined in
the C header file \file{Include/token.h} and the Python module
\refmodule{token}.
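The terminal and non-terminal forms described above can be told apart
by the first element of each sequence.  The following is a minimal
sketch, not output from the \module{parser} module: the tree is written
by hand, and \code{FAKE_STMT} is a hypothetical production number chosen
only so that it lies in the non-terminal range.  Real production numbers
come from \refmodule{symbol}.

```python
import token

# Hypothetical production number (an assumption for illustration): any
# value at or above token.NT_OFFSET is treated as a non-terminal.
FAKE_STMT = token.NT_OFFSET + 1

# Hand-written tree in the tuple form described above; it is not real
# parser output and does not correspond to an actual grammar production.
sample_tree = (
    FAKE_STMT,
    (token.NAME, 'if'),
    (FAKE_STMT, (token.NAME, 'x')),
    (token.NAME, 'pass'),
)

def terminals(node):
    """Return the source text of every terminal below node, in order."""
    if token.ISTERMINAL(node[0]):
        # Terminal: (type, text) or (type, text, line number).
        return [node[1]]
    found = []
    for child in node[1:]:
        # Non-terminal: every additional element is itself a sequence.
        found.extend(terminals(child))
    return found

leaf_text = terminals(sample_tree)
```

Because keywords receive no special treatment, \code{'if'} and
\code{'pass'} are collected here exactly like the user-defined name
\code{'x'}.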
					
						
The AST objects are not required to support the functionality of this
module, but are provided for three purposes: to allow an application
to amortize the cost of processing complex parse trees, to provide a
parse tree representation which conserves memory space when compared
to the Python list or tuple representation, and to ease the creation
of additional modules in C which manipulate parse trees.  A simple
``wrapper'' class may be created in Python to hide the use of AST
objects.
					
						
The \module{parser} module defines functions for a few distinct
purposes.  The most important purposes are to create AST objects and
to convert AST objects to other representations such as parse trees
and compiled code objects, but there are also functions which serve to
query the type of parse tree represented by an AST object.
					
						
\begin{seealso}
  \seemodule{symbol}{Useful constants representing internal nodes of
                     the parse tree.}
  \seemodule{token}{Useful constants representing leaf nodes of the
                    parse tree and functions for testing node values.}
\end{seealso}
					
						
\subsection{Creating AST Objects \label{Creating ASTs}}
					
						
AST objects may be created from source code or from a parse tree.
When creating an AST object from source, different functions are used
to create the \code{'eval'} and \code{'exec'} forms.
					
						
\begin{funcdesc}{expr}{source}
The \function{expr()} function parses the parameter \var{source}
as if it were an input to \samp{compile(\var{source}, 'file.py',
'eval')}.  If the parse succeeds, an AST object is created to hold the
internal parse tree representation; otherwise, an appropriate exception
is raised.
\end{funcdesc}
					
						
\begin{funcdesc}{suite}{source}
The \function{suite()} function parses the parameter \var{source}
as if it were an input to \samp{compile(\var{source}, 'file.py',
'exec')}.  If the parse succeeds, an AST object is created to hold the
internal parse tree representation; otherwise, an appropriate exception
is raised.
\end{funcdesc}
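The difference between the \code{'eval'} and \code{'exec'} forms can be
seen with the built-in \function{compile()} function that both
descriptions refer to.  This sketch does not call \function{expr()} or
\function{suite()}; it only illustrates which kind of source each form
accepts.  The source strings and the \code{namespace} dictionary are
assumptions made for the example.

```python
# The two compile() modes that expr() and suite() are described as
# mirroring; the parser module itself is not used here.
expr_code = compile('a + 5', 'file.py', 'eval')
suite_code = compile('b = a * 2\n', 'file.py', 'exec')

namespace = {'a': 3}
expr_value = eval(expr_code, namespace)   # evaluates the expression
exec(suite_code, namespace)               # runs the statements
suite_result = namespace['b']

# A statement is not acceptable input for the 'eval' form.
try:
    compile('b = a * 2', 'file.py', 'eval')
    statement_rejected = False
except SyntaxError:
    statement_rejected = True
```

An expression yields a value when evaluated, while a suite is executed
for its effect on the namespace.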
					
						
\begin{funcdesc}{sequence2ast}{sequence}
This function accepts a parse tree represented as a sequence and
builds an internal representation if possible.  If it can validate
that the tree conforms to the Python grammar and all nodes are valid
node types in the host version of Python, an AST object is created
from the internal representation and returned to the caller.  If there
is a problem creating the internal representation, or if the tree
cannot be validated, a \exception{ParserError} exception is raised.  An AST
object created this way should not be assumed to compile correctly;
the normal exceptions raised by compilation may still occur when
the AST object is passed to \function{compileast()}.  This may indicate
problems not related to syntax (such as a \exception{MemoryError}
exception), but may also be due to constructs such as the result of
parsing \code{del f(0)}, which escapes the Python parser but is
checked by the bytecode compiler.

Sequences representing terminal tokens may be represented as either
two-element sequences of the form \code{(1, 'name')} or as three-element
sequences of the form \code{(1, 'name', 56)}.  If the third element is
present, it is assumed to be a valid line number.  The line number
may be specified for any subset of the terminal symbols in the input
tree.
\end{funcdesc}
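Since line numbers may be present on any subset of the terminals, a
tree can be normalized to the two-element form before further
processing.  The helper below is a hypothetical sketch and not part of
the \module{parser} module; \code{FAKE_NODE} is an invented production
number used only to mark non-terminals in the hand-written tree.

```python
import token

# Hypothetical non-terminal number (assumption for illustration only).
FAKE_NODE = token.NT_OFFSET + 1

def strip_line_numbers(node):
    """Return a copy of a tuple-form tree with terminal line numbers removed."""
    if token.ISTERMINAL(node[0]):
        # Terminal: (type, text) or (type, text, line); keep the first two.
        return tuple(node[:2])
    # Non-terminal: keep the production number and recurse into children.
    return (node[0],) + tuple(strip_line_numbers(child) for child in node[1:])

# Hand-written tree in which only one terminal carries a line number.
mixed_tree = (
    FAKE_NODE,
    (token.NAME, 'name', 56),
    (FAKE_NODE, (token.NAME, 'other')),
)
plain_tree = strip_line_numbers(mixed_tree)
```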
					
						
\begin{funcdesc}{tuple2ast}{sequence}
This is the same function as \function{sequence2ast()}.  This entry point
is maintained for backward compatibility.
\end{funcdesc}
					
						
\subsection{Converting AST Objects \label{Converting ASTs}}
					
						
AST objects, regardless of the input used to create them, may be
converted to parse trees represented as list- or tuple-trees, or may
be compiled into executable code objects.  Parse trees may be
extracted with or without line numbering information.
					
						
\begin{funcdesc}{ast2list}{ast\optional{, line_info}}
This function accepts an AST object from the caller in
\var{ast} and returns a Python list representing the
equivalent parse tree.  The resulting list representation can be used
for inspection or the creation of a new parse tree in list form.  This
function does not fail so long as memory is available to build the
list representation.  If the parse tree will only be used for
inspection, \function{ast2tuple()} should be used instead to reduce memory
consumption and fragmentation.  When the list representation is
required, this function is significantly faster than retrieving a
tuple representation and converting that to nested lists.

If \var{line_info} is true, line number information will be
included for all terminal tokens as a third element of the list
representing the token.  Note that the line number provided specifies
the line on which the token \emph{ends}.  This information is
omitted if the flag is false or omitted.
\end{funcdesc}
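For comparison, the slower route mentioned above, converting a tuple
representation to nested lists in Python, might look like the following
sketch.  The function name is hypothetical, and the tree is written by
hand with invented production numbers (257 and 258) rather than taken
from \function{ast2tuple()}.

```python
def tuple_tree_to_lists(node):
    """Recursively copy a tuple-form parse tree into nested lists."""
    return [tuple_tree_to_lists(item) if isinstance(item, tuple) else item
            for item in node]

# Hand-written tuple tree; 257 and 258 are made-up production numbers.
tuple_tree = (257, (1, 'if'), (258, (1, 'x', 12)))
list_tree = tuple_tree_to_lists(tuple_tree)

# The list form can be edited in place, which the tuple form cannot.
list_tree[1][1] = 'while'
```

The copy is independent of the original, so the tuple tree is left
unchanged by the edit.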
					
						
\begin{funcdesc}{ast2tuple}{ast\optional{, line_info}}
This function accepts an AST object from the caller in
\var{ast} and returns a Python tuple representing the
equivalent parse tree.  Other than returning a tuple instead of a
list, this function is identical to \function{ast2list()}.

If \var{line_info} is true, line number information will be
included for all terminal tokens as a third element of the tuple
representing the token.  This information is omitted if the flag is
false or omitted.
\end{funcdesc}
					
						
\begin{funcdesc}{compileast}{ast\optional{, filename\code{ = '<ast>'}}}
The Python byte-code compiler can be invoked on an AST object to produce
code objects which can be used as part of an \keyword{exec} statement or
a call to the built-in \function{eval()}\bifuncindex{eval} function.
This function provides the interface to the compiler, passing it the
internal parse tree from \var{ast} and using the
source file name specified by the \var{filename} parameter.
The default value supplied for \var{filename} indicates that
the source was an AST object.

Compiling an AST object may result in exceptions related to
compilation; an example would be a \exception{SyntaxError} caused by the
parse tree for \code{del f(0)}: this statement is considered legal
within the formal grammar for Python but is not a legal language
construct.  The \exception{SyntaxError} raised for this condition is
normally generated by the Python byte-code compiler, which is why
it can be raised at this point by the \module{parser} module.  Most
causes of compilation failure can be diagnosed programmatically by
inspection of the parse tree.
\end{funcdesc}
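The \code{del f(0)} case can be observed through the built-in
\function{compile()} function, which goes through the same parse and
compile steps.  This is a sketch only: it does not use
\function{compileast()}, and which internal stage reports the error is
an implementation detail that may differ between Python versions.

```python
# Compile the statement discussed above using the built-in compile();
# '<ast>' is the default filename given in the compileast() signature.
try:
    compile('del f(0)\n', '<ast>', 'exec')
    error_message = None
except SyntaxError as exc:
    # The statement is rejected even though it resembles a valid del.
    error_message = str(exc)
```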
					
						
\subsection{Queries on AST Objects \label{Querying ASTs}}
					
						
Two functions are provided which allow an application to determine if
an AST was created as an expression or a suite.  Neither of these
functions can be used to determine if an AST was created from source
code via \function{expr()} or \function{suite()} or from a parse tree
via \function{sequence2ast()}.
					
						
\begin{funcdesc}{isexpr}{ast}
When \var{ast} represents an \code{'eval'} form, this function
returns true; otherwise it returns false.  This is useful, since code
objects normally cannot be queried for this information using existing
built-in functions.  Note that the code objects created by
\function{compileast()} cannot be queried like this either, and are
identical to those created by the built-in
\function{compile()}\bifuncindex{compile} function.
\end{funcdesc}
					
						
\begin{funcdesc}{issuite}{ast}
This function mirrors \function{isexpr()} in that it reports whether an
AST object represents an \code{'exec'} form, commonly known as a
``suite.''  It is not safe to assume that this function is equivalent
to \samp{not isexpr(\var{ast})}, as additional syntactic fragments may
be supported in the future.
\end{funcdesc}
					
						
\subsection{Exceptions and Error Handling \label{AST Errors}}
					
						
The \module{parser} module defines a single exception, but may also pass other
built-in exceptions from other portions of the Python runtime
environment.  See each function for information about the exceptions
it can raise.
					
						
							|  |  |  | 
 | 
					
						
\begin{excdesc}{ParserError}
Exception raised when a failure occurs within the \module{parser}
module.  This is generally produced for validation failures rather
than the built-in \exception{SyntaxError} raised during normal parsing.
The exception argument is either a string describing the reason for
the failure, or a tuple containing the sequence that caused the failure
(taken from the parse tree passed to \function{sequence2ast()}) and an
explanatory string.  Callers of \function{sequence2ast()} need to be
able to handle either form of the argument, while callers of the other
functions in the module need only be aware of the simple string values.
\end{excdesc}
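
The two argument forms can be told apart by inspecting the exception's
\code{args}.  The following is a minimal sketch rather than part of the
module: \class{ParserError} is a local stand-in so that the block runs
without \module{parser}, \function{explain()} is a hypothetical helper,
and the exact way the tuple form is packed into \code{args} is an
assumption, so both plausible layouts are accepted.

```python
class ParserError(Exception):
    """Local stand-in for parser.ParserError (assumption for this sketch)."""


def explain(exc):
    """Return (message, offending_sequence_or_None) for either argument form."""
    args = exc.args
    # The tuple form may arrive as a single tuple argument or as two
    # separate arguments; normalise to the two-argument layout.
    if len(args) == 1 and isinstance(args[0], tuple):
        args = args[0]
    if len(args) == 2:
        sequence, message = args
        return message, sequence
    # Plain string form: only an explanatory message is available.
    return str(args[0]), None


simple = explain(ParserError("could not parse string"))
detailed = explain(ParserError((257, (0, '')), "illegal node sequence"))
```

A caller of \function{sequence2ast()} could use the second element to
report which part of its input was rejected.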
					
						
Note that the functions \function{compileast()}, \function{expr()}, and
\function{suite()} may raise exceptions which are normally raised by the
parsing and compilation process.  These include the built-in
exceptions \exception{MemoryError}, \exception{OverflowError},
\exception{SyntaxError}, and \exception{SystemError}.  In these cases, the
exceptions carry all the meaning normally associated with them.  Refer
to the descriptions of each function for detailed information.


\subsection{AST Objects \label{AST Objects}}

Ordered and equality comparisons are supported between AST objects.
Pickling of AST objects (using the \refmodule{pickle} module) is also
supported.

\begin{datadesc}{ASTType}
The type of the objects returned by \function{expr()},
\function{suite()} and \function{sequence2ast()}.
\end{datadesc}


AST objects have the following methods:


\begin{methoddesc}[AST]{compile}{\optional{filename}}
Same as \code{compileast(\var{ast}, \var{filename})}.
\end{methoddesc}

\begin{methoddesc}[AST]{isexpr}{}
Same as \code{isexpr(\var{ast})}.
\end{methoddesc}

\begin{methoddesc}[AST]{issuite}{}
Same as \code{issuite(\var{ast})}.
\end{methoddesc}

\begin{methoddesc}[AST]{tolist}{\optional{line_info}}
Same as \code{ast2list(\var{ast}, \var{line_info})}.
\end{methoddesc}

\begin{methoddesc}[AST]{totuple}{\optional{line_info}}
Same as \code{ast2tuple(\var{ast}, \var{line_info})}.
\end{methoddesc}


\subsection{Examples \label{AST Examples}}

The \module{parser} module allows operations to be performed on the
parse tree of Python source code before the bytecode is generated, and
provides for inspection of the parse tree for information gathering
purposes.  Two examples are presented.  The simple example demonstrates
emulation of the \function{compile()}\bifuncindex{compile} built-in
function and the complex example shows the use of a parse tree for
information discovery.

\subsubsection{Emulation of \function{compile()}}

While many useful operations may take place between parsing and
bytecode generation, the simplest operation is to do nothing.  In that
case, using the \module{parser} module to produce an intermediate data
structure is equivalent to the code

\begin{verbatim}
>>> code = compile('a + 5', 'file.py', 'eval')
>>> a = 5
>>> eval(code)
10
\end{verbatim}

The equivalent operation using the \module{parser} module is somewhat
longer, but allows the intermediate internal parse tree to be retained
as an AST object:

\begin{verbatim}
>>> import parser
>>> ast = parser.expr('a + 5')
>>> code = ast.compile('file.py')
>>> a = 5
>>> eval(code)
10
\end{verbatim}

An application which needs both AST and code objects can package this
code into readily available functions:

\begin{verbatim}
import parser

def load_suite(source_string):
    ast = parser.suite(source_string)
    return ast, ast.compile()

def load_expression(source_string):
    ast = parser.expr(source_string)
    return ast, ast.compile()
\end{verbatim}

\subsubsection{Information Discovery}

Some applications benefit from direct access to the parse tree.  The
remainder of this section demonstrates how the parse tree provides
access to module documentation defined in
docstrings\index{string!documentation}\index{docstrings} without
requiring that the code being examined be loaded into a running
interpreter via \keyword{import}.  This can be very useful for
performing analyses of untrusted code.

Generally, the example will demonstrate how the parse tree may be
traversed to distill interesting information.  Two functions and a set
of classes are developed which provide programmatic access to
high-level function and class definitions provided by a module.  The
classes extract information from the parse tree and provide access to
the information at a useful semantic level; one function provides a
simple low-level pattern matching capability; and the other function
defines a high-level interface to the classes by handling file
operations on behalf of the caller.  All source files mentioned here
which are not part of the Python installation are located in the
\file{Demo/parser/} directory of the distribution.

The dynamic nature of Python allows the programmer a great deal of
flexibility, but most modules need only a limited measure of this when
defining classes, functions, and methods.  In this example, the only
definitions that will be considered are those which are defined in the
top level of their context, e.g., a function defined by a \keyword{def}
statement at column zero of a module, but not a function defined
within a branch of an \keyword{if} ... \keyword{else} construct, though
there are some good reasons for doing so in some situations.  Nesting
of definitions will be handled by the code developed in the example.

To construct the upper-level extraction methods, we need to know what
the parse tree structure looks like and how much of it we actually
need to be concerned about.  Python uses a moderately deep parse tree
so there are a large number of intermediate nodes.  It is important to
read and understand the formal grammar used by Python.  This is
specified in the file \file{Grammar/Grammar} in the distribution.
Consider the simplest case of interest when searching for docstrings:
a module consisting of a docstring and nothing else.  (See file
\file{docstring.py}.)

\begin{verbatim}
"""Some documentation.
"""
\end{verbatim}

Using the interpreter to take a look at the parse tree, we find a
bewildering mass of numbers and parentheses, with the documentation
buried deep in nested tuples.

\begin{verbatim}
>>> import parser
>>> import pprint
>>> ast = parser.suite(open('docstring.py').read())
>>> tup = ast.totuple()
>>> pprint.pprint(tup)
(257,
 (264,
  (265,
   (266,
    (267,
     (307,
      (287,
       (288,
        (289,
         (290,
          (292,
           (293,
            (294,
             (295,
              (296,
               (297,
                (298,
                 (299,
                  (300, (3, '"""Some documentation.\n"""'))))))))))))))))),
   (4, ''))),
 (4, ''),
 (0, ''))
\end{verbatim}

The number in the first element of each node in the tree is the node
type; node types map directly to terminal and non-terminal symbols in
the grammar.  Unfortunately, they are represented as integers in the
internal representation, and the Python structures generated do not
change that.  However, the \refmodule{symbol} and \refmodule{token} modules
provide symbolic names for the node types, along with dictionaries
which map from the integers to those symbolic names.
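
As a small illustration of the integer-to-name dictionaries, the sketch
below labels the terminal nodes of a tuple-form tree using
\code{token.tok_name}.  It assumes a current Python, where
\module{token} is still available but \module{symbol} may not be, so
non-terminal numbers are left untouched; \function{label()} is a
hypothetical helper and the sample trees are abbreviated fragments.

```python
import token


def label(node):
    """Replace terminal node numbers in a tuple-form tree with token names."""
    kind = node[0]
    if token.ISTERMINAL(kind):
        # Terminal node: (type, text[, ...]); look the name up in tok_name.
        return (token.tok_name[kind],) + tuple(node[1:])
    # Non-terminal node: keep the grammar-dependent number, recurse into children.
    return (kind,) + tuple(label(child) for child in node[1:])


labelled = label((300, (3, '"""Some documentation.\n"""')))
tail = label((257, (4, ''), (0, '')))
```

The non-terminal numbers stay as integers because their meaning depends
on the grammar of the Python version that produced the tree.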
					
						
In the output presented above, the outermost tuple contains four
elements: the integer \code{257} and three additional tuples.  Node
type \code{257} has the symbolic name \constant{file_input}.  Each of
these inner tuples contains an integer as the first element; these
integers, \code{264}, \code{4}, and \code{0}, represent the node types
\constant{stmt}, \constant{NEWLINE}, and \constant{ENDMARKER},
respectively.
Note that these values may change depending on the version of Python
you are using; consult \file{symbol.py} and \file{token.py} for
details of the mapping.  It should be fairly clear that the outermost
node is related primarily to the input source rather than the contents
of the file, and may be disregarded for the moment.  The \constant{stmt}
node is much more interesting.  In particular, all docstrings are
found in subtrees which are formed exactly as this node is formed,
with the only difference being the string itself.  The association
between the docstring in a similar tree and the defined entity (class,
function, or module) which it describes is given by the position of
the docstring subtree within the tree defining the described
structure.
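
To make the nesting concrete, a short recursive walk can report each
terminal together with its depth.  This is only a sketch:
\function{terminals()} is a hypothetical helper, the tree is an
abbreviated version of the output above with most intermediate levels
omitted, and the non-terminal numbers are copied from that output and
vary between Python versions.

```python
import token


def terminals(node, depth=0):
    """Yield (depth, token_name, text) for every terminal in a tuple-form tree."""
    if token.ISTERMINAL(node[0]):
        yield depth, token.tok_name[node[0]], node[1]
    else:
        # Non-terminal: every element after the type is a child node.
        for child in node[1:]:
            yield from terminals(child, depth + 1)


# Abbreviated tree: file_input -> stmt -> simple_stmt -> small_stmt -> STRING.
tree = (257,
        (264, (265, (266, (3, '"""Doc."""')), (4, ''))),
        (4, ''),
        (0, ''))
leaves = list(terminals(tree))
```

In the full tree the string sits many levels deeper, but its position
relative to the enclosing \constant{stmt} node is the same.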
					
						
By replacing the actual docstring with something to signify a variable
component of the tree, we allow a simple pattern matching approach to
check any given subtree for equivalence to the general pattern for
docstrings.  Since the example demonstrates information extraction, we
can safely require that the tree be in tuple form rather than list
form, which lets a one-element list such as \code{['variable_name']}
serve as a simple representation for a variable.  A simple recursive
function can implement the pattern matching, returning a Boolean and a
dictionary of variable name to value mappings.  (See file
\file{example.py}.)

\begin{verbatim}
from types import ListType, TupleType

def match(pattern, data, vars=None):
    if vars is None:
        vars = {}
    if type(pattern) is ListType:
        vars[pattern[0]] = data
        return 1, vars
    if type(pattern) is not TupleType:
        return (pattern == data), vars
    if len(data) != len(pattern):
        return 0, vars
    for pattern, data in map(None, pattern, data):
        same, vars = match(pattern, data, vars)
        if not same:
            break
    return same, vars
\end{verbatim}
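
For readers on a current Python, where \code{types.ListType} and
\code{map(None, ...)} are no longer available, the same technique can
be sketched as follows.  This is an illustrative rewrite under that
assumption, not the code shipped in \file{example.py}; the parameter
name \var{bindings} is chosen here, and the two-node tree at the end is
a toy fragment whose numbers are copied from the output shown earlier.

```python
def match(pattern, data, bindings=None):
    """Match a tuple-form tree against a pattern; return (matched, bindings)."""
    if bindings is None:
        bindings = {}
    # A one-element list such as ['docstring'] is a pattern variable:
    # it matches anything and records what it matched.
    if isinstance(pattern, list):
        bindings[pattern[0]] = data
        return True, bindings
    # Leaves (node type numbers and token text) must compare equal.
    if not isinstance(pattern, tuple):
        return pattern == data, bindings
    # Tuples must have the same number of elements to match.
    if not isinstance(data, tuple) or len(data) != len(pattern):
        return False, bindings
    for sub_pattern, sub_data in zip(pattern, data):
        same, bindings = match(sub_pattern, sub_data, bindings)
        if not same:
            return False, bindings
    return True, bindings


found, bound = match((300, (3, ['docstring'])),
                     (300, (3, '"""Some documentation.\n"""')))
missed, _ = match((300, (3, ['docstring'])), (301, (3, 'x')))
```

Unlike the original, this version also rejects non-tuple data where the
pattern expects a tuple instead of raising an error.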
					
						
Using this simple representation for syntactic variables and the symbolic
node types, the pattern for the candidate docstring subtrees becomes
fairly readable.  (See file \file{example.py}.)

\begin{verbatim}
import symbol
import token

DOCSTRING_STMT_PATTERN = (
    symbol.stmt,
    (symbol.simple_stmt,
     (symbol.small_stmt,
      (symbol.expr_stmt,
       (symbol.testlist,
        (symbol.test,
         (symbol.and_test,
          (symbol.not_test,
           (symbol.comparison,
            (symbol.expr,
             (symbol.xor_expr,
              (symbol.and_expr,
               (symbol.shift_expr,
                (symbol.arith_expr,
                 (symbol.term,
                  (symbol.factor,
                   (symbol.power,
                    (symbol.atom,
                     (token.STRING, ['docstring'])
                     )))))))))))))))),
     (token.NEWLINE, '')
     ))
\end{verbatim}

Using the \function{match()} function with this pattern, extracting the
module docstring from the parse tree created previously is easy:

\begin{verbatim}
>>> found, vars = match(DOCSTRING_STMT_PATTERN, tup[1])
>>> found
1
>>> vars
{'docstring': '"""Some documentation.\n"""'}
\end{verbatim}
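
The value bound to \code{docstring} is the literal exactly as it
appears in the source, quotes included.  A hedged sketch of turning it
into the string it denotes follows; it assumes a current Python, where
the standard \module{ast} module (unrelated to the \code{ast} variable
used in the examples above) provides \function{literal_eval()}, and
\function{docstring_value()} is a hypothetical helper name.

```python
import ast


def docstring_value(token_text):
    """Evaluate the source text of a STRING token to the string it denotes."""
    value = ast.literal_eval(token_text)
    if not isinstance(value, str):
        # Guard against literals that are not text, e.g. numbers or bytes.
        raise ValueError("token text is not a string literal")
    return value


raw = '"""Some documentation.\n"""'
text = docstring_value(raw)
```

Because \function{literal_eval()} only accepts literals, this avoids
executing arbitrary expressions taken from the code being examined.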
					
						
Now that specific data can be extracted from a location where it is
expected, the remaining question is where to look.  For docstrings, the
answer is fairly simple: the docstring is the first \constant{stmt} node
in a code block (\constant{file_input} or \constant{suite} node types).
A module consists of a single \constant{file_input} node, and class and
function definitions each contain exactly one \constant{suite} node.
Classes and functions are readily identified as subtrees of code block
nodes which start with \code{(stmt, (compound_stmt, (classdef, ...} or
\code{(stmt, (compound_stmt, (funcdef, ...}.  Note that these subtrees
cannot be matched in full by \function{match()}, since it has no way to
match an arbitrary number of sibling nodes.  A more elaborate matching
function could be used to overcome this limitation, but
\function{match()} is sufficient for the example.
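
As an illustration of what a more elaborate matcher might look like, the
following sketch adds one convention that is an assumption of this
example only: a tuple pattern whose last element is \code{...}
(\code{Ellipsis}) accepts any number of additional trailing siblings.
The name \function{match_prefix()} and the string node labels are
hypothetical stand-ins and are not part of \file{example.py}; real parse
trees use the integer constants from the \module{symbol} and
\module{token} modules.

```python
def match_prefix(pattern, data, found=None):
    """Match data against pattern, allowing a trailing ... wildcard.

    A one-element list such as ['name'] captures the corresponding subtree,
    any other non-tuple value must compare equal to the data, and a tuple
    pattern ending in Ellipsis (an assumption of this sketch) matches data
    tuples that have extra trailing siblings.
    """
    if found is None:
        found = {}
    if isinstance(pattern, list):
        # Variable pattern: record the subtree under the given name.
        found[pattern[0]] = data
        return True, found
    if not isinstance(pattern, tuple):
        return pattern == data, found
    if not isinstance(data, tuple):
        return False, found
    open_ended = len(pattern) > 0 and pattern[-1] is Ellipsis
    fixed = pattern[:-1] if open_ended else pattern
    if len(data) < len(fixed):
        return False, found
    if not open_ended and len(data) != len(fixed):
        return False, found
    for sub_pattern, sub_data in zip(fixed, data):
        same, found = match_prefix(sub_pattern, sub_data, found)
        if not same:
            return False, found
    return True, found


# Hypothetical, string-labelled data standing in for a real parse tree.
CLASSDEF_PREFIX = ('stmt', ('compound_stmt', ('classdef', ['keyword'], ...)))
CLASS_STMT = ('stmt',
              ('compound_stmt',
               ('classdef',
                ('NAME', 'class'), ('NAME', 'Spam'), ('COLON', ':'),
                ('suite',))))

ok, found = match_prefix(CLASSDEF_PREFIX, CLASS_STMT)
```

With the sample data above, the pattern matches regardless of how many
children follow the first one in the \constant{classdef} node, which is
exactly the case the fixed-length matcher cannot express.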
					
						
Given the ability to determine whether a statement might be a
docstring and to extract the actual string from the statement, some
work remains: walking the parse tree for an entire module, extracting
information about the names defined in each context of the module, and
associating any docstrings with those names.  The code to perform this
work is not complicated, but bears some explanation.
					
						
The public interface to the classes is straightforward and should
probably be somewhat more flexible.  Each ``major'' block of the
module is described by an object providing several methods for inquiry
and a constructor which accepts at least the subtree of the complete
parse tree which it represents.  The \class{ModuleInfo} constructor
accepts an optional \var{name} parameter since it cannot
otherwise determine the name of the module.
					
						
The public classes include \class{ClassInfo}, \class{FunctionInfo},
and \class{ModuleInfo}.  All objects provide the
methods \method{get_name()}, \method{get_docstring()},
\method{get_class_names()}, and \method{get_class_info()}.  The
\class{ClassInfo} objects support \method{get_method_names()} and
\method{get_method_info()} while the other classes provide
\method{get_function_names()} and \method{get_function_info()}.
					
						
Within each of the forms of code block that the public classes
represent, most of the required information is in the same form and is
accessed in the same way.  The one difference is that functions defined
at the top level of a class are referred to as ``methods.''
Since the difference in nomenclature reflects a real semantic
distinction from functions defined outside of a class, the
implementation needs to maintain the distinction.
Hence, most of the functionality of the public classes can be
implemented in a common base class, \class{SuiteInfoBase}, with the
accessors for function and method information provided elsewhere.
Note that there is only one class which represents function and method
information; this parallels the use of the \keyword{def} statement to
define both types of elements.
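
The division of labor described above can be sketched as follows.  This
is a minimal illustration, not the code from \file{example.py}: the
class names ending in \code{Sketch} are hypothetical, and storing the
information objects in dictionaries keyed by name is an assumption.

```python
class InfoBaseSketch:
    """Accessors shared by every kind of code block (hypothetical name)."""
    _docstring = ''
    _name = ''

    def __init__(self):
        self._class_info = {}
        self._function_info = {}

    def get_name(self):
        return self._name

    def get_docstring(self):
        return self._docstring

    def get_class_names(self):
        return sorted(self._class_info)

    def get_class_info(self, name):
        return self._class_info[name]


class FunctionAccessorsSketch:
    """Mixin for blocks whose nested defs are called functions."""

    def get_function_names(self):
        return sorted(self._function_info)

    def get_function_info(self, name):
        return self._function_info[name]


class MethodAccessorsSketch:
    """Mixin for classes, whose nested defs are called methods."""

    def get_method_names(self):
        return sorted(self._function_info)

    def get_method_info(self, name):
        return self._function_info[name]


class ModuleSketch(InfoBaseSketch, FunctionAccessorsSketch):
    pass


class ClassSketch(InfoBaseSketch, MethodAccessorsSketch):
    pass
```

Both mixins read the same \code{_function_info} dictionary, so a single
kind of information object can describe functions and methods alike;
only the accessor names differ.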
					
						
Most of the accessor functions are declared in \class{SuiteInfoBase}
and do not need to be overridden by subclasses.  More importantly, the
extraction of most information from a parse tree is handled through a
method called by the \class{SuiteInfoBase} constructor.  The example
code for most of the classes is clear when read alongside the formal
grammar, but the method which recursively creates new information
objects requires further examination.  Here is the relevant part of
the \class{SuiteInfoBase} definition from \file{example.py}:
					
						
\begin{verbatim}
class SuiteInfoBase:
    _docstring = ''
    _name = ''

    def __init__(self, tree = None):
        self._class_info = {}
        self._function_info = {}
        if tree:
            self._extract_info(tree)

    def _extract_info(self, tree):
        # extract docstring
        if len(tree) == 2:
            found, vars = match(DOCSTRING_STMT_PATTERN[1], tree[1])
        else:
            found, vars = match(DOCSTRING_STMT_PATTERN, tree[3])
        if found:
            self._docstring = eval(vars['docstring'])
        # discover inner definitions
        for node in tree[1:]:
            found, vars = match(COMPOUND_STMT_PATTERN, node)
            if found:
                cstmt = vars['compound']
                if cstmt[0] == symbol.funcdef:
                    name = cstmt[2][1]
                    self._function_info[name] = FunctionInfo(cstmt)
                elif cstmt[0] == symbol.classdef:
                    name = cstmt[2][1]
                    self._class_info[name] = ClassInfo(cstmt)
\end{verbatim}
					
						
After initializing some internal state, the constructor calls the
\method{_extract_info()} method.  This method performs the bulk of the
information extraction which takes place in the entire example.  The
extraction has two distinct phases: the location of the docstring for
the parse tree passed in, and the discovery of additional definitions
within the code block represented by the parse tree.
					
						
The initial \keyword{if} test determines whether the nested suite is of
the ``short form'' or the ``long form.''  The short form is used when
the code block is on the same line as the definition of the code
block, as in
					
						
\begin{verbatim}
def square(x): "Square an argument."; return x ** 2
\end{verbatim}
					
						
while the long form uses an indented block and allows nested
definitions:
					
						
\begin{verbatim}
def make_power(exp):
    "Make a function that raises an argument to the exponent `exp'."
    def raiser(x, y=exp):
        return x ** y
    return raiser
\end{verbatim}
					
						
When the short form is used, the code block may contain a docstring as
the first, and possibly only, \constant{small_stmt} element.  The
extraction of such a docstring is slightly different and requires only
a portion of the complete pattern used in the more common case.  As
implemented, the docstring will be found only if the
\constant{simple_stmt} node contains exactly one \constant{small_stmt}
node.  Since most functions and methods which use the short form do not
provide a docstring, this may be considered sufficient.  The
extraction of the docstring proceeds using the \function{match()} function
as described above, and the value of the docstring is stored as an
attribute of the \class{SuiteInfoBase} object.
					
						
After docstring extraction, a simple definition discovery
algorithm operates on the \constant{stmt} nodes of the
\constant{suite} node.  The special case of the short form is not
tested; since there are no \constant{stmt} nodes in the short form,
the algorithm will silently skip the single \constant{simple_stmt}
node and correctly not discover any nested definitions.
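
The two suite shapes and the effect of skipping non-\constant{stmt}
children can be illustrated with hand-built tuples.  This is a sketch
under stated assumptions: the string labels and placeholder payloads are
hypothetical stand-ins for the integer node types and real subtrees, and
the helper names are invented for this illustration.

```python
# Short form: the suite holds a single simple_stmt child.
SHORT_SUITE = ('suite',
               ('simple_stmt',
                ('small_stmt', 'docstring expression'),
                ('NEWLINE', '')))

# Long form: NEWLINE and INDENT precede the stmt children.
LONG_SUITE = ('suite',
              ('NEWLINE', ''),
              ('INDENT', ''),
              ('stmt', 'docstring statement'),
              ('stmt', 'nested definition'),
              ('stmt', 'return statement'),
              ('DEDENT', ''))


def docstring_candidate(suite):
    """Return the child that would hold the docstring, if there is one."""
    if len(suite) == 2:
        return suite[1]   # short form: the lone simple_stmt
    return suite[3]       # long form: first stmt after NEWLINE and INDENT


def stmt_children(suite):
    """Return only the stmt children; the short form has none."""
    return [node for node in suite[1:] if node[0] == 'stmt']
```

For the short form, \code{stmt_children()} returns an empty list, which
mirrors how the discovery loop finds nothing to recurse into.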
					
						
Each statement in the code block is categorized as
a class definition, function or method definition, or
something else.  For the definition statements, the name of the
element defined is extracted and a representation object
appropriate to the definition is created with the defining subtree
passed as an argument to the constructor.  The representation objects
are stored in instance variables and may be retrieved by name using
the appropriate accessor methods.
					
						
The public classes provide any accessors required which are more
specific than those provided by the \class{SuiteInfoBase} class, but
the real extraction algorithm remains common to all forms of code
blocks.  A high-level function can be used to extract the complete set
of information from a source file.  (See file \file{example.py}.)
					
						
\begin{verbatim}
def get_docs(fileName):
    import os
    import parser

    # Read the whole file and close it promptly.
    with open(fileName) as f:
        source = f.read()
    basename = os.path.basename(os.path.splitext(fileName)[0])
    ast = parser.suite(source)
    return ModuleInfo(ast.totuple(), basename)
\end{verbatim}
					
						
This provides an easy-to-use interface to the documentation of a
module.  If information is required which is not extracted by the code
of this example, the code may be extended at clearly defined points to
provide additional capabilities.
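
The \module{parser} module was deprecated in Python 3.9 and removed in
Python 3.10, so \function{get_docs()} cannot be run on current
interpreters.  For readers who want to experiment there, the sketch
below uses a different technique, the standard \module{ast} module, to
collect comparable information.  The function name
\function{get_docs_with_ast()} and the dotted-name dictionary it returns
are assumptions of this illustration and have no counterpart in
\file{example.py}.

```python
import ast


def get_docs_with_ast(source, name='<module>'):
    """Map dotted names to docstrings (or None) for a module's definitions.

    Like the example above, only definitions that appear directly in a
    code block are visited; definitions nested inside other statements
    such as if or try are not discovered.
    """
    docs = {}
    definition_types = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)

    def visit(node, qualified_name):
        docs[qualified_name] = ast.get_docstring(node)
        for child in node.body:
            if isinstance(child, definition_types):
                visit(child, qualified_name + '.' + child.name)

    visit(ast.parse(source), name)
    return docs
```

A call such as \code{get_docs_with_ast(open(path).read(), 'mymodule')}
returns one entry for the module itself and one for each class,
function, and method found by the walk.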