1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\section{Built-in Module \sectcode{regex}}
							 | 
						
					
						
							
								
									
										
										
										
											1997-07-17 16:34:52 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\label{module-regex}
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\bimodindex{regex}
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								This module provides regular expression matching operations similar to
							 | 
						
					
						
							
								
									
										
										
										
											1997-12-09 19:45:47 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								those found in Emacs.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\strong{Obsolescence note:}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								This module is obsolete as of Python version 1.5; it is still being
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								maintained because much existing code still uses it.  All new code in
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								need of regular expressions should use the new
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\code{re}\refstmodindex{re} module, which supports the more powerful
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								and regular Perl-style regular expressions.  Existing code should be
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								converted.  The standard library module
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\code{reconvert}\refstmodindex{reconvert} helps in converting
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\code{regex} style regular expressions to \code{re}\refstmodindex{re}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								style regular expressions.  (For more conversion help, see the URL
							 | 
						
					
						
							
								
									
										
										
										
											1997-12-30 04:53:49 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\file{http://starship.skyport.net/crew/amk/regex/regex-to-re.html}.)
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											1996-10-24 22:49:13 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								By default the patterns are Emacs-style regular expressions
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								(with one exception).  There is
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								a way to change the syntax to match that of several well-known
							 | 
						
					
						
							
								
									
										
										
										
											1995-08-11 00:31:57 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\UNIX{} utilities.  The exception is that Emacs' \samp{\e s}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								pattern is not supported, since the original implementation references
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								the Emacs syntax tables.
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								This module is 8-bit clean: both patterns and strings may contain null
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								bytes and characters whose high bit is set.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-03 00:00:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\strong{Please note:} There is a little-known fact about Python string
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								literals which means that you don't usually have to worry about
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								doubling backslashes, even though they are used to escape special
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								characters in string literals as well as in regular expressions.  This
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								is because Python doesn't remove backslashes from string literals if
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								they are followed by an unrecognized escape character.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\emph{However}, if you want to include a literal \dfn{backslash} in a
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								regular expression represented as a string literal, you have to
							 | 
						
					
						
							
								
									
										
										
										
											1997-03-14 04:10:13 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\emph{quadruple} it or enclose it in a singleton character class.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								E.g.\  to extract \LaTeX\ \samp{\e section\{{\rm
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-03 00:00:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\ldots}\}} headers from a document, you can use this pattern:
							 | 
						
					
						
							
								
									
										
										
										
											1997-12-30 20:38:16 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\code{'[\e ]section\{\e (.*\e )\}'}.  \emph{Another exception:}
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								the escape sequece \samp{\e b} is significant in string literals
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								(where it means the ASCII bell character) as well as in Emacs regular
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								expressions (where it stands for a word boundary), so in order to
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								search for a word boundary, you should use the pattern \code{'\e \e b'}.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								Similarly, a backslash followed by a digit 0-7 should be doubled to
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								avoid interpretation as an octal escape.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\subsection{Regular Expressions}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								A regular expression (or RE) specifies a set of strings that matches
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								it; the functions in this module let you check if a particular string
							 | 
						
					
						
							
								
									
										
										
										
											1996-10-24 22:49:13 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								matches a given regular expression (or if a given regular expression
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								matches a particular string, which comes down to the same thing).
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								Regular expressions can be concatenated to form new regular
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								expressions; if \emph{A} and \emph{B} are both regular expressions,
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								then \emph{AB} is also an regular expression.  If a string \emph{p}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								matches A and another string \emph{q} matches B, the string \emph{pq}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								will match AB.  Thus, complex expressions can easily be constructed
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								from simpler ones like the primitives described here.  For details of
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								the theory and implementation of regular expressions, consult almost
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								any textbook about compiler construction.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								% XXX The reference could be made more specific, say to 
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								% "Compilers: Principles, Techniques and Tools", by Alfred V. Aho, 
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								% Ravi Sethi, and Jeffrey D. Ullman, or some FA text.   
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											1996-10-24 22:49:13 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								A brief explanation of the format of regular expressions follows.
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								Regular expressions can contain both special and ordinary characters.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								Ordinary characters, like '\code{A}', '\code{a}', or '\code{0}', are
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								the simplest regular expressions; they simply match themselves.  You
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								can concatenate ordinary characters, so '\code{last}' matches the
							 | 
						
					
						
							
								
									
										
										
										
											1996-10-24 22:49:13 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								characters 'last'.  (In the rest of this section, we'll write RE's in
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\code{this special font}, usually without quotes, and strings to be
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								matched 'in single quotes'.)
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								Special characters either stand for classes of ordinary characters, or
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								affect how the regular expressions around them are interpreted.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								The special characters are:
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\begin{itemize}
							 | 
						
					
						
							
								
									
										
										
										
											1996-12-13 22:04:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\item[\code{.}] (Dot.)  Matches any character except a newline.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\item[\code{\^}] (Caret.)  Matches the start of the string.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\item[\code{\$}] Matches the end of the string.  
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\code{foo} matches both 'foo' and 'foobar', while the regular
							 | 
						
					
						
							
								
									
										
										
										
											1996-12-13 22:04:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								expression '\code{foo\$}' matches only 'foo'.
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\item[\code{*}] Causes the resulting RE to
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								match 0 or more repetitions of the preceding RE.  \code{ab*} will
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								match 'a', 'ab', or 'a' followed by any number of 'b's.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\item[\code{+}] Causes the
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								resulting RE to match 1 or more repetitions of the preceding RE.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\code{ab+} will match 'a' followed by any non-zero number of 'b's; it
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								will not match just 'a'.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\item[\code{?}] Causes the resulting RE to
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								match 0 or 1 repetitions of the preceding RE.  \code{ab?} will
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								match either 'a' or 'ab'.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\item[\code{\e}] Either escapes special characters (permitting you to match
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								characters like '*?+\&\$'), or signals a special sequence; special
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								sequences are discussed below.  Remember that Python also uses the
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								backslash as an escape sequence in string literals; if the escape
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								sequence isn't recognized by Python's parser, the backslash and
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								subsequent character are included in the resulting string.  However,
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								if Python would recognize the resulting sequence, the backslash should
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								be repeated twice.  
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\item[\code{[]}] Used to indicate a set of characters.  Characters can
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								be listed individually, or a range is indicated by giving two
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								characters and separating them by a '-'.  Special characters are
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								not active inside sets.  For example, \code{[akm\$]}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								will match any of the characters 'a', 'k', 'm', or '\$'; \code{[a-z]} will
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								match any lowercase letter.  
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								If you want to include a \code{]} inside a
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								set, it must be the first character of the set; to include a \code{-},
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								place it as the first or last character. 
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								Characters \emph{not} within a range can be matched by including a
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\code{\^} as the first character of the set; \code{\^} elsewhere will
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								simply match the '\code{\^}' character.  
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{itemize}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								The special sequences consist of '\code{\e}' and a character
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								from the list below.  If the ordinary character is not on the list,
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								then the resulting RE will match the second character.  For example,
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\code{\e\$} matches the character '\$'.  Ones where the backslash
							 | 
						
					
						
							
								
									
										
										
										
											1997-12-30 20:38:16 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								should be doubled in string literals are indicated.
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\begin{itemize}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\item[\code{\e|}]\code{A\e|B}, where A and B can be arbitrary REs,
							 | 
						
					
						
							
								
									
										
										
										
											1996-10-24 22:49:13 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								creates a regular expression that will match either A or B.  This can
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								be used inside groups (see below) as well.
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								%
							 | 
						
					
						
							
								
									
										
										
										
											1996-12-13 22:04:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\item[\code{\e( \e)}] Indicates the start and end of a group; the
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								contents of a group can be matched later in the string with the
							 | 
						
					
						
							
								
									
										
										
										
											1996-12-13 22:04:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\code{\e [1-9]} special sequence, described next.
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								%
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								{\fulllineitems\item[\code{\e \e 1, ... \e \e 7, \e 8, \e 9}]
							 | 
						
					
						
							
								
									
										
										
										
											1996-12-13 22:04:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								Matches the contents of the group of the same
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								number.  For example, \code{\e (.+\e ) \e \e 1} matches 'the the' or
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								'55 55', but not 'the end' (note the space after the group).  This
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								special sequence can only be used to match one of the first 9 groups;
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								groups with higher numbers can be matched using the \code{\e v}
							 | 
						
					
						
							
								
									
										
										
										
											1996-10-24 22:49:13 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								sequence.  (\code{\e 8} and \code{\e 9} don't need a double backslash
							 | 
						
					
						
							
								
									
										
										
										
											1996-12-13 22:04:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								because they are not octal digits.)}
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								%
							 | 
						
					
						
							
								
									
										
										
										
											1996-12-13 22:04:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\item[\code{\e \e b}] Matches the empty string, but only at the
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								beginning or end of a word.  A word is defined as a sequence of
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								alphanumeric characters, so the end of a word is indicated by
							 | 
						
					
						
							
								
									
										
										
										
											1996-12-13 22:04:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								whitespace or a non-alphanumeric character.
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								%
							 | 
						
					
						
							
								
									
										
										
										
											1996-12-13 22:04:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\item[\code{\e B}] Matches the empty string, but when it is \emph{not} at the
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								beginning or end of a word.
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								%
							 | 
						
					
						
							
								
									
										
										
										
											1996-12-13 22:04:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\item[\code{\e v}] Must be followed by a two digit decimal number, and
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								matches the contents of the group of the same number.  The group
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								number must be between 1 and 99, inclusive.
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								%
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\item[\code{\e w}]Matches any alphanumeric character; this is
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								equivalent to the set \code{[a-zA-Z0-9]}.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								%
							 | 
						
					
						
							
								
									
										
										
										
											1996-12-13 22:04:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\item[\code{\e W}] Matches any non-alphanumeric character; this is
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								equivalent to the set \code{[\^a-zA-Z0-9]}.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\item[\code{\e <}] Matches the empty string, but only at the beginning of a
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								word.  A word is defined as a sequence of alphanumeric characters, so
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								the end of a word is indicated by whitespace or a non-alphanumeric 
							 | 
						
					
						
							
								
									
										
										
										
											1996-12-13 22:04:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								character.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\item[\code{\e >}] Matches the empty string, but only at the end of a
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								word.
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											1996-12-13 22:04:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\item[\code{\e \e \e \e}] Matches a literal backslash.
							 | 
						
					
						
							
								
									
										
										
										
											1996-10-24 22:49:13 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								% In Emacs, the following two are start of buffer/end of buffer.  In
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								% Python they seem to be synonyms for ^$.
							 | 
						
					
						
							
								
									
										
										
										
											1996-12-13 22:04:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\item[\code{\e `}] Like \code{\^}, this only matches at the start of the
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								string.
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\item[\code{\e \e '}] Like \code{\$}, this only matches at the end of
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								the string.
							 | 
						
					
						
							
								
									
										
										
										
											1996-06-26 19:43:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								% end of buffer
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{itemize}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\subsection{Module Contents}
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								The module defines these functions, and an exception:
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\renewcommand{\indexsubitem}{(in module regex)}
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-03 00:00:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\begin{funcdesc}{match}{pattern\, string}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  Return how many characters at the beginning of \var{string} match
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  the regular expression \var{pattern}.  Return \code{-1} if the
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  string does not match the pattern (this is different from a
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  zero-length match!).
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{funcdesc}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\begin{funcdesc}{search}{pattern\, string}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  Return the first position in \var{string} that matches the regular
							 | 
						
					
						
							
								
									
										
										
										
											1996-10-24 22:49:13 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								  expression \var{pattern}.  Return \code{-1} if no position in the string
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  matches the pattern (this is different from a zero-length match
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  anywhere!).
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{funcdesc}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											1994-08-08 12:30:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\begin{funcdesc}{compile}{pattern\optional{\, translate}}
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  Compile a regular expression pattern into a regular expression
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								  object, which can be used for matching using its \code{match()} and
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  \code{search()} methods, described below.  The optional argument
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  \var{translate}, if present, must be a 256-character string
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  indicating how characters (both of the pattern and of the strings to
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								  be matched) are translated before comparing them; the \var{i}-th
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  element of the string gives the translation for the character with
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								  \ASCII{} code \var{i}.  This can be used to implement
							 | 
						
					
						
							
								
									
										
										
										
											1995-03-17 16:07:09 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								  case-insensitive matching; see the \code{casefold} data item below.
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  The sequence
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\bcode\begin{verbatim}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								prog = regex.compile(pat)
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								result = prog.match(str)
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{verbatim}\ecode
							 | 
						
					
						
							
								
									
										
										
										
											1997-07-17 16:34:52 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								%
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								is equivalent to
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\bcode\begin{verbatim}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								result = regex.match(pat, str)
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{verbatim}\ecode
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								but the version using \code{compile()} is more efficient when multiple
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								regular expressions are used concurrently in a single program.  (The
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								compiled version of the last pattern passed to \code{regex.match()} or
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\code{regex.search()} is cached, so programs that use only a single
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								regular expression at a time needn't worry about compiling regular
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								expressions.)
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{funcdesc}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\begin{funcdesc}{set_syntax}{flags}
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								  Set the syntax to be used by future calls to \code{compile()},
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  \code{match()} and \code{search()}.  (Already compiled expression
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  objects are not affected.)  The argument is an integer which is the
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  OR of several flag bits.  The return value is the previous value of
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  the syntax flags.  Names for the flags are defined in the standard 
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  module \code{regex_syntax}\refstmodindex{regex_syntax}; read the
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  file \file{regex_syntax.py} for more information.
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{funcdesc}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											1997-02-18 18:54:30 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\begin{funcdesc}{get_syntax}{}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  Returns the current value of the syntax flags as an integer.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{funcdesc}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											1994-08-08 12:30:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\begin{funcdesc}{symcomp}{pattern\optional{\, translate}}
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								This is like \code{compile()}, but supports symbolic group names: if a
							 | 
						
					
						
							
								
									
										
										
										
											1995-03-07 10:14:09 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								parenthesis-enclosed group begins with a group name in angular
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-03 00:00:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								brackets, e.g. \code{'\e(<id>[a-z][a-z0-9]*\e)'}, the group can
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								be referenced by its name in arguments to the \code{group()} method of
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-03 00:00:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								the resulting compiled regular expression object, like this:
							 | 
						
					
						
							
								
									
										
										
										
											1995-02-27 17:52:35 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\code{p.group('id')}.  Group names may contain alphanumeric characters
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								and \code{'_'} only.
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-03 00:00:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\end{funcdesc}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\begin{excdesc}{error}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  Exception raised when a string passed to one of the functions here
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  is not a valid regular expression (e.g., unmatched parentheses) or
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  when some other error occurs during compilation or matching.  (It is
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  never an error if a string contains no match for a pattern.)
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{excdesc}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\begin{datadesc}{casefold}
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								A string suitable to pass as the \var{translate} argument to
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\code{compile()} to map all upper case characters to their lowercase
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								equivalents.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{datadesc}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\noindent
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								Compiled regular expression objects support these methods:
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\renewcommand{\indexsubitem}{(regex method)}
							 | 
						
					
						
							
								
									
										
										
										
											1994-08-08 12:30:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\begin{funcdesc}{match}{string\optional{\, pos}}
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  Return how many characters at the beginning of \var{string} match
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  the compiled regular expression.  Return \code{-1} if the string
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  does not match the pattern (this is different from a zero-length
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  match!).
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								  The optional second parameter, \var{pos}, gives an index in the string
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  where the search is to start; it defaults to \code{0}.  This is not
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  completely equivalent to slicing the string; the \code{'\^'} pattern
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  character matches at the real begin of the string and at positions
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  just after a newline, not necessarily at the index where the search
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  is to start.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{funcdesc}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											1994-08-08 12:30:22 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\begin{funcdesc}{search}{string\optional{\, pos}}
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  Return the first position in \var{string} that matches the regular
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  expression \code{pattern}.  Return \code{-1} if no position in the
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  string matches the pattern (this is different from a zero-length
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  match anywhere!).
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								  The optional second parameter has the same meaning as for the
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								  \code{match()} method.
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{funcdesc}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\begin{funcdesc}{group}{index\, index\, ...}
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								This method is only valid when the last call to the \code{match()}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								or \code{search()} method found a match.  It returns one or more
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								groups of the match.  If there is a single \var{index} argument,
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								the result is a single string; if there are multiple arguments, the
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								result is a tuple with one item per argument.  If the \var{index} is
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								zero, the corresponding return value is the entire matching string; if
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-03 00:00:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								it is in the inclusive range [1..99], it is the string matching the
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								the corresponding parenthesized group (using the default syntax,
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-02 02:50:13 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								groups are parenthesized using \code{{\e}(} and \code{{\e})}).  If no
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								such group exists, the corresponding result is \code{None}.
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-03 00:00:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								If the regular expression was compiled by \code{symcomp()} instead of
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\code{compile()}, the \var{index} arguments may also be strings
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-03 00:00:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								identifying groups by their group name.
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{funcdesc}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\noindent
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								Compiled regular expressions support these data attributes:
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\renewcommand{\indexsubitem}{(regex attribute)}
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-03 00:00:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\begin{datadesc}{regs}
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								When the last call to the \code{match()} or \code{search()} method found a
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								match, this is a tuple of pairs of indexes corresponding to the
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								beginning and end of all parenthesized groups in the pattern.  Indices
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								are relative to the string argument passed to \code{match()} or
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\code{search()}.  The 0-th tuple gives the beginning and end or the
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								whole pattern.  When the last match or search failed, this is
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\code{None}.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{datadesc}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\begin{datadesc}{last}
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								When the last call to the \code{match()} or \code{search()} method found a
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								match, this is the string argument passed to that method.  When the
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								last match or search failed, this is \code{None}.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{datadesc}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\begin{datadesc}{translate}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								This is the value of the \var{translate} argument to
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\code{regex.compile()} that created this regular expression object.  If
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								the \var{translate} argument was omitted in the \code{regex.compile()}
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-02 01:22:07 +00:00
										 
									 
								 
							 | 
							
								
							 | 
							
								
							 | 
							
							
								call, this is \code{None}.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{datadesc}
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-03 00:00:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\begin{datadesc}{givenpat}
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								The regular expression pattern as passed to \code{compile()} or
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\code{symcomp()}.
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-03 00:00:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\end{datadesc}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\begin{datadesc}{realpat}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								The regular expression after stripping the group names for regular
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								expressions compiled with \code{symcomp()}.  Same as \code{givenpat}
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-03 00:00:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								otherwise.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{datadesc}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\begin{datadesc}{groupindex}
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								A dictionary giving the mapping from symbolic group names to numerical
							 | 
						
					
						
							
								
									
										
										
										
											1998-01-12 18:28:20 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								group indexes for regular expressions compiled with \code{symcomp()}.
							 | 
						
					
						
							
								
									
										
										
										
											1994-01-03 00:00:31 +00:00
										 
									 
								 
							 | 
							
								
									
										
									
								
							 | 
							
								
							 | 
							
							
								\code{None} otherwise.
							 | 
						
					
						
							| 
								
							 | 
							
								
							 | 
							
								
							 | 
							
							
								\end{datadesc}
							 |