:mod:`robotparser` ---  Parser for robots.txt
=============================================

.. module:: robotparser
   :synopsis: Loads a robots.txt file and answers questions about fetchability of other URLs.
.. sectionauthor:: Skip Montanaro <skip@pobox.com>


.. index::
   single: WWW
   single: World Wide Web
   single: URL
   single: robots.txt

This module provides a single class, :class:`RobotFileParser`, which answers
questions about whether or not a particular user agent can fetch a URL on the
Web site that published the :file:`robots.txt` file.  For more details on the
structure of :file:`robots.txt` files, see
http://www.robotstxt.org/wc/norobots.html.


.. class:: RobotFileParser()

   This class provides a set of methods to read, parse and answer questions
   about a single :file:`robots.txt` file.


   .. method:: RobotFileParser.set_url(url)

      Sets the URL referring to a :file:`robots.txt` file.


   .. method:: RobotFileParser.read()

      Reads the :file:`robots.txt` URL and feeds it to the parser.


   .. method:: RobotFileParser.parse(lines)

      Parses *lines*, a sequence of lines taken from a :file:`robots.txt` file.


   .. method:: RobotFileParser.can_fetch(useragent, url)

      Returns ``True`` if the *useragent* is allowed to fetch the *url* according to
      the rules contained in the parsed :file:`robots.txt` file.


   .. method:: RobotFileParser.mtime()

      Returns the time the :file:`robots.txt` file was last fetched.  This is
      useful for long-running web spiders that need to check for new
      :file:`robots.txt` files periodically.


   .. method:: RobotFileParser.modified()

      Sets the time the :file:`robots.txt` file was last fetched to the current
      time.


The following example demonstrates basic use of the :class:`RobotFileParser`
class. ::

   >>> import robotparser
   >>> rp = robotparser.RobotFileParser()
   >>> rp.set_url("http://www.musi-cal.com/robots.txt")
   >>> rp.read()
   >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco")
   False
   >>> rp.can_fetch("*", "http://www.musi-cal.com/")
   True
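
Rules can also be handed to the parser directly with
:meth:`RobotFileParser.parse`, which avoids a network request and can be
convenient for testing.  The following is a minimal sketch rather than a
definitive recipe: the rules, the ``ExampleBot`` agent name and the
``example.com`` URLs are invented for illustration, and the fallback import is
an assumption that lets the same code run where the module is only available as
:mod:`urllib.robotparser`. ::

   try:
       import robotparser                 # module name documented here
   except ImportError:
       from urllib import robotparser     # assumption: Python 3 location

   # Hypothetical rules, given as a list of lines instead of a fetched file.
   rules = [
       "User-agent: *",
       "Disallow: /private/",
       "",
       "User-agent: ExampleBot",
       "Disallow: /",
   ]

   rp = robotparser.RobotFileParser()
   rp.parse(rules)

   # Generic agents are only kept out of /private/.
   allowed_public = rp.can_fetch("*", "http://example.com/index.html")          # True
   allowed_private = rp.can_fetch("*", "http://example.com/private/data.html")  # False

   # ExampleBot has its own entry, which disallows everything.
   allowed_bot = rp.can_fetch("ExampleBot", "http://example.com/index.html")    # False

An agent that matches a specific ``User-agent`` entry is judged by that entry;
the ``*`` entry is only consulted when no specific entry applies.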
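
For the long-running spiders mentioned under :meth:`RobotFileParser.mtime`, a
refresh policy can be built from :meth:`RobotFileParser.mtime` and
:meth:`RobotFileParser.modified`.  The sketch below is one possible approach
and not part of the module: ``MAX_AGE``, ``needs_refresh`` and the one-hour
threshold are hypothetical, the fallback import is an assumption as above, and
:meth:`RobotFileParser.parse` stands in for :meth:`RobotFileParser.read` so
that no network access is needed. ::

   import time

   try:
       import robotparser                 # module name documented here
   except ImportError:
       from urllib import robotparser     # assumption: Python 3 location

   MAX_AGE = 3600  # hypothetical threshold, in seconds

   def needs_refresh(parser, max_age=MAX_AGE):
       """Return True if no rules were loaded yet or they are older than max_age."""
       last = parser.mtime()
       return not last or time.time() - last > max_age

   rp = robotparser.RobotFileParser()
   stale_before = needs_refresh(rp)    # nothing has been loaded yet

   rp.parse(["User-agent: *", "Disallow:"])
   rp.modified()                       # record the load time explicitly
                                       # (harmless if the parser already did so)
   stale_after = needs_refresh(rp)     # rules were loaded a moment ago

A spider following this pattern would call ``needs_refresh`` before each batch
of requests and repeat the :meth:`RobotFileParser.read` and
:meth:`RobotFileParser.modified` calls whenever it returns ``True``.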