Michael Lazar 
								
							 
						 
						
							
							
							
							
								
							
							
								bd08a0af2d 
								
							 
						 
						
							
							
								
								bpo-32861: urllib.robotparser fix incomplete __str__ methods. (GH-5711)  
							
							
							
							
							The urllib.robotparser's __str__ representation now includes wildcard
entries and the "Crawl-delay" and "Request-rate" fields. Also removes extra
newlines that were being appended to the end of the string. 
							
						 
						
							2018-05-14 17:10:41 +03:00 
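A minimal sketch of what this fix makes visible, assuming a Python version that includes it. The robots.txt content is hypothetical and is parsed from memory instead of being fetched.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 3
Request-rate: 9/30
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# With the fix, the wildcard entry and both extension fields show up in the
# string form, and no extra newline is appended at the end.
text = str(parser)
print(text)
```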
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Berker Peksag 
								
							 
						 
						
							
							
							
							
								
							
							
								3df02dbc8e 
								
							 
						 
						
							
							
								
								bpo-31325: Fix usage of namedtuple in RobotFileParser.parse() (#4529)
							
							
							
						 
						
							2017-11-23 15:40:26 -08:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Berker Peksag 
								
							 
						 
						
							
							
							
							
								
							
							
								9a7bbb2e3f 
								
							 
						 
						
							
							
								
								Issue #25400: RobotFileParser now correctly returns default values for crawl_delay and request_rate
							
							
							
							
							Initial patch by Peter Wirtz. 
							
						 
						
							2016-09-18 20:17:58 +03:00 
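The fallback described in this entry can be sketched as follows, assuming a Python version that includes the fix. The agent name "ExampleBot" and the robots.txt content are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with only a wildcard (default) entry.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 5
Request-rate: 2/10
Disallow: /tmp/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# "ExampleBot" has no entry of its own, so the values of the default
# entry are returned for it.
delay = parser.crawl_delay("ExampleBot")
rate = parser.request_rate("ExampleBot")
print(delay, rate.requests, rate.seconds)
```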
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Martin Panter 
								
							 
						 
						
							
							
							
							
								
							
							
								1ce738e08f 
								
							 
						 
						
							
							
								
								Merge typo fixes from 3.5  
							
							
							
						 
						
							2016-05-08 14:02:35 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Martin Panter 
								
							 
						 
						
							
							
							
							
								
							
							
								f0564164ba 
								
							 
						 
						
							
							
								
								Fix typos in comments, documentation and test method names  
							
							
							
						 
						
							2016-05-08 13:48:10 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Berker Peksag 
								
							 
						 
						
							
							
							
							
								
							
							
								960e848f0d 
								
							 
						 
						
							
							
								
								Issue #16099: RobotFileParser now supports Crawl-delay and Request-rate extensions.
							
							Patch by Nikolay Bogoychev. 
							
						 
						
							2015-10-08 12:27:06 +03:00 
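A short sketch of the two accessors this entry refers to, crawl_delay() and request_rate(). The agent names and rules are hypothetical, and the content is parsed from memory so no network access is needed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with an entry for one named agent only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Crawl-delay: 2
Request-rate: 1/5
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Crawl-delay is returned as an int; Request-rate is returned as a named
# tuple with "requests" and "seconds" fields.
delay = parser.crawl_delay("ExampleBot")
rate = parser.request_rate("ExampleBot")
print(delay, rate.requests, rate.seconds)

# An agent with no matching entry (and no default entry) gets None.
other = parser.crawl_delay("OtherBot")
print(other)
```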
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Raymond Hettinger 
								
							 
						 
						
							
							
							
							
								
							
							
								38acd4c028 
								
							 
						 
						
							
							
								
								Issue 21469: Minor code modernization (convert and/or expression to an if/else expression).
							
							
							
							
							Suggested by: Tal Einat 
							
						 
						
							2014-05-12 22:22:46 -07:00 
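As a generic illustration of the kind of modernization named in this entry (not the actual line changed by the commit), the sketch below contrasts the old and/or idiom with a conditional expression. The function names are hypothetical.

```python
def pick_and_or(condition, when_true, when_false):
    # Old idiom: falls through to when_false whenever when_true is falsy,
    # even if the condition holds.
    return condition and when_true or when_false


def pick_if_else(condition, when_true, when_false):
    # Conditional expression: the result depends only on the condition.
    return when_true if condition else when_false


print(pick_and_or(True, 0, 99))   # 99, because 0 is falsy
print(pick_if_else(True, 0, 99))  # 0
```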
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Raymond Hettinger 
								
							 
						 
						
							
							
							
							
								
							
							
								122541bece 
								
							 
						 
						
							
							
								
								Issue 21469: Mitigate risk of false positives with robotparser.
							
							
							
							
							* Repair the broken link to norobots-rfc.txt.
* HTTP response codes >= 500 are treated as a failed read rather than as a
  not found.  Not found means that we can assume the entire site is allowed.
  A 5xx server error tells us nothing.
* A successful read() or parse() updates the mtime (which is defined to be
  "the time the robots.txt file was last fetched").
* The can_fetch() method returns False unless we've had a read() with a 2xx
  or 4xx response.  This avoids false positives in the case where a user
  calls can_fetch() before calling read().
* I don't see any easy way to test this patch without hitting internet
  resources that might change or without use of mock objects that wouldn't
  provide much reassurance. 
							
						 
						
							2014-05-12 21:56:33 -07:00 
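The can_fetch() and mtime behavior listed in this entry can be sketched without network access by calling parse() directly in place of read(). The URLs and agent name are hypothetical.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
url = "http://example.com/index.html"

# Nothing has been read or parsed yet, so the parser refuses by default
# and its mtime is still unset.
before = parser.can_fetch("ExampleBot", url)
mtime_before = parser.mtime()

# A successful parse() updates the mtime and enables normal rule matching.
parser.parse(["User-agent: *", "Disallow: /private/"])
after = parser.can_fetch("ExampleBot", url)
blocked = parser.can_fetch("ExampleBot", "http://example.com/private/data")

print(before, after, blocked)
```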
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Senthil Kumaran 
								
							 
						 
						
							
							
							
							
								
							
							
								c70a6ae49b 
								
							 
						 
						
							
							
								
								#17403: urllib.robotparser normalizes the URLs before adding to ruleline.
							
							
							
							
							This helps in handling certain types of invalid URLs in a conservative manner. 
							
						 
						
							2013-05-29 05:54:31 -07:00 
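One case this normalization appears to cover is a rule path that ends in a bare "?". The sketch below uses hypothetical rules and URLs and only shows the observable effect through can_fetch().

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules whose paths end in an empty query string.
ROBOTS_TXT = """\
User-agent: *
Allow: /some/path?
Disallow: /another/path?
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Both the rule path and the requested URL are normalized before matching,
# so the trailing "?" does not prevent the rules from applying.
allowed = parser.can_fetch("ExampleBot", "http://example.com/some/path?")
denied = parser.can_fetch("ExampleBot", "http://example.com/another/path?")
print(allowed, denied)
```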
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Georg Brandl 
								
							 
						 
						
							
							
							
							
								
							
							
								0a0fc07d37 
								
							 
						 
						
							
							
								
								#4108: the first default entry (User-agent: *) wins.
							
							
							
						 
						
							2010-07-29 17:55:01 +00:00 
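A small sketch of the "first default entry wins" rule, using a hypothetical robots.txt that contains two wildcard blocks.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt with two "User-agent: *" blocks.
ROBOTS_TXT = """\
User-agent: *
Disallow: /first/

User-agent: *
Disallow: /second/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# Only the first wildcard block is kept as the default entry, so the
# rules of the second block are not applied.
first = parser.can_fetch("ExampleBot", "http://example.com/first/page")
second = parser.can_fetch("ExampleBot", "http://example.com/second/page")
print(first, second)
```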
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Senthil Kumaran 
								
							 
						 
						
							
							
							
							
								
							
							
								3f8ab965f7 
								
							 
						 
						
							
							
								
								Fix Issue6325 - robotparser to honor URLs with query strings.
							
							
							
						 
						
							2010-07-28 16:27:56 +00:00 
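To illustrate what honoring query strings means in practice, the sketch below uses a hypothetical rule that includes a query string and checks two URLs against it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule that disallows one path only with a specific query string.
ROBOTS_TXT = """\
User-agent: *
Disallow: /some/path?name=value
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# The query string takes part in the match: the bare path stays allowed,
# while the same path with the listed query is refused.
plain = parser.can_fetch("ExampleBot", "http://example.com/some/path")
with_query = parser.can_fetch(
    "ExampleBot", "http://example.com/some/path?name=value"
)
print(plain, with_query)
```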
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Benjamin Peterson 
								
							 
						 
						
							
							
							
							
								
							
							
								d63137159b 
								
							 
						 
						
							
							
								
								Merged revisions 65209-65216,65225-65226,65233,65239,65246-65247,65255-65256 via svnmerge from
							svn+ssh://pythondev@svn.python.org/python/trunk
........
  r65209 | raymond.hettinger | 2008-07-23 19:08:18 -0500 (Wed, 23 Jul 2008) | 1 line
  Finish-up the partial conversion from int to Py_ssize_t for deque indices and length.
........
  r65210 | raymond.hettinger | 2008-07-23 19:53:49 -0500 (Wed, 23 Jul 2008) | 1 line
  Parse to the correct datatype.
........
  r65211 | benjamin.peterson | 2008-07-23 21:27:46 -0500 (Wed, 23 Jul 2008) | 1 line
  fix spacing
........
  r65212 | benjamin.peterson | 2008-07-23 21:31:28 -0500 (Wed, 23 Jul 2008) | 1 line
  fix markup
........
  r65213 | benjamin.peterson | 2008-07-23 21:45:37 -0500 (Wed, 23 Jul 2008) | 1 line
  add some documentation for 2to3
........
  r65214 | raymond.hettinger | 2008-07-24 00:38:48 -0500 (Thu, 24 Jul 2008) | 1 line
  Finish conversion from int to Py_ssize_t.
........
  r65215 | raymond.hettinger | 2008-07-24 02:04:55 -0500 (Thu, 24 Jul 2008) | 1 line
  Convert from long to Py_ssize_t.
........
  r65216 | georg.brandl | 2008-07-24 02:09:21 -0500 (Thu, 24 Jul 2008) | 2 lines
  Fix indentation.
........
  r65225 | benjamin.peterson | 2008-07-25 11:55:37 -0500 (Fri, 25 Jul 2008) | 1 line
  teach .bzrignore about doc tools
........
  r65226 | benjamin.peterson | 2008-07-25 12:02:11 -0500 (Fri, 25 Jul 2008) | 1 line
  document default value for fillvalue
........
  r65233 | raymond.hettinger | 2008-07-25 13:43:33 -0500 (Fri, 25 Jul 2008) | 1 line
  Issue 1592:  Better error reporting for operations on closed shelves.
........
  r65239 | benjamin.peterson | 2008-07-25 16:59:53 -0500 (Fri, 25 Jul 2008) | 1 line
  fix indentation
........
  r65246 | andrew.kuchling | 2008-07-26 08:08:19 -0500 (Sat, 26 Jul 2008) | 1 line
  This sentence continues to bug me; rewrite it for the second time
........
  r65247 | andrew.kuchling | 2008-07-26 08:09:06 -0500 (Sat, 26 Jul 2008) | 1 line
  Remove extra words
........
  r65255 | skip.montanaro | 2008-07-26 19:49:02 -0500 (Sat, 26 Jul 2008) | 3 lines
  Close issue 3437 - missing state change when Allow lines are processed.
  Adds test cases which use Allow: as well.
........
  r65256 | skip.montanaro | 2008-07-26 19:50:41 -0500 (Sat, 26 Jul 2008) | 2 lines
  note robotparser bug fix.
........ 
							
						 
						
							2008-07-31 16:23:04 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeremy Hylton 
								
							 
						 
						
							
							
							
							
								
							
							
								73fd46d24e 
								
							 
						 
						
							
							
								
								Bug 3347: robotparser failed because it didn't convert bytes to string.  
							
							
							
							
							The solution is to convert bytes to text via UTF-8.  I'm not entirely
sure if this is safe, but it looks like robots.txt is expected to be
ASCII. 
							
						 
						
							2008-07-18 20:59:44 +00:00 
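The conversion described in this entry can be sketched by hand: decode the raw bytes as UTF-8, split them into lines, and pass the lines to parse(). The byte string below is a hypothetical stand-in for a fetched response body.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical response body, as bytes, the way a network read returns it.
raw = b"User-agent: *\nDisallow: /tmp/\n"

parser = RobotFileParser()

# parse() expects text lines, so the bytes are decoded first.
lines = raw.decode("utf-8").splitlines()
parser.parse(lines)

allowed = parser.can_fetch("ExampleBot", "http://example.com/index.html")
denied = parser.can_fetch("ExampleBot", "http://example.com/tmp/cache")
print(lines, allowed, denied)
```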
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Jeremy Hylton 
								
							 
						 
						
							
							
							
							
								
							
							
								1afc169616 
								
							 
						 
						
							
							
								
								Make a new urllib package.
							
							
							
							
							It consists of code from urllib, urllib2, urlparse, and robotparser.
The old modules have all been removed.  The new package has five
submodules: urllib.parse, urllib.request, urllib.response,
urllib.error, and urllib.robotparser.  The urllib.request.urlopen()
function uses the url opener from urllib2.
Note that the unittests have not been renamed for the
beta, but they will be renamed in the future.
Joint work with Senthil Kumaran. 
							
						 
						
							2008-06-18 20:49:58 +00:00
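For orientation, the five submodules named in this entry can be imported as shown below. The URL is hypothetical and nothing is fetched.

```python
import urllib.error
import urllib.parse
import urllib.request
import urllib.response
import urllib.robotparser

# The five submodules of the urllib package listed in the commit message.
submodules = [
    urllib.parse,
    urllib.request,
    urllib.response,
    urllib.error,
    urllib.robotparser,
]
names = [module.__name__ for module in submodules]
print(names)

# URL splitting lives in urllib.parse; urlopen() lives in urllib.request.
parts = urllib.parse.urlparse("http://example.com/robots.txt")
print(parts.netloc, parts.path)
```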