Serhiy Storchaka 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								7636a66635 
								
							 
						 
						
							
							
								
								gh-135661: Fix parsing unterminated bogus comments in HTMLParser (GH-137873)  
							
							
							
							
							Bogus comments that start with "<![CDATA[" should not include the starting "!"
in their value. 
							
						 
						
							2025-08-17 13:37:50 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Serhiy Storchaka 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0cbbfc4621 
								
							 
						 
						
							
							
								
								gh-135661: Fix CDATA section parsing in HTMLParser (GH-135665)  
							
							
							
							
							"] ]>" and "]] >" no longer end the CDATA section.
Make CDATA section parsing context-dependent.
Add private method HTMLParser._set_support_cdata() to change the context.
If called with True, "<![CDATA[" starts a CDATA section that ends with "]]>".
If called with False, "<![CDATA[" starts a bogus comment that ends with ">". 
							
						 
						
							2025-08-14 18:13:22 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Timon Viola 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								4d02f31cdd 
								
							 
						 
						
							
							
								
								gh-118350: Fix support of elements "textarea" and "title" in HTMLParser ( #135310 )  
							
							
							
							
							Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl> 
							
						 
						
							2025-07-22 13:27:13 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Serhiy Storchaka 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								dee6501894 
								
							 
						 
						
							
							
								
								gh-135661: Fix parsing attributes with whitespace around the "=" separator in HTMLParser (GH-136908)  
							
							
							
							
							This fixes a regression introduced in GH-135930. 
							
						 
						
							2025-07-21 12:07:15 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Serhiy Storchaka 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								8ac7613dc8 
								
							 
						 
						
							
							
								
								gh-102555: Fix comment parsing in HTMLParser according to the HTML5 standard (GH-135664)  
							
							
							
							
							* "--!>" now ends the comment.
* "-- >" no longer ends the comment.
* Support abnormally ended empty comments "<!-->" and "<!--->".
---------
Co-authored-by: Kerim Kabirov <the.privat33r+gh@pm.me>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com> 
							
						 
						
							2025-07-04 07:00:23 +00:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Serhiy Storchaka 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								0243f97cba 
								
							 
						 
						
							
							
								
								gh-135661: Fix parsing start and end tags in HTMLParser according to the HTML5 standard (GH-135930)  
							
							
							
							
							* Whitespace is no longer accepted between `</` and the tag name.
  E.g. `</ script>` does not end the script section.
* Vertical tabulation (`\v`) and non-ASCII whitespace characters are no longer
  recognized as whitespace. The only whitespace characters are `\t\n\r\f `.
* The null character (U+0000) no longer ends the tag name.
* Attributes and slashes after the tag name in end tags are now ignored,
  instead of terminating the tag at the first `>` inside a quoted attribute value.
  E.g. `</script/foo=">"/>`.
* Multiple slashes and whitespace characters between the last attribute and the
  closing `>` are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.
* Multiple `=` between an attribute name and its value are no longer collapsed.
  E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".
* Whitespace between the `=` separator and the attribute name or value is no
  longer ignored. E.g. `<a foo =bar>` produces two attributes, "foo" and
  "=bar", both with value None; `<a foo= bar>` produces two attributes:
  "foo" with value "" and "bar" with value None.
* Fix Sphinx errors.
* Apply suggestions from code review
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
* Address review comments.
* Move to Security.
---------
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com> 
							
						 
						
							2025-07-03 23:33:02 +03:00 
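
The attribute-related bullets in this commit message can be probed with a small collector. This is only a sketch and not part of the commit: `AttrCollector` and `start_tags` are hypothetical helper names, and the attribute values printed depend on whether the running interpreter includes this change (older releases collapse `==`, for example), so no expected output is shown.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: records every start tag with its attributes."""

    def __init__(self):
        super().__init__()
        self.starttags = []

    def handle_starttag(self, tag, attrs):
        self.starttags.append((tag, attrs))


def start_tags(source):
    """Parse `source` and return the (tag, attrs) pairs that were reported."""
    parser = AttrCollector()
    parser.feed(source)
    parser.close()
    return parser.starttags


# Two of the inputs quoted in the commit message above.
for sample in ("<a foo==bar>", "<a foo=bar/ //>"):
    print(repr(sample), "->", start_tags(sample))
```

Comparing the printed values across interpreter versions is a quick way to see whether a given build carries the new tag parsing.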
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Serhiy Storchaka 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								6eb6c5dbfb 
								
							 
						 
						
							
							
								
								gh-135462: Fix quadratic complexity in processing special input in HTMLParser (GH-135464)  
							
							
							
							
							End-of-file errors are now handled according to the HTML5 specs --
comments and declarations are automatically closed, tags are ignored. 
							
						 
						
							2025-06-13 19:57:48 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Waylan Limberg 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								53383e90e4 
								
							 
						 
						
							
							
								
								gh-86155: Fix data loss after unclosed script or style tag in HTMLParser (GH-22658)  
							
							
							
							
							When calling .close() the HTMLParser should flush all remaining content,
even when that content is in an unclosed script or style tag. 
							
						 
						
							2025-05-10 17:36:06 +00:00 
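
A minimal way to observe the behaviour this commit describes is to feed an unclosed `<script>` element and compare the recorded events before and after `close()`. The `EventLog` class is a hypothetical helper; whether the script text shows up after `close()` depends on the interpreter including this fix, so the sketch prints the events rather than asserting them.

```python
from html.parser import HTMLParser


class EventLog(HTMLParser):
    """Hypothetical helper: records start tags and text chunks."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


parser = EventLog()
parser.feed("<script>var pending = 1;")  # note: no closing </script>
events_before_close = list(parser.events)
parser.close()  # with the fix, remaining script text is flushed here
events_after_close = list(parser.events)

print(events_before_close)
print(events_after_close)
```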
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								76c0b01bc4 
								
							 
						 
						
							
							
								
								gh-77057: Fix handling of invalid markup declarations in HTMLParser (GH-9295)  
							
							
							
							
							Co-authored-by: Serhiy Storchaka <storchaka@gmail.com> 
							
						 
						
							2025-05-10 17:31:43 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Sascha Ißbrücker 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								77b14a6d58 
								
							 
						 
						
							
							
								
								gh-69426: HTMLParser: only unescape properly terminated character entities in attribute values (GH-95215)  
							
							
							
							
							According to the HTML5 spec, named character references in attribute values
that are not terminated by a semicolon should only be processed if they are
not followed by an ASCII alphanumeric or an equals sign.
https://html.spec.whatwg.org/multipage/parsing.html#named-character-reference-state  
							
						 
						
							2025-05-07 18:49:49 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Dong-hee Na 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								157aef79b0 
								
							 
						 
						
							
							
								
								gh-95813: Improve HTMLParser from the view of inheritance ( #95874 )  
							
							
							
							
							* gh-95813: Improve HTMLParser from the view of inheritance
* gh-95813: Add unittest
* Address code review 
							
						 
						
							2022-08-18 13:16:33 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Karl Dubost 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								9eb11a139f 
								
							 
						 
						
							
							
								
								bpo-41748: Handles unquoted attributes with commas ( #24072 )  
							
							
							
							
							* bpo-41748: Adds tests for unquoted attributes with comma
* bpo-41748: Handles unquoted attributes with comma
* bpo-41748: Addresses review comments
* bpo-41748: Addresses review comments
* Adds more test cases
* Simplifies the regex for handling spaces
* bpo-41748: Moves attributes tests under the right class
* bpo-41748: Addresses review about duplicate attributes
* bpo-41748: Adds NEWS.d entry for this patch 
							
						 
						
							2021-02-01 21:32:50 +01:00 
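
As an illustration of the unquoted-attribute case, the sketch below parses a tag whose bare value contains a comma. `AttrCollector` and `parse_attrs` are hypothetical names introduced for the example. The point is that the whole run `bar,baz=asd` is reported as a single value, since an unquoted value only ends at whitespace or `>`.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Hypothetical helper: keeps the attribute list of each start tag."""

    def __init__(self):
        super().__init__()
        self.attr_lists = []

    def handle_starttag(self, tag, attrs):
        self.attr_lists.append(attrs)


def parse_attrs(source):
    """Return the attribute list of the first start tag in `source`."""
    parser = AttrCollector()
    parser.feed(source)
    parser.close()
    return parser.attr_lists[0]


attrs = parse_attrs("<div class=bar,baz=asd>")
print(attrs)
```

HTMLParser does not deduplicate or reinterpret attributes; that is left to whatever tree builder consumes the events.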
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Inada Naoki 
								
							 
						 
						
							
							
								
								
							
							
							
								
							
							
								fae0ed5099 
								
							 
						 
						
							
							
								
								bpo-37328: remove deprecated HTMLParser.unescape (GH-14186)  
							
							
							
							
							It has been deprecated since Python 3.4. 
							
						 
						
							2019-08-27 11:48:06 +09:00 
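
Code that still called the removed method can usually switch to the module-level `html.unescape()` function. The snippet below is a small sketch of that migration; the sample string is made up for illustration, and the `hasattr` check is only there to show whether the running interpreter still has the old method.

```python
import html
from html.parser import HTMLParser

# On interpreters that include this removal, the lookup is expected to be False.
has_legacy_method = hasattr(HTMLParser, "unescape")

# html.unescape() handles named, decimal and hexadecimal references.
decoded = html.unescape("&lt;b&gt; &amp; &#x41;&#66;")

print(has_legacy_method, decoded)
```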
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									R David Murray 
								
							 
						 
						
							
							
							
							
								
							
							
								44b548dda8 
								
							 
						 
						
							
							
								
								#27364 : fix "incorrect" uses of escape character in the stdlib.  
							
							
							
							
							And most of the tools.
Patch by Emanuel Barry, reviewed by me, Serhiy Storchaka, and
Martin Panter. 
							
						 
						
							2016-09-08 13:59:53 -04:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Serhiy Storchaka 
								
							 
						 
						
							
							
							
							
								
							
							
								597d15afe4 
								
							 
						 
						
							
							
								
								Issue  #23277 : Remove unused support.run_unittest import.  
							
							
							
						 
						
							2016-04-24 13:45:58 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								20a2c6482e 
								
							 
						 
						
							
							
								
								#23144 : merge with 3.4.  
							
							
							
						 
						
							2015-09-06 21:44:45 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								6f2bb98966 
								
							 
						 
						
							
							
								
								#23144 : Make sure that HTMLParser.feed() returns all the data, even when convert_charrefs is True.  
							
							
							
						 
						
							2015-09-06 21:38:06 +03:00 
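
The situation behind this fix can be sketched as follows: with `convert_charrefs` enabled, text ending in something that might be a partial character reference is held back by `feed()`, and it is `close()` that delivers it. `DataLog` is a hypothetical helper and the input string is invented for the example.

```python
from html.parser import HTMLParser


class DataLog(HTMLParser):
    """Hypothetical helper: records each chunk passed to handle_data()."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


parser = DataLog()
parser.feed("Q&A")  # "&A" could be the start of a reference, so it is buffered
held_back = list(parser.chunks)
parser.close()  # end of input: the buffered text is flushed
delivered = list(parser.chunks)

print(held_back, delivered)
```

The practical consequence is that callers should always call `close()` once the input is complete, otherwise trailing text may never reach `handle_data()`.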
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								6fc16d81af 
								
							 
						 
						
							
							
								
								#21047 : set the default value for the *convert_charrefs* argument of HTMLParser to True.  Patch by Berker Peksag.  
							
							
							
						 
						
							2014-08-02 18:36:12 +03:00 
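
The effect of the default can be seen by parsing the same markup with and without `convert_charrefs`. This is a sketch with hypothetical helper names (`Collector`, `parse`): with the default, the reference is decoded inside the text passed to `handle_data()`; with `convert_charrefs=False`, it is reported separately through `handle_entityref()`.

```python
from html.parser import HTMLParser


class Collector(HTMLParser):
    """Hypothetical helper: records text and entity-reference events."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)
        self.events = []

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_entityref(self, name):
        self.events.append(("entityref", name))


def parse(source, **kwargs):
    """Parse `source` and return the recorded events."""
    parser = Collector(**kwargs)
    parser.feed(source)
    parser.close()
    return parser.events


markup = "<p>fish &amp; chips</p>"
converted = parse(markup)                        # default: convert_charrefs=True
separate = parse(markup, convert_charrefs=False)

print(converted)
print(separate)
```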
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								73a4359eb0 
								
							 
						 
						
							
							
								
								#15114 : the strict mode and argument of HTMLParser, HTMLParser.error, and the HTMLParseError exception have been removed.  
							
							
							
						 
						
							2014-08-02 14:10:30 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								153d97b24e 
								
							 
						 
						
							
							
								
								#20288 : merge with 3.3.  
							
							
							
						 
						
							2014-02-01 21:22:26 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								f27b9a741a 
								
							 
						 
						
							
							
								
								#20288 : fix handling of invalid numeric charrefs in HTMLParser.  
							
							
							
						 
						
							2014-02-01 21:21:01 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								95401c5f6b 
								
							 
						 
						
							
							
								
								#13633 : Added a new convert_charrefs keyword arg to HTMLParser that, when True, automatically converts all character references.  
							
							
							
						 
						
							2013-11-23 19:52:05 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								f6de9eb2bb 
								
							 
						 
						
							
							
								
								#19688 : add back and deprecate the internal HTMLParser.unescape() method.  
							
							
							
						 
						
							2013-11-22 05:49:29 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								4a9ee26750 
								
							 
						 
						
							
							
								
								#2927 : Added the unescape() function to the html module.  
							
							
							
						 
						
							2013-11-19 20:28:45 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								b7038817fe 
								
							 
						 
						
							
							
								
								#19480 : merge with 3.3.  
							
							
							
						 
						
							2013-11-07 18:35:27 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								7165d8b9ba 
								
							 
						 
						
							
							
								
								#19480 : HTMLParser now accepts all valid start-tag names as defined by the HTML5 standard.  
							
							
							
						 
						
							2013-11-07 18:33:24 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								1943c8a112 
								
							 
						 
						
							
							
								
								Merge test_htmlparser changes from 3.3.  
							
							
							
						 
						
							2013-11-02 17:50:02 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								5028f4d461 
								
							 
						 
						
							
							
								
								Use unittest.main() in test_htmlparser.  
							
							
							
						 
						
							2013-11-02 17:49:08 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								88ebfb129b 
								
							 
						 
						
							
							
								
								#15114 : The html.parser module now raises a DeprecationWarning when the strict argument of HTMLParser or the HTMLParser.error method is used.  
							
							
							
						 
						
							2013-11-02 17:08:24 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								8e596a765c 
								
							 
						 
						
							
							
								
								#17802 : Fix an UnboundLocalError in html.parser.  Initial tests by Thomas Barlow.  
							
							
							
						 
						
							2013-05-01 16:18:25 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								46495182d0 
								
							 
						 
						
							
							
								
								#15156 : HTMLParser now uses the new "html.entities.html5" dictionary.  
							
							
							
						 
						
							2012-06-24 22:02:56 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								3861d8b271 
								
							 
						 
						
							
							
								
								#15114 : the strict mode of HTMLParser and the HTMLParseError exception are deprecated now that the parser is able to parse invalid markup.  
							
							
							
						 
						
							2012-06-23 15:27:51 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								0780b6bc58 
								
							 
						 
						
							
							
								
								#14538 : HTMLParser can now correctly parse start tags that contain a bare /.  
							
							
							
						 
						
							2012-04-18 19:18:22 -06:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								29877e8e04 
								
							 
						 
						
							
							
								
								HTMLParser is now able to handle slashes in the start tag.  
							
							
							
						 
						
							2012-02-21 09:25:00 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								e31ddedb0e 
								
							 
						 
						
							
							
								
								Fix an index and clean up comments.  
							
							
							
						 
						
							2012-02-13 20:20:00 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								f4ab491901 
								
							 
						 
						
							
							
								
								Improve handling of declarations in HTMLParser.  
							
							
							
						 
						
							2012-02-13 15:50:37 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								86f67123be 
								
							 
						 
						
							
							
								
								Fix htmlparser tests to always use the right collector.  
							
							
							
						 
						
							2012-02-13 14:11:27 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								5211ffe4df 
								
							 
						 
						
							
							
								
								#13993 : HTMLParser is now able to handle broken end tags when strict=False.  
							
							
							
						 
						
							2012-02-13 11:24:50 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								fa3702dc28 
								
							 
						 
						
							
							
								
								#13960 : HTMLParser is now able to handle broken comments when strict=False.  
							
							
							
						 
						
							2012-02-10 10:45:44 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								62f3d0300e 
								
							 
						 
						
							
							
								
								#13576 : add tests about the handling of (possibly broken) condcoms.  
							
							
							
						 
						
							2011-12-19 07:29:03 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								15cb489234 
								
							 
						 
						
							
							
								
								#13358 : HTMLParser now calls handle_data only once for each CDATA.  
							
							
							
						 
						
							2011-11-18 18:01:49 +02:00 
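
A short sketch of what "only once" means in practice: the content of a `<script>` element, including `<` characters that do not start a closing tag, arrives in a single `handle_data()` call. `EventLog` is a hypothetical helper and the script body is invented for the example.

```python
from html.parser import HTMLParser


class EventLog(HTMLParser):
    """Hypothetical helper: records tag and text events in order."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))


body = 'if (a < b) { document.write("<p>hi"); }'
parser = EventLog()
parser.feed("<script>" + body + "</script>")
parser.close()

data_chunks = [value for kind, value in parser.events if kind == "data"]
print(parser.events)
```

Note that the `<p>` inside the script is not reported as a start tag; it stays part of the script text.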
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								c2fe57762b 
								
							 
						 
						
							
							
								
								#1745761 ,  #755670 ,  #13357 ,  #12629 ,  #1200313 : improve attribute handling in HTMLParser.  
							
							
							
						 
						
							2011-11-14 18:53:33 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								b245ed1cdf 
								
							 
						 
						
							
							
								
								Group tests about attributes in a separate class.  
							
							
							
						 
						
							2011-11-14 18:13:22 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								c1e73c30e9 
								
							 
						 
						
							
							
								
								Make sure that the tolerant parser still parses valid HTML correctly.  
							
							
							
						 
						
							2011-11-01 18:57:15 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								b9a48f7144 
								
							 
						 
						
							
							
								
								Avoid reusing the same collector in the tests.  
							
							
							
						 
						
							2011-11-01 15:00:59 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								18b0e5b79b 
								
							 
						 
						
							
							
								
								#12008 : add a test.  
							
							
							
						 
						
							2011-11-01 14:42:54 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								7de56f6a04 
								
							 
						 
						
							
							
								
								#670664 : Fix HTMLParser to correctly handle the content of `<script>...</script>` and `<style>...</style>`.  
							
							
							
						 
						
							2011-11-01 14:12:22 +02:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								f50ffa94ab 
								
							 
						 
						
							
							
								
								#13273 : fix a bug that prevented HTMLParser from properly detecting some tags when strict=False.  
							
							
							
						 
						
							2011-10-28 13:21:09 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								d9e0b068af 
								
							 
						 
						
							
							
								
								#12888 : Fix a bug in HTMLParser.unescape that prevented it from unescaping more than 128 entities.  Patch by Peter Otten.  
							
							
							
						 
						
							2011-09-05 17:11:06 +03:00 
							
								 
							
							
								 
							
						 
					 
				
					
						
							
								
								
									Ezio Melotti 
								
							 
						 
						
							
							
							
							
								
							
							
								2e3607c1e7 
								
							 
						 
						
							
							
								
								#7311 : fix html.parser to accept non-ASCII attribute values.  
							
							
							
						 
						
							2011-04-07 22:03:31 +03:00