Timon Viola
4d02f31cdd
gh-118350: Fix support of elements "textarea" and "title" in HTMLParser (#135310)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>
Co-authored-by: Łukasz Langa <lukasz@langa.pl>

2025-07-22 13:27:13 +02:00
								
Serhiy Storchaka
dee6501894
gh-135661: Fix parsing attributes with whitespaces around the "=" separator in HTMLParser (GH-136908)

This fixes a regression introduced in GH-135930.

2025-07-21 12:07:15 +02:00
								
Serhiy Storchaka
8ac7613dc8
gh-102555: Fix comment parsing in HTMLParser according to the HTML5 standard (GH-135664)

* "--!>" now ends the comment.
* "-- >" no longer ends the comment.
* Support abnormally ended empty comments "<!-->" and "<!--->".
---------
Co-author: Kerim Kabirov <the.privat33r+gh@pm.me>
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>

2025-07-04 07:00:23 +00:00
								
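The comment-closing rules listed above can be illustrated with a small standalone sketch. This is not the actual `html.parser` code. The helper name `find_comment_end` is hypothetical, and the sketch assumes it receives the text that follows an opening `<!--`. It returns the comment data and the index just past the closing delimiter, or `None` while the comment is still open.

```python
import re
from typing import Optional, Tuple

# Inside a comment, "-->" or "--!>" closes it; "-- >" (with a space) does not.
_COMMENT_CLOSE = re.compile(r"--!?>")
# Immediately after "<!--", a bare ">" or "->" closes an empty comment
# abruptly, which covers the inputs "<!-->" and "<!--->".
_ABRUPT_CLOSE = re.compile(r"-?>")


def find_comment_end(text: str) -> Optional[Tuple[str, int]]:
    """Locate the end of a comment in `text`, the input after "<!--".

    Returns (comment_data, end_index), where end_index points just past the
    closing delimiter, or None if the comment is not terminated yet.
    Hypothetical helper for illustration; not part of html.parser.
    """
    abrupt = _ABRUPT_CLOSE.match(text)
    if abrupt:
        return "", abrupt.end()
    close = _COMMENT_CLOSE.search(text)
    if close is None:
        return None
    return text[:close.start()], close.end()


print(find_comment_end(" a -- > b --!>rest"))  # (' a -- > b ', 14)
print(find_comment_end("->"))                  # ('', 2)
```

For `<!---->`, the text after `<!--` is `-->`. The abrupt pattern does not match it, so the ordinary search applies and returns empty comment data.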
								
Serhiy Storchaka
0243f97cba
gh-135661: Fix parsing start and end tags in HTMLParser according to the HTML5 standard (GH-135930)

* Whitespace is no longer accepted between `</` and the tag name.
  E.g. `</ script>` does not end the script section.
* Vertical tabulation (`\v`) and non-ASCII whitespace are no longer recognized
  as whitespace. The only whitespace characters are `\t`, `\n`, `\r`, `\f`
  and the space character.
* The null character (U+0000) no longer ends the tag name.
* Attributes and slashes after the tag name in end tags are now ignored,
  instead of the tag terminating at the first `>`, even when that `>` is inside
  a quoted attribute value. E.g. `</script/foo=">"/>`.
* Multiple slashes and whitespace between the last attribute and the closing `>`
  are now ignored in both start and end tags. E.g. `<a foo=bar/ //>`.
* Multiple `=` between an attribute name and its value are no longer collapsed.
  E.g. `<a foo==bar>` produces attribute "foo" with value "=bar".
* Whitespace between the `=` separator and the attribute name or value is no
  longer ignored. E.g. `<a foo =bar>` produces two attributes "foo" and
  "=bar", both with value None; `<a foo= bar>` produces two attributes:
  "foo" with value "" and "bar" with value None.
* Fix Sphinx errors.
* Apply suggestions from code review
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>
* Address review comments.
* Move to Security.
---------
Co-authored-by: Ezio Melotti <ezio.melotti@gmail.com>

2025-07-03 23:33:02 +03:00
								
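The whitespace and end-tag rules above can be illustrated with a small standalone sketch. The helper names `is_html_whitespace` and `script_data_end` are hypothetical. The sketch only covers locating the closing tag of a `<script>` element, and it is not how `html.parser` is implemented internally.

```python
import re

# HTML5 whitespace inside tags: tab, line feed, form feed, carriage return
# and space. Unlike str.isspace(), "\v" and non-ASCII spaces are excluded.
HTML_WHITESPACE = "\t\n\f\r "

# In script content, "</script" (any letter case) closes the element only
# when it is followed by whitespace, "/" or ">"; no whitespace may appear
# between "</" and the tag name.
_SCRIPT_END = re.compile(r"</script(?=[\t\n\f\r />])", re.IGNORECASE)


def is_html_whitespace(ch: str) -> bool:
    """Return True if `ch` is one of the five HTML5 whitespace characters."""
    return len(ch) == 1 and ch in HTML_WHITESPACE


def script_data_end(data: str) -> int:
    """Return the index of the closing script tag in `data`, or -1 if none.

    Hypothetical helper for illustration; not part of html.parser.
    """
    match = _SCRIPT_END.search(data)
    return match.start() if match else -1


print("\v".isspace(), is_html_whitespace("\v"))  # True False
print(script_data_end("x</ script>y</SCRIPT>"))  # 12
```

In the second example, `</ script>` is skipped because of the space after `</`. The input `</script\v>` would not close the element either, because `\v` is not in the lookahead set.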
								
Serhiy Storchaka
6eb6c5dbfb
gh-135462: Fix quadratic complexity in processing special input in HTMLParser (GH-135464)

End-of-file errors are now handled according to the HTML5 specification:
comments and declarations are automatically closed, and tags are ignored.

2025-06-13 19:57:48 +03:00
								
Waylan Limberg
53383e90e4
gh-86155: Fix data loss after unclosed script or style tag in HTMLParser (GH-22658)

When calling .close(), the HTMLParser should flush all remaining content,
even when that content is in an unclosed script or style tag.

2025-05-10 17:36:06 +00:00
								
Ezio Melotti
76c0b01bc4
gh-77057: Fix handling of invalid markup declarations in HTMLParser (GH-9295)

Co-authored-by: Serhiy Storchaka <storchaka@gmail.com>

2025-05-10 17:31:43 +03:00
								
Sascha Ißbrücker
77b14a6d58
gh-69426: HTMLParser: only unescape properly terminated character entities in attribute values (GH-95215)

According to the HTML5 spec, named character references in attribute values
should only be processed if they are not followed by an ASCII alphanumeric
character or an equals sign.
https://html.spec.whatwg.org/multipage/parsing.html#named-character-reference-state

2025-05-07 18:49:49 +03:00
								
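The rule quoted above only affects references that are not terminated by `;`. Below is a minimal sketch of that rule. It uses the real `html.entities.html5` table, but the function name `unescape_attr_value` is hypothetical. Numeric references such as `&#38;` are deliberately left out to keep the sketch short.

```python
import re
from html.entities import html5

# A named reference: "&", an ASCII letter, ASCII alphanumerics, optional ";".
_NAMED_REF = re.compile(r"&([A-Za-z][A-Za-z0-9]*;?)")


def unescape_attr_value(value: str) -> str:
    """Decode named references in an attribute value (HTML5 attribute rule).

    A reference not terminated by ";" is left as-is when the next character
    is "=" or an ASCII alphanumeric. Hypothetical helper, named refs only.
    """

    def replace(m):
        name = m.group(1)
        # The longest prefix of `name` that is a known entity wins.
        for k in range(len(name), 0, -1):
            candidate = name[:k]
            if candidate not in html5:
                continue
            if not candidate.endswith(";"):
                # The character right after the matched entity name.
                nxt = name[k] if k < len(name) else value[m.end():m.end() + 1]
                if nxt == "=" or (nxt.isascii() and nxt.isalnum()):
                    return m.group(0)  # leave the reference undecoded
            return html5[candidate] + name[k:]
        return m.group(0)  # unknown name: keep the text unchanged

    return _NAMED_REF.sub(replace, value)


print(unescape_attr_value("/search?q=1&copy=2"))  # /search?q=1&copy=2
print(unescape_attr_value("&copy 2025 a&amp;b"))  # © 2025 a&b
```

In running text the rule does not apply: `html.unescape("&notit;")` returns `"¬it;"`. By contrast, `unescape_attr_value("&notit;")` leaves the value unchanged, because the longest match, `not`, is followed by the letter `i`.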
								
Jean-Christophe Amiel
9a07eff628
gh-100210: Correct the comment link for unescaping HTML (#100212)

2023-02-19 11:18:12 +01:00
								
Victor Stinner
1863302d61
gh-97669: Create Tools/build/ directory (#97963)

Create Tools/build/ directory. Move the following scripts from
Tools/scripts/ to Tools/build/:
* check_extension_modules.py
* deepfreeze.py
* freeze_modules.py
* generate_global_objects.py
* generate_levenshtein_examples.py
* generate_opcode_h.py
* generate_re_casefix.py
* generate_sre_constants.py
* generate_stdlib_module_names.py
* generate_token.py
* parse_html5_entities.py
* smelly.py
* stable_abi.py
* umarshal.py
* update_file.py
* verify_ensurepip_wheels.py
Update references to these scripts.

2022-10-17 12:01:00 +02:00
								
Dong-hee Na
157aef79b0
gh-95813: Improve HTMLParser from the view of inheritance (#95874)

* gh-95813: Improve HTMLParser from the view of inheritance
* gh-95813: Add unittest
* Address code review

2022-08-18 13:16:33 +02:00
								
Ezio Melotti
f28ec34c5c
gh-82927: Update files related to HTML entities. (GH-92504)

2022-06-21 22:03:12 +02:00
								
slateny
d707d073be
Add source for character mappings (#92014)

2022-05-06 12:28:09 +02:00
								
Alberto Mardegan
562c0d7398
bpo-45421: Remove dead code from html.parser (GH-28847)

Support for HTMLParseError was removed back in 2014 with commit
73a4359eb0

2021-10-12 10:12:21 -07:00
								
Christian Clauss
745c9d9dfc
Fix typos in the Lib directory (GH-28775)

Fix typos in the Lib directory as identified by codespell.
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>

2021-10-06 16:13:48 -07:00
								
Karl Dubost
9eb11a139f
bpo-41748: Handles unquoted attributes with commas (#24072)

* bpo-41748: Adds tests for unquoted attributes with comma
* bpo-41748: Handles unquoted attributes with comma
* bpo-41748: Addresses review comments
* bpo-41748: Addresses review comments
* Adds more test cases
* Simplifies the regex for handling spaces
* bpo-41748: Moves attributes tests under the right class
* bpo-41748: Addresses review about duplicate attributes
* bpo-41748: Adds NEWS.d entry for this patch

2021-02-01 21:32:50 +01:00
								
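The effect of this change can be observed with the real `html.parser.HTMLParser`. The subclass name `AttrCollector` is illustrative; it records every start tag. On a Python version that includes this fix, the comma stays part of the unquoted value, because an unquoted attribute value runs until whitespace or `>`.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record (tag, attrs) pairs for every start tag seen."""

    def __init__(self):
        super().__init__()
        self.starttags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value is None for bare names.
        self.starttags.append((tag, attrs))


parser = AttrCollector()
parser.feed("<div class=bar,baz=asd>")
parser.close()
print(parser.starttags)  # [('div', [('class', 'bar,baz=asd')])]
```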
								
Inada Naoki
fae0ed5099
bpo-37328: remove deprecated HTMLParser.unescape (GH-14186)

It has been deprecated since Python 3.4.

2019-08-27 11:48:06 +09:00
								
Motoki Naruse
3358d589fb
bpo-30629: Remove second call of str.lower() in html.parser.parse_endtag. (#2099)

elem is already the result of .lower() six lines above the handle_endtag call.
Patch by Motoki Naruse

2017-06-16 21:15:25 -04:00
								
Serhiy Storchaka
c842efc6ae
Revert "Fixed a typo in the HTMLParser.feed docstrings" (#1771)

* Revert "Fixed a typo in the HTMLParser.feed docstrings. The docstring started with an 'r', like a rawstring."
The docstring was correct. I read the patch in the opposite direction, as *adding* the "r" prefix.
This reverts commit 5ba185039f

2017-05-24 07:20:45 +03:00
								
Jani Šumak
5ba185039f
Fixed a typo in the HTMLParser.feed docstrings. The docstring started with an 'r', like a rawstring. (#1759)

2017-05-23 16:40:54 +03:00
								
R David Murray
44b548dda8
#27364: fix "incorrect" uses of escape character in the stdlib.

And most of the tools.
Patch by Emanual Barry, reviewed by me, Serhiy Storchaka, and
Martin Panter.

2016-09-08 13:59:53 -04:00
								
Martin Panter
46f50726a0
Issue #27076: Doc, comment and tests spelling fixes

Most fixes to Doc/ and Lib/ directories by Ville Skyttä.

2016-05-26 05:35:26 +00:00
								
Martin Panter
4827e488a4
Merge spelling fixes from 3.4 into 3.5

2015-10-31 12:16:18 +00:00
								
Martin Panter
1f1177d69a
Fix some spelling errors in documentation and code comments

2015-10-31 11:48:53 +00:00
								
Ezio Melotti
20a2c6482e
#23144: merge with 3.4.

2015-09-06 21:44:45 +03:00
								
Ezio Melotti
6f2bb98966
#23144: Make sure that HTMLParser.feed() returns all the data, even when convert_charrefs is True.

2015-09-06 21:38:06 +03:00
								
Serhiy Storchaka
82e07b92b3
Issue #23181: More "codepoint" -> "code point".

2015-01-18 11:33:31 +02:00
								
Serhiy Storchaka
d3faf43f9b
Issue #23181: More "codepoint" -> "code point".

2015-01-18 11:28:37 +02:00
								
Ezio Melotti
6fc16d81af
#21047: set the default value for the *convert_charrefs* argument of HTMLParser to True. Patch by Berker Peksag.

2014-08-02 18:36:12 +03:00
								
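The following is a short illustration of what *convert_charrefs* changes, using the real `HTMLParser` callbacks `handle_data` and `handle_entityref`. The subclass `TextCollector` and the helper `collect` are illustrative names. With the default `True`, references in text are decoded before `handle_data` is called. With `False`, named references are reported separately through `handle_entityref`.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect text chunks and named entity references."""

    def __init__(self, *, convert_charrefs=True):
        super().__init__(convert_charrefs=convert_charrefs)
        self.text = []
        self.entityrefs = []

    def handle_data(self, data):
        self.text.append(data)

    def handle_entityref(self, name):
        # Only called when convert_charrefs is False.
        self.entityrefs.append(name)


def collect(markup, convert_charrefs):
    parser = TextCollector(convert_charrefs=convert_charrefs)
    parser.feed(markup)
    parser.close()
    return "".join(parser.text), parser.entityrefs


print(collect("<p>Fish &amp; chips</p>", True))   # ('Fish & chips', [])
print(collect("<p>Fish &amp; chips</p>", False))  # ('Fish  chips', ['amp'])
```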
								
Ezio Melotti
11bec7a1b8
Add an __all__ to html.entities.

2014-08-02 15:15:02 +03:00
								
Ezio Melotti
73a4359eb0
#15114: the strict mode and argument of HTMLParser, HTMLParser.error, and the HTMLParseError exception have been removed.

2014-08-02 14:10:30 +03:00
								
Ezio Melotti
153d97b24e
#20288: merge with 3.3.

2014-02-01 21:22:26 +02:00
								
Ezio Melotti
f27b9a741a
#20288: fix handling of invalid numeric charrefs in HTMLParser.

2014-02-01 21:21:01 +02:00
								
Ezio Melotti
95401c5f6b
#13633: Added a new convert_charrefs keyword arg to HTMLParser that, when True, automatically converts all character references.

2013-11-23 19:52:05 +02:00
								
Ezio Melotti
f6de9eb2bb
#19688: add back and deprecate the internal HTMLParser.unescape() method.

2013-11-22 05:49:29 +02:00
								
Ezio Melotti
4a9ee26750
#2927: Added the unescape() function to the html module.

2013-11-19 20:28:45 +02:00
								
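`html.unescape()` is the public replacement for the internal `HTMLParser.unescape()` method. That internal method was later deprecated and then removed. The examples below are a quick check of its behaviour, including the HTML5 rules for invalid numeric references: U+0000 and code points above U+10FFFF become U+FFFD, and most values in the range 0x80-0x9F are remapped to their Windows-1252 characters.

```python
import html

# Named references; some legacy names also work without the trailing ";".
print(html.unescape("&lt;b&gt; &copy 2013 &hellip;"))  # <b> © 2013 …
# Decimal and hexadecimal numeric references.
print(html.unescape("&#65;&#x42;"))  # AB
# Invalid numeric references use the HTML5 replacement table.
print(ascii(html.unescape("&#0; &#x110000; &#128;")))  # '\ufffd \ufffd \u20ac'
```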
								
Ezio Melotti
b7038817fe
#19480: merge with 3.3.

2013-11-07 18:35:27 +02:00
								
Ezio Melotti
7165d8b9ba
#19480: HTMLParser now accepts all valid start-tag names as defined by the HTML5 standard.

2013-11-07 18:33:24 +02:00
								
Ezio Melotti
88ebfb129b
#15114: The html.parser module now raises a DeprecationWarning when the strict argument of HTMLParser or the HTMLParser.error method is used.

2013-11-02 17:08:24 +02:00
								
Ezio Melotti
4603487dc9
#18020: improve html.escape speed by an order of magnitude. Patch by Matt Bryant.

2013-07-07 11:11:24 +02:00
								
Ezio Melotti
f6ca26fbff
#17802: merge with 3.3.

2013-05-01 16:20:00 +03:00
								
Ezio Melotti
8e596a765c
#17802: Fix an UnboundLocalError in html.parser. Initial tests by Thomas Barlow.

2013-05-01 16:18:25 +03:00
								
Ezio Melotti
1698babd1b
#14679: add an __all__ (that contains only HTMLParser) to html.parser.

2013-05-01 16:09:34 +03:00
								
Ezio Melotti
e6e96eea51
#16245: Fix the value of a few entities in html.entities.html5.

2012-10-23 15:51:27 +02:00
								
Ezio Melotti
518dbfd7b5
Reorder html.entities.html5 entities to make updates easier. Patch by Iuliia Proskurnia.

2012-10-23 14:45:58 +02:00
								
Ezio Melotti
46495182d0
#15156: HTMLParser now uses the new "html.entities.html5" dictionary.

2012-06-24 22:02:56 +02:00
								
Ezio Melotti
dc44f55cc9
#11113: add a new "html5" dictionary containing the named character references defined by the HTML5 standard and the equivalent Unicode character(s) to the html.entities module.

2012-06-24 04:37:41 +02:00
								
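Here is a quick look at the `html.entities.html5` dictionary. Its keys carry the trailing `;`. A few legacy names also appear without it, and some values consist of more than one code point. The older `name2codepoint` mapping, shown for comparison, uses names without `;`.

```python
from html.entities import html5, name2codepoint

print(html5["hellip;"] == "\u2026")       # True
print("amp" in html5, "hellip" in html5)  # True False
print(name2codepoint["hellip"])           # 8230

# Entity names whose replacement text is longer than one character.
multi = sorted(name for name, chars in html5.items() if len(chars) > 1)
print(len(multi), multi[:3])
```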
								
Ezio Melotti
3861d8b271
#15114: the strict mode of HTMLParser and the HTMLParseError exception are deprecated now that the parser is able to parse invalid markup.

2012-06-23 15:27:51 +02:00
								
Ezio Melotti
0780b6bc58
#14538: HTMLParser can now correctly parse start tags that contain a bare /.

2012-04-18 19:18:22 -06:00
								
Ezio Melotti
29877e8e04
HTMLParser is now able to handle slashes in the start tag.

2012-02-21 09:25:00 +02:00