cpython/Tools/unicode
Serhiy Storchaka 794b42ff8a
gh-95555: Support Unicode property escapes \p{...} in regular expressions (GH-151969)
Add support for \p{property} and \P{property} escapes in Unicode (str)
regular expressions, for the properties the engine can resolve without
the unicodedata database.  They are matched as CATEGORY opcodes or as
fixed sets of character ranges.
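On an interpreter without this change, a similar effect can be approximated in plain `re`. The idea is to precompute the character ranges of a property and splice them into a character class. The sketch below computes those ranges at runtime with `unicodedata.category`. That is a swapped-in technique, not the one the commit describes, since the commit resolves these properties without `unicodedata`. The helper names `ranges_from_predicate` and `class_from_ranges` are hypothetical.

```python
import re
import unicodedata


def ranges_from_predicate(pred, maxcp=0x10FFFF):
    """Collapse every code point cp <= maxcp with pred(cp) true into (lo, hi) ranges."""
    ranges = []
    start = None
    for cp in range(maxcp + 1):
        if pred(cp):
            if start is None:
                start = cp  # open a new run of matching code points
        elif start is not None:
            ranges.append((start, cp - 1))  # close the current run
            start = None
    if start is not None:
        ranges.append((start, maxcp))
    return ranges


def class_from_ranges(ranges, negate=False):
    """Render inclusive (lo, hi) ranges as an re character class."""
    if not ranges:
        # An empty set matches nothing; its complement matches everything.
        return r"[\s\S]" if negate else r"[^\s\S]"
    parts = []
    for lo, hi in ranges:
        first = re.escape(chr(lo))  # escapes ], -, ^ and \ where needed
        parts.append(first if lo == hi else f"{first}-{re.escape(chr(hi))}")
    return "[" + ("^" if negate else "") + "".join(parts) + "]"


# Rough stand-ins for \p{Zs} and \P{Zs}, built from the running Python's UCD.
zs = ranges_from_predicate(lambda cp: unicodedata.category(chr(cp)) == "Zs")
space_sep = re.compile(class_from_ranges(zs))
not_space_sep = re.compile(class_from_ranges(zs, negate=True))
```

Scanning all 0x110000 code points takes a noticeable fraction of a second. A real tool would more likely generate such range tables once, ahead of time, than rebuild them on every run.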

Supported in this change: many General_Category values (the groups L, N,
Z, C and the values Lu, Lt, Lm, Nd, Nl, No, Zs, Zl, Zp, Cc, Cf, Cs, Co
and Cn); the binary properties Alphabetic, Lowercase, Uppercase, Numeric,
Printable, XID_Start, XID_Continue, Cased and Case_Ignorable; the POSIX
compatibility classes; the code-point classes ASCII, Any, Assigned,
Noncharacter_Code_Point, Join_Control, Pattern_Syntax and
Pattern_White_Space; and Regional_Indicator, ASCII_Hex_Digit and
Hex_Digit.
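Several of the code-point properties listed above are small, fixed sets. The sketch below hard-codes their ranges as given in PropList.txt of the Unicode Character Database. It renders each set as a character class using `\UXXXXXXXX` escapes, which `re` already accepts in str patterns. The `FIXED_RANGES` table and the `property_class` helper are illustrative assumptions, not the data structures used by the change.

```python
import re

# Illustrative range tables (inclusive bounds) copied from PropList.txt.
FIXED_RANGES = {
    "ASCII_Hex_Digit": [(0x30, 0x39), (0x41, 0x46), (0x61, 0x66)],
    "Hex_Digit": [(0x30, 0x39), (0x41, 0x46), (0x61, 0x66),
                  (0xFF10, 0xFF19), (0xFF21, 0xFF26), (0xFF41, 0xFF46)],
    "Join_Control": [(0x200C, 0x200D)],
    "Pattern_White_Space": [(0x09, 0x0D), (0x20, 0x20), (0x85, 0x85),
                            (0x200E, 0x200F), (0x2028, 0x2029)],
    "Regional_Indicator": [(0x1F1E6, 0x1F1FF)],
    # U+FDD0..U+FDEF plus the last two code points of each of the 17 planes.
    "Noncharacter_Code_Point": [(0xFDD0, 0xFDEF)]
    + [((plane << 16) | 0xFFFE, (plane << 16) | 0xFFFF) for plane in range(17)],
}


def property_class(name, negate=False):
    """Return an re character class equivalent to \\p{name} (or \\P{name})."""
    body = "".join(
        f"\\U{lo:08X}" if lo == hi else f"\\U{lo:08X}-\\U{hi:08X}"
        for lo, hi in FIXED_RANGES[name]
    )
    return "[" + ("^" if negate else "") + body + "]"
```

For example, `re.fullmatch(property_class("Hex_Digit") + "+", "0fA9")` matches, while the ASCII_Hex_Digit class rejects the fullwidth digit U+FF10. Pattern_White_Space and Noncharacter_Code_Point are fixed by Unicode's stability policies, so tables like these do not drift between Unicode versions.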

Property and value names use loose matching (UAX #44, rule UAX44-LM3),
and a property may be spelled \p{Lu}, \p{gc=Lu} or \p{name=yes}.
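Loose matching under UAX44-LM3 ignores case, whitespace, underscores, hyphens and an initial "is" prefix. The sketch below shows one minimal way to normalize the body of a `\p{...}` escape and split it into a (property, value) pair. It assumes the General_Category short aliases and the binary-value aliases Y/Yes/T/True and N/No/F/False from PropertyValueAliases.txt. The names `loose`, `parse_property`, `GC_VALUES`, `YES` and `NO` are hypothetical.

```python
# Short aliases of General_Category values (PropertyValueAliases.txt), lowercased.
GC_VALUES = {
    "l", "lc", "lu", "ll", "lt", "lm", "lo",
    "m", "mn", "mc", "me",
    "n", "nd", "nl", "no",
    "p", "pc", "pd", "ps", "pe", "pi", "pf", "po",
    "s", "sm", "sc", "sk", "so",
    "z", "zs", "zl", "zp",
    "c", "cc", "cf", "cs", "co", "cn",
}
YES = {"y", "yes", "t", "true"}
NO = {"n", "no", "f", "false"}


def loose(name):
    """UAX44-LM3: drop case, whitespace, '_' and '-', then an initial "is"."""
    key = "".join(ch for ch in name if not (ch.isspace() or ch in "_-")).lower()
    return key[2:] if key.startswith("is") else key


def parse_property(body):
    """Split the text inside \\p{...} into a normalized (property, value) pair."""
    prop, sep, value = body.partition("=")
    if not sep:
        name = loose(prop)
        # A bare name is either a General_Category value or a binary property.
        return ("gc", name) if name in GC_VALUES else (name, "yes")
    prop, value = loose(prop), loose(value)
    if prop in ("gc", "generalcategory"):
        return ("gc", value)
    if value in YES:
        return (prop, "yes")
    if value in NO:
        return (prop, "no")
    raise ValueError(f"unsupported property value in {body!r}")
```

With this normalization, `Lu`, `gc=Lu` and `General_Category = Lu` all resolve to `("gc", "lu")`, and `Is_Alphabetic` and `alphabetic=T` both resolve to `("alphabetic", "yes")`. Checking bare names against General_Category first keeps `\p{No}` (Other_Number) from being read as a binary "no".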

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
2026-06-26 07:33:33 +03:00
python-mappings Revert "gh-84508: Add mapping files for Korean and Japanese. (gh-93309)" (#93320) 2022-05-29 09:49:19 +09:00
comparecodecs.py Issue #19936: Added executable bits or shebang lines to Python scripts which 2014-01-16 17:15:49 +02:00
dawg.py gh-96954: use a directed acyclic word graph for storing the unicodedata codepoint names (#97906) 2023-11-04 15:56:58 +01:00
gen_expat_table.py gh-62259: Add Tools/unicode/gen_expat_table.py (GH-150503) 2026-06-10 18:04:03 +03:00
gencjkcodecs.py bpo-22831: Use "with" to avoid possible fd leaks in tools (part 2). (GH-10927) 2019-03-30 08:33:02 +02:00
gencodec.py bpo-22831: Use "with" to avoid possible fd leaks in tools (part 2). (GH-10927) 2019-03-30 08:33:02 +02:00
genmap_japanese.py Code: Update Donghee Na's name (#109744) 2023-09-25 18:17:34 +03:00
genmap_korean.py Code: Update Donghee Na's name (#109744) 2023-09-25 18:17:34 +03:00
genmap_schinese.py Code: Update Donghee Na's name (#109744) 2023-09-25 18:17:34 +03:00
genmap_support.py Code: Update Donghee Na's name (#109744) 2023-09-25 18:17:34 +03:00
genmap_tchinese.py gh-84508: tool to generate cjk traditional chinese mappings (gh-93272) 2022-06-11 23:19:41 +09:00
genwincodec.py
genwincodecs.bat bpo-27425: Be more explicit in .gitattributes (GH-840) 2017-06-10 14:58:42 -05:00
listcodecs.py
Makefile
makeunicodedata.py gh-95555: Support Unicode property escapes \p{...} in regular expressions (GH-151969) 2026-06-26 07:33:33 +03:00
mkstringprep.py bpo-37758: Cut always-constant conditionals on sys.maxunicode. (GH-15302) 2019-09-09 08:20:40 -07:00