cpython/Tools/unicode
Pieter Eendebak 97dea30914
gh-150889: Improve performance of unicodedata.normalize() (GH-150890)
Scan the nfc_first/nfc_last reindex tables comparing only .start, range-check
the candidate once, and terminate on a sentinel above every codepoint, so each
entry costs a single comparison. ~2x faster on non-Latin and combining-heavy
NFC/NFKC input; no new data tables.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-06 11:34:33 +03:00
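The commit message describes the new `nfc_first`/`nfc_last` lookup only in prose. What follows is a minimal Python sketch of that idea. It is not the generated tables or the C code that searches them. It assumes each reindex entry is a `(start, count, index)` record covering codepoints `start` through `start + count` inclusive, and that the table is sorted by `start` and ends with a sentinel whose `start` is 0x110000, one past the largest codepoint U+10FFFF. The names `Reindex`, `SENTINEL_START`, `find_index` and `EXAMPLE_TABLE` are hypothetical, and the table contents are made up.

```python
from typing import NamedTuple, Sequence


class Reindex(NamedTuple):
    """Hypothetical stand-in for one nfc_first/nfc_last entry."""
    start: int  # first codepoint of the range
    count: int  # assumed: range covers start .. start + count (inclusive)
    index: int  # table index assigned to `start`


# Assumed sentinel: larger than every codepoint (max is 0x10FFFF), so the
# scan below always stops without a separate end-of-table check.
SENTINEL_START = 0x110000


def find_index(table: Sequence[Reindex], code: int) -> int:
    """Return the reindexed position of `code`, or -1 if it is not listed.

    `table` must be sorted by `start` and end with a SENTINEL_START entry;
    `code` must be a valid codepoint (<= 0x10FFFF).
    """
    i = 0
    # One comparison per entry: only `.start` is examined while scanning.
    while table[i].start <= code:
        i += 1
    if i == 0:
        return -1  # code lies below the first range
    candidate = table[i - 1]
    # Single range check on the only entry that could contain `code`.
    offset = code - candidate.start
    if offset <= candidate.count:
        return candidate.index + offset
    return -1


# Made-up example table (not real Unicode data), sentinel-terminated.
EXAMPLE_TABLE = [
    Reindex(0x41, 2, 0),     # U+0041..U+0043 -> 0..2
    Reindex(0x300, 0, 3),    # U+0300 -> 3
    Reindex(0x1100, 18, 4),  # U+1100..U+1112 -> 4..22
    Reindex(SENTINEL_START, 0, 0),
]

print(find_index(EXAMPLE_TABLE, 0x42))  # 1
print(find_index(EXAMPLE_TABLE, 0x44))  # -1
```

The loop condition is the only test paid per entry. The upper bound of a range is checked once, after the loop, on the single candidate. Because the sentinel's `start` exceeds every codepoint, the loop needs no explicit end-of-table test.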
python-mappings Revert "gh-84508: Add mapping files for Korean and Japanese. (gh-93309)" (#93320) 2022-05-29 09:49:19 +09:00
comparecodecs.py
dawg.py gh-96954: use a directed acyclic word graph for storing the unicodedata codepoint names (#97906) 2023-11-04 15:56:58 +01:00
gencjkcodecs.py bpo-22831: Use "with" to avoid possible fd leaks in tools (part 2). (GH-10927) 2019-03-30 08:33:02 +02:00
gencodec.py bpo-22831: Use "with" to avoid possible fd leaks in tools (part 2). (GH-10927) 2019-03-30 08:33:02 +02:00
genmap_japanese.py Code: Update Donghee Na's name (#109744) 2023-09-25 18:17:34 +03:00
genmap_korean.py Code: Update Donghee Na's name (#109744) 2023-09-25 18:17:34 +03:00
genmap_schinese.py Code: Update Donghee Na's name (#109744) 2023-09-25 18:17:34 +03:00
genmap_support.py Code: Update Donghee Na's name (#109744) 2023-09-25 18:17:34 +03:00
genmap_tchinese.py gh-84508: tool to generate cjk traditional chinese mappings (gh-93272) 2022-06-11 23:19:41 +09:00
genwincodec.py
genwincodecs.bat
listcodecs.py
Makefile
makeunicodedata.py gh-150889: Improve performance of unicodedata.normalize() (GH-150890) 2026-06-06 11:34:33 +03:00
mkstringprep.py bpo-37758: Cut always-constant conditionals on sys.maxunicode. (GH-15302) 2019-09-09 08:20:40 -07:00