cpython

mirror of https://github.com/python/cpython.git synced 2026-06-29 04:10:54 +00:00

Author	SHA1	Message	Date
Serhiy Storchaka	bd4bd3e76a	gh-152100: Support set operations in character classes (GH-152153) Implement set difference [A--B], intersection [A&&B] and union [A||B] in regular expression character classes (Unicode Technical Standard #18), including nested, complemented and compound set operands. Symmetric difference [A~~B] remains reserved. Also use the new syntax in the standard library (_strptime, textwrap, doctest, pkgutil). Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-25 10:09:41 +03:00
Pieter Eendebak	21c4b7359d	gh-152056: Compile single-category character sets to a bare CATEGORY opcode (GH-152057) A character set containing exactly one category, e.g. [\d] or [^\s], now compiles to a single CATEGORY opcode (like \d or \S) instead of an IN block. The negated form maps to the complementary category. This speeds up matching and reduces the size of the compiled byte code. Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>	2026-06-24 11:09:50 +00:00
Serhiy Storchaka	fde4cf862c	gh-152033: Optimize category escapes outside character sets (GH-152035) Character class escapes (``\d``, ``\D``, ``\s``, ``\S``, ``\w`` and ``\W``) that occur outside a character set are now compiled directly to a single CATEGORY opcode instead of being wrapped in an IN block. This removes the IN wrapper (three code words) and an indirect charset() call, and makes such an escape a simple repeatable unit so that, for example, ``\d+`` uses the REPEAT_ONE fast path; a CATEGORY case is added to SRE(count). The transformation preserves behaviour exactly. For category-heavy patterns the compiled byte code is about 20% smaller and matching is up to ~2x faster, with no effect on patterns that do not use bare category escapes. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>	2026-06-24 08:49:14 +03:00
Victor Stinner	0b8c348f27	Fix pyflakes warnings: variable is assigned to but never used (#142294) Example of fixed warning: Lib/netrc.py:98:13: local variable 'toplevel' is assigned to but never used	2025-12-08 14:00:31 +01:00
Serhiy Storchaka	ac56f8cc8d	gh-133306: Support \z as a synonym for \Z in regular expressions (GH-133314) \Z was an error inherited from PCRE 0.95. It was fixed in PCRE 2.0. In other engines, \Z means not “anchor at string end”, but “anchor before optional newline at string end”. \z means “anchor at string end” in most RE engines.	2025-05-03 07:54:33 +00:00
Serhiy Storchaka	f9637b4ba3	Remove dead code in the RE parser (GH-122796)	2024-08-07 19:44:18 +00:00
Serhiy Storchaka	e2b3d831fd	gh-109747: Improve errors for unsupported look-behind patterns (GH-109859) Now re.error is raised instead of OverflowError or RuntimeError for too large width of look-behind pattern. The limit is increased to 2**32-1 (was 2**31-1).	2023-10-14 09:13:02 +03:00
Serhiy Storchaka	ed64204716	gh-106566: Optimize (?!) in regular expressions (GH-106567)	2023-08-07 18:09:56 +03:00
Serhiy Storchaka	74ec02e949	gh-106510: Fix DEBUG output for atomic group (GH-106511)	2023-07-08 14:31:25 +03:00
Nikita Sobolev	67f69dba0a	gh-105687: Remove deprecated objects from `re` module (#105688)	2023-06-14 12:26:20 +02:00
Serhiy Storchaka	75a6fadf36	gh-91524: Speed up the regular expression substitution (#91525) Functions re.sub() and re.subn() and corresponding re.Pattern methods are now 2-3 times faster for replacement strings containing group references. Closes #91524 Primarily authored by serhiy-storchaka Serhiy Storchaka Minor-cleanups-by: Gregory P. Smith [Google] <greg@krypto.org>	2022-10-23 15:57:30 -07:00
Miro Hrončok	16a7e4a0b7	gh-92728: Restore re.template, but deprecate it (GH-93161) Revert "bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300)" This reverts commit `b09184bf05`.	2022-05-25 09:05:35 +03:00
Serhiy Storchaka	a84a56d80f	gh-91760: More strict rules for numerical group references and group names in RE (GH-91792) Only sequence of ASCII digits is now accepted as a numerical reference. The group name in bytes patterns and replacement strings can now only contain ASCII letters and digits and underscore.	2022-05-08 19:19:29 +03:00
Serhiy Storchaka	19dca04121	gh-91760: Deprecate group names and numbers which will be invalid in future (GH-91794) Only sequence of ASCII digits will be accepted as a numerical reference. The group name in bytes patterns and replacement strings could only contain ASCII letters and digits and underscore.	2022-04-30 13:13:46 +03:00
Serhiy Storchaka	f703c96cf0	gh-91870: Remove unsupported SRE opcode CALL (GH-91872) It was initially added to support atomic groups, but that support was never fully implemented, and CALL was only left in the compiler, but not interpreter and parser. ATOMIC_GROUP is now used to support atomic groups.	2022-04-26 21:07:25 +03:00
Serhiy Storchaka	130a8c386b	gh-91308: Simplify parsing inline flag "x" (verbose) (GH-91855)	2022-04-23 12:50:42 +03:00
Serhiy Storchaka	48ec61a89a	gh-91700: Validate the group number in conditional expression in RE (GH-91702) In expression (?(group)...) an appropriate re.error is now raised if the group number refers to not defined group. Previously it raised RuntimeError: invalid SRE code.	2022-04-22 19:53:10 +03:00
Serhiy Storchaka	6ccfa31421	gh-90568: Fix exception type for \N with a named sequence in RE (GH-91665) re.error is now raised instead of TypeError.	2022-04-22 18:35:28 +03:00
Serhiy Storchaka	50872dbadc	bpo-47227: Suppress expression chaining for more RE parsing errors (GH-32333)	2022-04-06 19:54:44 +03:00
Serhiy Storchaka	b09184bf05	bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300) They were undocumented and never working.	2022-04-06 19:53:50 +03:00
Serhiy Storchaka	1be3260a90	bpo-47152: Convert the re module into a package (GH-32177) The sre_* modules are now deprecated.	2022-04-02 11:35:13 +03:00

21 commits
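
Several of the commits above change which exception the RE parser raises, or whether a pattern is accepted at all. The following is a minimal sketch for probing that on whatever interpreter is at hand. The helper name compile_outcome is hypothetical and not part of the re module, and the results depend on the Python version: the commit messages state the intended behaviour, and older interpreters may report a different exception type.

```python
import re


def compile_outcome(pattern):
    """Classify what happens when `pattern` is compiled.

    Hypothetical helper for illustration only. Returns "ok" if the
    pattern compiles, "re.error" if the parser rejects it with the
    module's own exception type, or the name of any other exception.
    """
    try:
        re.compile(pattern)
    except re.error:
        return "re.error"
    except Exception as exc:  # e.g. RuntimeError or TypeError on older versions
        return type(exc).__name__
    return "ok"


probes = {
    # gh-91700: conditional that refers to a group which is never defined
    "undefined group in (?(2)...)": r"(a)(?(2)b|c)",
    # gh-90568: \N{...} naming a Unicode named sequence, not a single character
    "named sequence in \\N{...}": r"\N{LATIN SMALL LETTER A WITH MACRON AND GRAVE}",
    # gh-133306: \z as a synonym for \Z (accepted only by interpreters with that change)
    "\\z anchor": r"abc\z",
}

for label, pattern in probes.items():
    print(f"{label}: {compile_outcome(pattern)}")
```

On an interpreter that includes the corresponding commits, the first two probes should report re.error and the third should compile; elsewhere the reported names show what was raised before the change.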