35 commits

Author SHA1 Message Date
Serhiy Storchaka
ac56f8cc8d
gh-133306: Support \z as a synonym for \Z in regular expressions (GH-133314)
\Z was an error inherited from PCRE 0.95. It was fixed in PCRE 2.0.
In other engines, \Z means not “anchor at string end”, but
“anchor before optional newline at string end”.

\z means “anchor at string end” in most RE engines.
2025-05-03 07:54:33 +00:00
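The anchor distinction described in gh-133306 above can be checked with a small sketch. The helper name `end_anchor_demo` is made up for illustration, and `\z` is assumed to be accepted only by interpreters that include this commit, so the call is guarded.

```python
import re

def end_anchor_demo(text="abc\n"):
    """Compare end anchors on a string with a trailing newline."""
    results = {
        # $ (without MULTILINE) also matches just before a final newline.
        "$": re.search(r"abc$", text) is not None,
        # In Python, \Z matches only at the very end of the string.
        r"\Z": re.search(r"abc\Z", text) is not None,
    }
    try:
        # \z is rejected as a bad escape by versions without gh-133306.
        results[r"\z"] = re.search(r"abc\z", text) is not None
    except re.error:
        results[r"\z"] = None  # not supported by this interpreter
    return results
```

Where `\z` is supported, it should report the same result as `\Z`.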
Serhiy Storchaka
819830f34a
gh-126505: Fix bugs in compiling case-insensitive character classes (GH-126557)
* upper-case non-BMP character was ignored
* the ASCII flag was ignored when matching a character range whose
  upper bound is beyond the BMP region
2024-11-11 18:27:26 +02:00
Serhiy Storchaka
f9637b4ba3
Remove dead code in the RE parser (GH-122796) 2024-08-07 19:44:18 +00:00
Jelle Zijlstra
b72c748d7f
Fix syntax in generate_re_casefix.py (#122699)
This was broken in gh-97963.
2024-08-05 23:16:29 -07:00
Serhiy Storchaka
8bc76ae45f
gh-111259: Optimize complementary character sets in RE (GH-120742)
Patterns like "[\s\S]" or "\s|\S" which match any character are now compiled
to the same effective code as a dot with the DOTALL modifier ("(?s:.)").
2024-06-20 07:19:32 +00:00
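A minimal sketch of the behavioural equivalence that gh-111259 above relies on: each pattern matches any single character, newline included. It only shows matching behaviour and does not inspect the compiled code.

```python
import re

text = "line one\nline two"

# Each of these patterns matches any character, including a newline.
patterns = [r"[\s\S]", r"\s|\S", r"(?s:.)"]
match_counts = [len(re.findall(p, text)) for p in patterns]
```

All three counts should equal `len(text)`.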
Victor Stinner
6ae254aaa0
gh-120417: Add #noqa to used imports in the stdlib (#120421)
Tools such as ruff can ignore "imported but unused" warnings if a
line ends with "# noqa: F401". It avoids the temptation to remove
an import which is actually used.
2024-06-13 16:14:50 +02:00
achhina
a01022af23
GH-83162: Rename re.error for better clarity. (#101677)
Renamed re.error to re.PatternError for clarity, and kept re.error as an alias for backward compatibility.
Updated idlelib files at TJR's request.
---------

Co-authored-by: Matthias Bussonnier <mbussonnier@ucmerced.edu>
Co-authored-by: Hugo van Kemenade <hugovk@users.noreply.github.com>
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
2023-12-11 15:45:08 -05:00
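Code that has to run on interpreters both with and without the GH-83162 rename above might look the exception class up defensively. This is a sketch; `is_valid_pattern` is a hypothetical helper, not part of the `re` module.

```python
import re

# re.PatternError exists only in versions that include GH-83162;
# fall back to re.error elsewhere.
PatternError = getattr(re, "PatternError", re.error)

def is_valid_pattern(pattern):
    """Return True if the pattern compiles, False if it is rejected."""
    try:
        re.compile(pattern)
    except PatternError:
        return False
    return True
```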
Serhiy Storchaka
e2b3d831fd
gh-109747: Improve errors for unsupported look-behind patterns (GH-109859)
Now re.error is raised instead of OverflowError or RuntimeError when the
width of a look-behind pattern is too large.

The limit is increased to 2**32-1 (was 2**31-1).
2023-10-14 09:13:02 +03:00
Serhiy Storchaka
882cb79afa
gh-56166: Deprecate passing confusing positional arguments in re functions (#107778)
Deprecate passing optional arguments maxsplit, count and flags in
module-level functions re.split(), re.sub() and re.subn() as positional.
They should only be passed by keyword.
2023-08-16 13:35:35 -07:00
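The keyword-only style recommended by gh-56166 above, as a short sketch. The classic confusion it avoids is passing a flag such as re.IGNORECASE in the position of count.

```python
import re

# Replace only the first digit.
first_only = re.sub(r"\d", "#", "a1b2c3", count=1)

# Split at most once.
parts = re.split(r",\s*", "x, y, z", maxsplit=1)

# Flags passed by keyword cannot be mistaken for count or maxsplit.
ignore_case = re.sub(r"abc", "-", "ABC abc", flags=re.IGNORECASE)
```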
SKO
abd9cc52d9
gh-100061: Proper fix of the bug in the matching of possessive quantifiers (GH-102612)
Restore the global Input Stream pointer after trying to match a sub-pattern.

Co-authored-by: Ma Lin <animalize@users.noreply.github.com>
2023-08-16 10:43:45 +03:00
Serhiy Storchaka
7b6e34e5ba
gh-106052: Fix bug in the matching of possessive quantifiers (gh-106515)
It did not work in the case of a subpattern containing backtracking.

Temporarily implement possessive quantifiers as equivalent greedy quantifiers
in atomic groups.
2023-08-09 08:47:57 +03:00
Serhiy Storchaka
ed64204716
gh-106566: Optimize (?!) in regular expressions (GH-106567) 2023-08-07 18:09:56 +03:00
Serhiy Storchaka
74ec02e949
gh-106510: Fix DEBUG output for atomic group (GH-106511) 2023-07-08 14:31:25 +03:00
Nikita Sobolev
67f69dba0a
gh-105687: Remove deprecated objects from re module (#105688) 2023-06-14 12:26:20 +02:00
Serhiy Storchaka
75a6fadf36
gh-91524: Speed up the regular expression substitution (#91525)
Functions re.sub() and re.subn() and corresponding re.Pattern methods
are now 2-3 times faster for replacement strings containing group references.

Closes #91524

Primarily authored by serhiy-storchaka Serhiy Storchaka
Minor-cleanups-by: Gregory P. Smith [Google] <greg@krypto.org>
2022-10-23 15:57:30 -07:00
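For reference, a replacement string "containing group references", the case gh-91524 above speeds up, looks like this. `swap_key_value` is a hypothetical helper used only for illustration, and the sketch shows the syntax rather than the speedup.

```python
import re

def swap_key_value(line):
    """Turn key=value pairs into value=key pairs."""
    # \1 and \g<value> refer to groups captured by the pattern.
    return re.sub(r"(\w+)=(?P<value>\w+)", r"\g<value>=\1", line)
```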
Serhiy Storchaka
c11b667a1d
gh-96346: Use double caching for re._compile() (#96347) 2022-10-07 12:21:42 -07:00
Gregory P. Smith
4beee0c7b0
gh-91404: Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or allocation failure (GH-32283) (#93882)
Revert "bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283)"

This reverts commit 6e3eee5c11.

Manual fixups to increase the MAGIC number and to handle conflicts with
a couple of changes that landed after that.

Thanks for reviews by Ma Lin and Serhiy Storchaka.
2022-06-17 01:19:44 -07:00
Miro Hrončok
16a7e4a0b7
gh-92728: Restore re.template, but deprecate it (GH-93161)
Revert "bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300)"

This reverts commit b09184bf05.
2022-05-25 09:05:35 +03:00
Serhiy Storchaka
a84a56d80f
gh-91760: More strict rules for numerical group references and group names in RE (GH-91792)
Only a sequence of ASCII digits is now accepted as a numerical reference.
The group name in bytes patterns and replacement strings can now contain
only ASCII letters, digits, and underscores.
2022-05-08 19:19:29 +03:00
Serhiy Storchaka
19dca04121
gh-91760: Deprecate group names and numbers which will be invalid in future (GH-91794)
Only a sequence of ASCII digits will be accepted as a numerical reference.
The group name in bytes patterns and replacement strings will be allowed to
contain only ASCII letters, digits, and underscores.
2022-04-30 13:13:46 +03:00
Serhiy Storchaka
6d0d547033
gh-92049: Forbid pickling constants re._constants.SUCCESS etc (GH-92070)
Previously, pickling did not fail, but the result could not be unpickled.
2022-04-30 13:03:23 +03:00
Serhiy Storchaka
f703c96cf0
gh-91870: Remove unsupported SRE opcode CALL (GH-91872)
It was initially added to support atomic groups, but that
support was never fully implemented, and CALL remained only
in the compiler, not in the interpreter or parser.

ATOMIC_GROUP is now used to support atomic groups.
2022-04-26 21:07:25 +03:00
Serhiy Storchaka
28890427c5
RE: Pre-split the list of opcode names (GH-91859)
1. It makes them interned.
2. It allows adding comments to individual opcodes.
2022-04-23 18:49:23 +03:00
Serhiy Storchaka
130a8c386b
gh-91308: Simplify parsing inline flag "x" (verbose) (GH-91855) 2022-04-23 12:50:42 +03:00
Serhiy Storchaka
f912cc0e41
gh-91575: Add a script for generating data for case-insensitive matching in re (GH-91660)
Also test that all extra cases are in BMP.
2022-04-22 21:37:46 +03:00
Serhiy Storchaka
48ec61a89a
gh-91700: Validate the group number in conditional expression in RE (GH-91702)
In the expression (?(group)...) an appropriate re.error is now
raised if the group number refers to an undefined group.

Previously it raised RuntimeError: invalid SRE code.
2022-04-22 19:53:10 +03:00
Serhiy Storchaka
6ccfa31421
gh-90568: Fix exception type for \N with a named sequence in RE (GH-91665)
re.error is now raised instead of TypeError.
2022-04-22 18:35:28 +03:00
Serhiy Storchaka
1c2fcebf3c
gh-91575: Update case-insensitive matching in re to the latest Unicode version (GH-91580) 2022-04-18 12:26:30 +03:00
Serhiy Storchaka
474fdbe9e4
bpo-47152: Automatically regenerate sre_constants.h (GH-91439)
* Move the code for generating Modules/_sre/sre_constants.h from
  Lib/re/_constants.py into a separate script
  Tools/scripts/generate_sre_constants.py.
* Add target `regen-sre` in the makefile.
* Make target `regen-all` depend on `regen-sre`.
2022-04-12 18:34:06 +03:00
Serhiy Storchaka
50872dbadc
bpo-47227: Suppress exception chaining for more RE parsing errors (GH-32333) 2022-04-06 19:54:44 +03:00
Serhiy Storchaka
b09184bf05
bpo-47211: Remove function re.template() and flag re.TEMPLATE (GH-32300)
They were undocumented and never worked.
2022-04-06 19:53:50 +03:00
Serhiy Storchaka
ff2cf1d7d5
bpo-47152: Remove unused import in re (GH-32298) 2022-04-04 12:00:53 +03:00
Serhiy Storchaka
1578f06c1c
bpo-47152: Move sources of the _sre module into a subdirectory (GH-32290) 2022-04-04 10:53:26 +03:00
Ma Lin
6e3eee5c11
bpo-23689: re module, fix memory leak when a match is terminated by a signal or memory allocation failure (GH-32283) 2022-04-03 19:16:20 +03:00
Serhiy Storchaka
1be3260a90
bpo-47152: Convert the re module into a package (GH-32177)
The sre_* modules are now deprecated.
2022-04-02 11:35:13 +03:00