Loading a small data which does not even involve arbitrary code execution
could consume arbitrary large amount of memory. There were three issues:
* PUT and LONG_BINPUT with large argument (the C implementation only).
Since the memo is implemented in C as a continuous dynamic array, a single
opcode can cause its resizing to arbitrary size. Now the sparsity of
memo indices is limited.
* BINBYTES, BINBYTES8 and BYTEARRAY8 with large argument. They allocated
the bytes or bytearray object of the specified size before reading into
it. Now they read very large data by chunks.
* BINSTRING, BINUNICODE, LONG4, BINUNICODE8 and FRAME with large
argument. They read the whole data by calling the read() method of
the underlying file object, which usually allocates the bytes object of
the specified size before reading into it. Now they read very large data
by chunks.
Also add comprehensive benchmark suite to measure performance and memory
impact of chunked reading optimization in PR #119204.
Features:
- Normal mode: benchmarks legitimate pickles (time/memory metrics)
- Antagonistic mode: tests malicious pickles (DoS protection)
- Baseline comparison: side-by-side comparison of two Python builds
- Support for truncated data and sparse memo attack vectors
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
The Python pickle module looks for "00" and "01" but _pickle only looked
for 2 characters that parsed to 0 or 1, meaning some payloads like "+0" or
" 0" would lead to different results in different implementations.
The use of PySys_GetObject() and _PySys_GetAttr(), which return a borrowed
reference, has been replaced by using one of the following functions, which
return a strong reference and distinguish a missing attribute from an error:
_PySys_GetOptionalAttr(), _PySys_GetOptionalAttrString(),
_PySys_GetRequiredAttr(), and _PySys_GetRequiredAttrString().
Instead of be limited just by the size of addressable memory (2**63
bytes), Python integers are now also limited by the number of bits, so
the number of bit now always fit in a 64-bit integer.
Both limits are much larger than what might be available in practice,
so it doesn't affect users.
_PyLong_NumBits() and _PyLong_Frexp() are now always successful.
* Raise PicklingError instead of UnicodeEncodeError, ValueError
and AttributeError in both implementations.
* Chain the original exception to the pickle-specific one as __context__.
* Include the error message of ImportError and some AttributeError in
the PicklingError error message.
* Unify error messages between Python and C implementations.
* Refer to documented __reduce__ and __newobj__ callables instead of
internal methods (e.g. save_reduce()) or pickle opcodes (e.g. NEWOBJ).
* Include more details in error messages (what expected, what got).
* Avoid including a potentially long repr of an arbitrary object in
error messages.
This checks are redundant in normal circumstances and can only work if
the extension registry was intentionally broken.
* The Python implementation now raises exception for the extension code
with false boolean value.
* Simplify the C code. RuntimeError is now raised in explicit checks.
* Add many tests.
Serializing objects with complex __qualname__ (such as unbound methods and
nested classes) by name no longer involves serializing parent objects by value
in pickle protocols < 4.
* Add an InternalDocs file describing how interning should work and how to use it.
* Add internal functions to *explicitly* request what kind of interning is done:
- `_PyUnicode_InternMortal`
- `_PyUnicode_InternImmortal`
- `_PyUnicode_InternStatic`
* Switch uses of `PyUnicode_InternInPlace` to those.
* Disallow using `_Py_SetImmortal` on strings directly.
You should use `_PyUnicode_InternImmortal` instead:
- Strings should be interned before immortalization, otherwise you're possibly
interning a immortalizing copy.
- `_Py_SetImmortal` doesn't handle the `SSTATE_INTERNED_MORTAL` to
`SSTATE_INTERNED_IMMORTAL` update, and those flags can't be changed in
backports, as they are now part of public API and version-specific ABI.
* Add private `_only_immortal` argument for `sys.getunicodeinternedsize`, used in refleak test machinery.
* Make sure the statically allocated string singletons are unique. This means these sets are now disjoint:
- `_Py_ID`
- `_Py_STR` (including the empty string)
- one-character latin-1 singletons
Now, when you intern a singleton, that exact singleton will be interned.
* Add a `_Py_LATIN1_CHR` macro, use it instead of `_Py_ID`/`_Py_STR` for one-character latin-1 singletons everywhere (including Clinic).
* Intern `_Py_STR` singletons at startup.
* For free-threaded builds, intern `_Py_LATIN1_CHR` singletons at startup.
* Beef up the tests. Cover internal details (marked with `@cpython_only`).
* Add lots of assertions
Co-Authored-By: Eric Snow <ericsnowcurrently@gmail.com>
This PR adds the ability to enable the GIL if it was disabled at
interpreter startup, and modifies the multi-phase module initialization
path to enable the GIL when loading a module, unless that module's spec
includes a slot indicating it can run safely without the GIL.
PEP 703 called the constant for the slot `Py_mod_gil_not_used`; I went
with `Py_MOD_GIL_NOT_USED` for consistency with gh-104148.
A warning will be issued up to once per interpreter for the first
GIL-using module that is loaded. If `-v` is given, a shorter message
will be printed to stderr every time a GIL-using module is loaded
(including the first one that issues a warning).
Previously the C implementation of pickle.Pickler and pickle.Unpickler
classes did not have such methods and they could only be used if
they were overloaded in subclasses or set as instance attributes.
Fixed calling super().persistent_id() and super().persistent_load() in
subclasses of the C implementation of pickle.Pickler and pickle.Unpickler
classes. It no longer causes an infinite recursion.
Move private functions to the internal C API (pycore_sysmodule.h):
* _PySys_GetAttr()
* _PySys_GetSizeOf()
No longer export most of these functions.
Fix also a typo in Include/cpython/optimizer.h: add a missing space.
All fields must be explicitly initialised to prevent manipulation of
uninitialised fields in dealloc.
Align initialisation order with the layout of the object structs.