Loading a small amount of data, which does not even involve arbitrary code execution, could consume an arbitrarily large amount of memory. There were three issues:
* PUT and LONG_BINPUT with a large argument (the C implementation only).
Since the memo is implemented in C as a contiguous dynamic array, a single
opcode could cause it to be resized to an arbitrary size. Now the sparsity
of memo indices is limited.
* BINBYTES, BINBYTES8 and BYTEARRAY8 with a large argument. They allocated
a bytes or bytearray object of the specified size before reading into it.
Now they read very large data in chunks.
* BINSTRING, BINUNICODE, LONG4, BINUNICODE8 and FRAME with a large
argument. They read the whole data by calling the read() method of
the underlying file object, which usually allocates a bytes object of
the specified size before reading into it. Now they read very large data
in chunks (see the sketch after this list).
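
The chunked-reading idea can be pictured with the following minimal sketch. It is illustrative only: the helper name read_exact and the 1 MiB chunk size are assumptions, not the actual CPython implementation. The point is that the loader no longer allocates the full declared size up front; it reads bounded chunks and fails fast if the stream ends early.

```python
import io

_CHUNK = 1 << 20  # assumed 1 MiB read granularity, for illustration only

def read_exact(file, size):
    """Read exactly `size` bytes in bounded chunks.

    Raises EOFError on truncation instead of first allocating a
    `size`-byte buffer for an untrusted, possibly huge `size`.
    """
    parts = []
    remaining = size
    while remaining > 0:
        chunk = file.read(min(remaining, _CHUNK))
        if not chunk:
            raise EOFError("pickle data was truncated")
        parts.append(chunk)
        remaining -= len(chunk)
    return b"".join(parts)

# An opcode claiming an 8 GiB payload backed by a few bytes now fails fast:
try:
    read_exact(io.BytesIO(b"only a few bytes"), 8 * 1024**3)
except EOFError as exc:
    print(exc)  # -> pickle data was truncated
```

With this shape, an opcode that claims a multi-gigabyte payload but is backed by only a few bytes costs at most one chunk of memory before the truncation is detected.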
Also add a comprehensive benchmark suite to measure the performance and memory
impact of the chunked-reading optimization in PR #119204.
Features:
- Normal mode: benchmarks legitimate pickles (time/memory metrics)
- Antagonistic mode: tests malicious pickles (DoS protection)
- Baseline comparison: side-by-side comparison of two Python builds
- Support for truncated data and sparse memo attack vectors (illustrated in the sketch below)
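
As a rough illustration of those two attack vectors, the sketch below builds pickles of the kind the antagonistic mode exercises. The opcode constants come from the pickle module; the exact payloads used by the benchmark may differ.

```python
import pickle
import struct

# 1. Truncated data: BINBYTES8 claims an 8 GiB payload, but almost nothing
#    follows. An unpatched loader allocates 8 GiB before noticing the
#    truncation; a patched one stops after the first small chunk.
truncated = (
    pickle.PROTO + bytes([5])
    + pickle.BINBYTES8 + struct.pack("<Q", 8 * 1024**3)
    + b"x"  # far fewer bytes than promised
)

# 2. Sparse memo: LONG_BINPUT stores one small object at a huge memo index,
#    which can force the contiguous C memo array to grow to that index.
sparse_memo = (
    pickle.PROTO + bytes([2])
    + pickle.EMPTY_LIST
    + pickle.LONG_BINPUT + struct.pack("<I", 0x7FFFFFFF)
    + pickle.STOP
)

print(f"truncated pickle: {len(truncated)} bytes, claims an 8 GiB payload")
print(f"sparse-memo pickle: {len(sparse_memo)} bytes, memo index 2**31 - 1")
# Note: calling pickle.loads() on these payloads can allocate gigabytes of
# memory on an unpatched interpreter; a patched build rejects them cheaply.
```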
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>