gh-115952: Fix a potential virtual memory allocation denial of service in pickle (GH-119204)

Loading a small data which does not even involve arbitrary code execution
could consume arbitrary large amount of memory. There were three issues:

* PUT and LONG_BINPUT with large argument (the C implementation only).
  Since the memo is implemented in C as a continuous dynamic array, a single
  opcode can cause its resizing to arbitrary size. Now the sparsity of
  memo indices is limited.
* BINBYTES, BINBYTES8 and BYTEARRAY8 with large argument.  They allocated
  the bytes or bytearray object of the specified size before reading into
  it.  Now they read very large data by chunks.
* BINSTRING, BINUNICODE, LONG4, BINUNICODE8 and FRAME with large
  argument.  They read the whole data by calling the read() method of
  the underlying file object, which usually allocates the bytes object of
  the specified size before reading into it.  Now they read very large data
  by chunks.

Also add comprehensive benchmark suite to measure performance and memory
impact of chunked reading optimization in PR #119204.

Features:
- Normal mode: benchmarks legitimate pickles (time/memory metrics)
- Antagonistic mode: tests malicious pickles (DoS protection)
- Baseline comparison: side-by-side comparison of two Python builds
- Support for truncated data and sparse memo attack vectors

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
This commit is contained in:
Serhiy Storchaka 2025-12-05 19:17:01 +02:00 committed by GitHub
parent 4085ff7b32
commit 59f247e43b
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
7 changed files with 1767 additions and 177 deletions

View file

@ -59,6 +59,8 @@ class PyUnpicklerTests(AbstractUnpickleTests, unittest.TestCase):
truncated_errors = (pickle.UnpicklingError, EOFError,
AttributeError, ValueError,
struct.error, IndexError, ImportError)
truncated_data_error = (EOFError, '')
size_overflow_error = (pickle.UnpicklingError, 'exceeds')
def loads(self, buf, **kwds):
f = io.BytesIO(buf)
@ -103,6 +105,8 @@ class InMemoryPickleTests(AbstractPickleTests, AbstractUnpickleTests,
truncated_errors = (pickle.UnpicklingError, EOFError,
AttributeError, ValueError,
struct.error, IndexError, ImportError)
truncated_data_error = ((pickle.UnpicklingError, EOFError), '')
size_overflow_error = ((OverflowError, pickle.UnpicklingError), 'exceeds')
def dumps(self, arg, protocol=None, **kwargs):
return pickle.dumps(arg, protocol, **kwargs)
@ -375,6 +379,8 @@ class CUnpicklerTests(PyUnpicklerTests):
unpickler = _pickle.Unpickler
bad_stack_errors = (pickle.UnpicklingError,)
truncated_errors = (pickle.UnpicklingError,)
truncated_data_error = (pickle.UnpicklingError, 'truncated')
size_overflow_error = (OverflowError, 'exceeds')
class CPicklingErrorTests(PyPicklingErrorTests):
pickler = _pickle.Pickler
@ -478,7 +484,7 @@ def test_pickler(self):
0) # Write buffer is cleared after every dump().
def test_unpickler(self):
basesize = support.calcobjsize('2P2n2P 2P2n2i5P 2P3n8P2n2i')
basesize = support.calcobjsize('2P2n3P 2P2n2i5P 2P3n8P2n2i')
unpickler = _pickle.Unpickler
P = struct.calcsize('P') # Size of memo table entry.
n = struct.calcsize('n') # Size of mark table entry.