gh-115952: Fix a potential virtual memory allocation denial of service in pickle (GH-119204)

Loading a small data which does not even involve arbitrary code execution could consume arbitrary large amount of memory. There were three issues: * PUT and LONG_BINPUT with large argument (the C implementation only). Since the memo is implemented in C as a continuous dynamic array, a single opcode can cause its resizing to arbitrary size. Now the sparsity of memo indices is limited. * BINBYTES, BINBYTES8 and BYTEARRAY8 with large argument. They allocated the bytes or bytearray object of the specified size before reading into it. Now they read very large data by chunks. * BINSTRING, BINUNICODE, LONG4, BINUNICODE8 and FRAME with large argument. They read the whole data by calling the read() method of the underlying file object, which usually allocates the bytes object of the specified size before reading into it. Now they read very large data by chunks. Also add comprehensive benchmark suite to measure performance and memory impact of chunked reading optimization in PR #119204. Features: - Normal mode: benchmarks legitimate pickles (time/memory metrics) - Antagonistic mode: tests malicious pickles (DoS protection) - Baseline comparison: side-by-side comparison of two Python builds - Support for truncated data and sparse memo attack vectors Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com> Co-authored-by: Gregory P. Smith <greg@krypto.org>
2025-12-08 06:10:17 +00:00 · 2025-12-05 19:17:01 +02:00 · 2025-12-05 19:17:01 +02:00 · 59f247e43b
commit 59f247e43b
parent 4085ff7b32
7 changed files with 1767 additions and 177 deletions
--- a/Lib/test/test_pickle.py
+++ b/Lib/test/test_pickle.py
@ -59,6 +59,8 @@ class PyUnpicklerTests(AbstractUnpickleTests, unittest.TestCase):
    truncated_errors = (pickle.UnpicklingError, EOFError,
                        AttributeError, ValueError,
                        struct.error, IndexError, ImportError)
+    truncated_data_error = (EOFError, '')
+    size_overflow_error = (pickle.UnpicklingError, 'exceeds')

    def loads(self, buf, **kwds):
        f = io.BytesIO(buf)
@ -103,6 +105,8 @@ class InMemoryPickleTests(AbstractPickleTests, AbstractUnpickleTests,
    truncated_errors = (pickle.UnpicklingError, EOFError,
                        AttributeError, ValueError,
                        struct.error, IndexError, ImportError)
+    truncated_data_error = ((pickle.UnpicklingError, EOFError), '')
+    size_overflow_error = ((OverflowError, pickle.UnpicklingError), 'exceeds')

    def dumps(self, arg, protocol=None, **kwargs):
        return pickle.dumps(arg, protocol, **kwargs)
@ -375,6 +379,8 @@ class CUnpicklerTests(PyUnpicklerTests):
        unpickler = _pickle.Unpickler
        bad_stack_errors = (pickle.UnpicklingError,)
        truncated_errors = (pickle.UnpicklingError,)
+        truncated_data_error = (pickle.UnpicklingError, 'truncated')
+        size_overflow_error = (OverflowError, 'exceeds')

    class CPicklingErrorTests(PyPicklingErrorTests):
        pickler = _pickle.Pickler
@ -478,7 +484,7 @@ def test_pickler(self):
                0)  # Write buffer is cleared after every dump().

        def test_unpickler(self):
-            basesize = support.calcobjsize('2P2n2P 2P2n2i5P 2P3n8P2n2i')
+            basesize = support.calcobjsize('2P2n3P 2P2n2i5P 2P3n8P2n2i')
            unpickler = _pickle.Unpickler
            P = struct.calcsize('P')  # Size of memo table entry.
            n = struct.calcsize('n')  # Size of mark table entry.