cpython/Lib
Serhiy Storchaka 59f247e43b
gh-115952: Fix a potential virtual memory allocation denial of service in pickle (GH-119204)
Loading a small amount of data, without even involving arbitrary code
execution, could consume an arbitrarily large amount of memory. There were
three issues:

* PUT and LONG_BINPUT with a large argument (the C implementation only).
  Since the memo is implemented in C as a contiguous dynamic array, a single
  opcode can cause it to be resized to an arbitrary size. Now the sparsity
  of memo indices is limited.
* BINBYTES, BINBYTES8 and BYTEARRAY8 with a large argument.  They allocated
  a bytes or bytearray object of the specified size before reading into
  it.  Now they read very large data in chunks.
* BINSTRING, BINUNICODE, LONG4, BINUNICODE8 and FRAME with a large
  argument.  They read the whole data by calling the read() method of
  the underlying file object, which usually allocates a bytes object of
  the specified size before reading into it.  Now they read very large data
  in chunks.
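
The chunked-reading idea described above can be illustrated with a minimal sketch. This is not the code from the patch: the helper name `read_exact_chunked` and the 1 MiB `CHUNK_SIZE` are assumptions made for illustration only. The point is that the buffer grows only as data actually arrives, so a tiny truncated input that declares a huge length fails early instead of triggering a huge allocation.

```python
import io

# Hypothetical chunk size; the value used by the real patch is not given here.
CHUNK_SIZE = 1 << 20  # 1 MiB


def read_exact_chunked(stream, size, chunk_size=CHUNK_SIZE):
    """Read exactly `size` bytes from `stream`, at most `chunk_size` at a time.

    Memory use is bounded by the data actually received plus one chunk,
    not by the (possibly attacker-controlled) declared `size`.
    """
    buf = bytearray()
    while len(buf) < size:
        want = min(chunk_size, size - len(buf))
        chunk = stream.read(want)
        if not chunk:
            # The stream ended before the declared length was reached.
            raise EOFError(f"expected {size} bytes, got {len(buf)}")
        buf += chunk
    return bytes(buf)


# A well-formed input is returned unchanged.
data = read_exact_chunked(io.BytesIO(b"abcdef"), 6, chunk_size=4)

# A 3-byte input claiming to hold 1 TiB is rejected after reading 3 bytes.
try:
    read_exact_chunked(io.BytesIO(b"abc"), 2**40)
except EOFError:
    truncated_rejected = True
else:
    truncated_rejected = False
```

A naive `stream.read(size)` on a generic file object may allocate the full declared size up front, which is the behavior the commit message describes as the problem.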

Also add a comprehensive benchmark suite to measure the performance and
memory impact of the chunked reading optimization in PR #119204.

Features:
- Normal mode: benchmarks legitimate pickles (time/memory metrics)
- Antagonistic mode: tests malicious pickles (DoS protection)
- Baseline comparison: side-by-side comparison of two Python builds
- Support for truncated data and sparse memo attack vectors
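
As a rough illustration of the kind of time/memory measurement the "normal mode" refers to, the sketch below times `pickle.loads` on a legitimate pickle and records peak traced memory with `tracemalloc`. It is not the benchmark suite added by the commit; the helper name `measure_loads` and the payload size are assumptions chosen for the example.

```python
import pickle
import time
import tracemalloc


def measure_loads(data):
    """Return (object, elapsed_seconds, peak_traced_bytes) for pickle.loads(data)."""
    tracemalloc.start()
    try:
        start = time.perf_counter()
        obj = pickle.loads(data)
        elapsed = time.perf_counter() - start
        # get_traced_memory() returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return obj, elapsed, peak


# A legitimate pickle: 100 kB of bytes, serialized with protocol 4.
original = b"x" * 100_000
payload = pickle.dumps(original, protocol=4)
obj, elapsed, peak = measure_loads(payload)
print(f"loaded {len(obj)} bytes in {elapsed:.6f}s, peak traced memory {peak} bytes")
```

Running the same script under two Python builds would give a crude version of the baseline comparison; `tracemalloc` only sees allocations made through Python's memory allocators, so it is an approximation of real memory use.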

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Co-authored-by: Gregory P. Smith <greg@krypto.org>
2025-12-05 19:17:01 +02:00
__phello__
_pyrepl Drop three unused imports (#141875) 2025-11-23 16:33:05 +00:00
asyncio gh-141863: use bytearray.take_bytes in asyncio streams for better performance (#141864) 2025-11-24 21:06:53 +05:30
collections gh-140911: Ensure that UserString.index() and UserString.rindex() accept UserString as argument (GH-140945) 2025-11-25 15:25:46 +02:00
compression gh-132983: Split `_zstd_set_c_parameters` (#133921) 2025-05-28 14:45:08 +00:00
concurrent gh-139462: Make the ProcessPoolExecutor BrokenProcessPool exception report which child process terminated (GH-139486) 2025-11-11 22:09:58 +00:00
ctypes gh-140041: Fix import of ctypes on Android and Cygwin when ABI flags are present (#140178) 2025-10-16 05:40:39 +08:00
curses gh-133575: eliminate legacy checks in Lib/curses/__init__.py (#133576) 2025-05-07 20:28:32 +02:00
dbm gh-135386: Fix "unable to open database file" errors on readonly DB (GH-135566) 2025-08-22 14:11:59 +03:00
email gh-136702: Deprecate passing non-ascii *encoding* (str) to encodings.normalize_encoding (#140030) 2025-11-09 13:37:34 +01:00
encodings gh-141968: Use take_bytes in encodings.punycode (#141974) 2025-11-28 17:47:14 +00:00
ensurepip gh-140874: Upgrade bundled pip to 25.3 (GH-140876) 2025-11-01 10:25:19 +00:00
html gh-140875: Fix handling of unclosed charrefs before EOF in HTMLParser (GH-140904) 2025-11-19 13:55:10 +02:00
http gh-119451: Fix a potential denial of service in http.client (GH-119454) 2025-12-01 17:26:07 +02:00
idlelib Minor fixes to idle.rst and regenerate help.html (#140037) 2025-11-06 03:21:02 -05:00
importlib gh-141930: Use the regular IO stack to write .pyc files for a better error message on failure (GH-141931) 2025-11-27 19:17:59 +00:00
json GH-141686: Break cycles created by JSONEncoder.iterencode (GH-141687) 2025-11-18 09:51:18 -08:00
logging gh-138162: Fix logging.LoggerAdapter with merge_extra=True and without the extra argument (GH-140511) 2025-10-30 12:52:02 +02:00
multiprocessing gh-142206: multiprocessing.resource_tracker: Decode messages using older protocol (GH-142215) 2025-12-03 12:59:14 +00:00
pathlib GH-139174: Prepare pathlib.Path.info for new methods (part 2) (#140155) 2025-10-18 02:13:25 +01:00
profiling gh-140677 Add heatmap visualization to Tachyon sampling profiler (#140680) 2025-12-02 20:33:40 +00:00
pydoc_data Python 3.15.0a2 2025-11-18 16:51:17 +02:00
re gh-141968: Use take_bytes in re._compiler (#141995) 2025-11-28 17:46:10 +00:00
site-packages
sqlite3 gh-133390: sqlite3 CLI completion for tables, columns, indices, triggers, views, functions, schemata (GH-136101) 2025-10-24 08:26:36 +02:00
string GH-132661: Add `string.templatelib.convert()` (#135217) 2025-07-15 11:56:42 +02:00
sysconfig Replace obsolete platforms with more recent examples (#132455) 2025-10-10 05:38:13 +00:00
test gh-115952: Fix a potential virtual memory allocation denial of service in pickle (GH-119204) 2025-12-05 19:17:01 +02:00
tkinter gh-130693: Support more options for search in tkinter.Text (GH-130848) 2025-11-17 14:42:26 +00:00
tomllib gh-133117: Enable stricter mypy checks for tomllib (#133206) 2025-05-03 16:57:09 +03:00
turtledemo gh-128062: Fix the font size and shortcut display of the turtledemo menu (#128063) 2024-12-19 15:24:47 -05:00
unittest gh-136442: Fix unittest to return exit code 5 when setUpClass raises an exception (#136487) 2025-11-14 16:59:51 -08:00
urllib gh-140691: urllib.request: Close FTP control socket if data socket can't connect (GH-140835) 2025-11-05 11:52:11 +01:00
venv gh-133951: Remove lib64->lib symlink in venv creation (#137139) 2025-10-04 14:55:17 +01:00
wsgiref gh-133810: remove http.server.CGIHTTPRequestHandler and --cgi flag (#133811) 2025-05-17 09:58:16 +02:00
xml gh-142145: Remove quadratic behavior in node ID cache clearing (GH-142146) 2025-12-02 23:16:37 -08:00
xmlrpc gh-136839: Refactor simple dict.update calls (#136811) 2025-07-19 10:12:10 -07:00
zipfile gh-139700: Check consistency of the zip64 end of central directory record (GH-139702) 2025-10-07 20:15:26 +03:00
zoneinfo gh-137976: Explicitly exclude localtime from available_timezones (#138012) 2025-09-18 17:32:14 +01:00
__future__.py
__hello__.py
_aix_support.py
_android_support.py Make Android streams respect the unbuffered (-u) option (#138806) 2025-09-18 11:41:21 +01:00
_apple_support.py
_ast_unparse.py gh-138774: use value to ast.unparse code when str is None in ast.Interpolation (#139415) 2025-10-23 13:56:05 +00:00
_collections_abc.py gh-118803: Make ByteString deprecations louder; remove ByteString from typing.__all__ and collections.abc.__all__ (#139127) 2025-09-18 18:58:16 +00:00
_colorize.py gh-141679: Add colour to defaults in argparse help (#141680) 2025-11-23 00:26:50 +00:00
_compat_pickle.py gh-133810: remove http.server.CGIHTTPRequestHandler and --cgi flag (#133811) 2025-05-17 09:58:16 +02:00
_ios_support.py
_markupbase.py _markupbase.py: Use a permalink for the analysis of MS-Word extensions (GH-129017) 2025-02-06 11:40:43 +01:00
_opcode_metadata.py GH-139109: Support switch/case dispatch with the tracing interpreter. (GH-141703) 2025-11-18 13:31:48 +00:00
_osx_support.py
_py_abc.py
_py_warnings.py gh-140691: urllib.request: Close FTP control socket if data socket can't connect (GH-140835) 2025-11-05 11:52:11 +01:00
_pydatetime.py gh-97517: Add documentation links to datetime strftime/strptime docstrings (#138559) 2025-09-15 19:50:46 +01:00
_pydecimal.py gh-76007: Deprecate __version__ attribute in decimal (#140302) 2025-10-26 12:01:04 +01:00
_pyio.py gh-129005: Remove copies from _pyio using take_bytes (#141539) 2025-11-18 10:10:32 +01:00
_pylong.py
_sitebuiltins.py
_strptime.py gh-81148: Eliminate unnecessary check in _strptime when determining AM/PM (#13428) 2025-09-19 10:23:12 +00:00
_threading_local.py gh-107006: Move threading.local docstring to docs (#131840) 2025-05-05 15:00:15 +03:00
_weakrefset.py
abc.py
annotationlib.py gh-141489: Simplify closure/freevar iteration in annotationlib._build_closure() (#141490) 2025-11-19 18:08:08 -10:00
antigravity.py
argparse.py GH-142267: Cache formatter to avoid repeated _set_color calls (#142268) 2025-12-05 16:47:50 +00:00
ast.py gh-135801: Add the module parameter to compile() etc (GH-139652) 2025-11-13 13:21:32 +02:00
base64.py gh-141968: Use bytearray.take_bytes in base64 _b32encode and _b32decode (#141971) 2025-11-26 21:14:25 +05:30
bdb.py gh-136057: Allow step and next to step over for loops (#136160) 2025-11-16 13:57:07 -08:00
bisect.py
bz2.py gh-132983: Introduce compression package and move _compression module (GH-133018) 2025-04-27 14:41:30 -07:00
calendar.py gh-140212: Add html for year-month option in Calendar (#140230) 2025-10-31 17:28:53 +02:00
cmd.py gh-133363: Fix Cmd completion for lines beginning with ! (#133364) 2025-05-03 22:50:37 -04:00
code.py gh-135103: Remove an unused local variable in Lib/code.py (GH-135104) 2025-06-04 13:57:31 +09:00
codecs.py gh-52876: Implement missing parameter in codecs.StreamReaderWriter functions (#136498) 2025-07-10 17:42:14 +02:00
codeop.py gh-132449: Improve syntax error messages for keywords with typos (#132450) 2025-04-22 11:01:55 +02:00
colorsys.py
compileall.py gh-130645: Add color to stdlib argparse CLIs (gh-133380) 2025-05-05 19:46:46 +02:00
configparser.py gh-65697: Improved error msg for configparser key validation (#135527) 2025-06-15 12:13:19 -04:00
contextlib.py
contextvars.py
copy.py gh-132657: improve deepcopy and copy scaling on free-threading (#138429) 2025-09-04 13:20:23 +05:30
copyreg.py gh-132882: Fix copying of unions with members that do not support __or__ (#132883) 2025-04-24 16:49:09 +00:00
cProfile.py gh-138122: Implement PEP 799 (#138142) 2025-08-27 17:52:50 +01:00
csv.py gh-137627: Make csv.Sniffer.sniff() delimiter detection 1.6x faster (#137628) 2025-10-23 15:28:29 +03:00
dataclasses.py gh-142214: Fix two regressions in dataclasses (#142223) 2025-12-04 20:04:42 -08:00
datetime.py
decimal.py gh-76007: Deprecate __version__ attribute in decimal (#140302) 2025-10-26 12:01:04 +01:00
difflib.py gh-95953: Add a css class to changed lines of difflib.HtmlDiff make_table (#139226) 2025-09-22 13:19:37 +00:00
dis.py gh-130645: Add color to stdlib argparse CLIs (gh-133380) 2025-05-05 19:46:46 +02:00
doctest.py GH-90344: replace single-call io.IncrementalNewlineDecoder usage with non-incremental newline decoders (GH-30276) 2025-11-15 00:13:37 +00:00
enum.py gh-140766: [Enum] add show_flag_values and bin to enum.__all__ (GH-140765) 2025-10-30 10:32:55 -07:00
filecmp.py
fileinput.py
fnmatch.py gh-133306: Use \z instead of \Z in fnmatch.translate() and glob.translate() (GH-133338) 2025-05-03 17:58:21 +03:00
fractions.py gh-87790: support thousands separators for formatting fractional part of Fraction (#132204) 2025-07-07 11:16:31 +03:00
ftplib.py
functools.py gh-140873: Add support of non-descriptor callables in functools.singledispatchmethod() (GH-140884) 2025-11-13 19:48:52 +02:00
genericpath.py gh-71189: Support all-but-last mode in os.path.realpath() (GH-117562) 2025-07-30 10:19:19 +03:00
getopt.py
getpass.py gh-138514: getpass: restrict echo_char to a single ASCII character (#138591) 2025-09-16 16:21:55 +02:00
gettext.py gettext: Remove outdated "TODO" comment (#130890) 2025-03-06 23:41:03 +00:00
glob.py docs: be clearer that glob results are unordered (#140184) 2025-10-19 16:16:35 -04:00
graphlib.py gh-130914: Make graphlib.TopologicalSorter.prepare() idempotent (#131317) 2025-03-18 16:28:00 -05:00
gzip.py Remove some dead code from gzip and tarfile (#138123) 2025-08-25 16:23:47 +03:00
hashlib.py gh-136565: use SHA-256 for hashlib.__doc__ example instead of MD5 (#138157) 2025-08-26 10:38:53 +00:00
heapq.py Remove unnecessary slice in heapq.py (gh-139087) 2025-09-18 00:52:46 -05:00
hmac.py gh-136912: fix handling of OverflowError in hmac.digest (#136917) 2025-07-26 08:22:06 +00:00
imaplib.py gh-76007: Deprecate __version__ attribute in imaplib (#140299) 2025-10-20 15:20:44 +03:00
inspect.py gh-131116: Fix inspect.getdoc() to work with cached_property objects (GH-131165) 2025-11-12 10:07:21 +00:00
io.py gh-132952: Speed up startup by importing _io instead of io (#132957) 2025-04-28 08:38:56 -07:00
ipaddress.py gh-141497: Make ipaddress.IP{v4,v6}Network.hosts() always returning an iterator (GH-141547) 2025-11-17 19:29:06 +02:00
keyword.py
linecache.py gh-122255: Synchronize warnings in C and Python implementations of the warnings module (GH-122824) 2025-11-14 16:49:28 +02:00
locale.py gh-137729: Fix support for locales with @-modifiers (GH-137253) 2025-08-18 10:11:15 +03:00
lzma.py gh-132983: Introduce compression package and move _compression module (GH-133018) 2025-04-27 14:41:30 -07:00
mailbox.py gh-140808: Remove __class_getitem__ from mailbox._ProxyFile (#140838) 2025-11-02 13:56:59 -08:00
mimetypes.py gh-141018: Update .exe, .dll, .rtf and .jpg mime types in mimetypes (#141023) 2025-11-17 13:32:00 +02:00
modulefinder.py gh-135801: Add the module parameter to compile() etc (GH-139652) 2025-11-13 13:21:32 +02:00
netrc.py gh-135823: improve error message in netrc security checks (#135827) 2025-06-23 12:49:27 +02:00
ntpath.py gh-136065: Fix quadratic complexity in os.path.expandvars() (GH-134952) 2025-10-31 14:49:51 +01:00
nturl2path.py GH-125866: Deprecate nturl2path module (#131432) 2025-03-19 19:33:01 +00:00
numbers.py gh-122450: Expand documentation for `Rational and Fraction` (#136800) 2025-08-04 02:15:59 +00:00
opcode.py gh-131738: optimize builtin any/all/tuple calls with a generator expression arg (#131737) 2025-03-28 10:35:20 +00:00
operator.py
optparse.py gh-76007: Deprecate __version__ attribute (#138675) 2025-09-29 12:03:23 +03:00
os.py gh-120057: add os.reload_environ to __all__ (#140763) 2025-10-29 21:21:26 +00:00
pdb.py gh-141982: Fix pdb can't set breakpoints on async functions (#141983) 2025-12-01 23:40:02 -08:00
pickle.py gh-115952: Fix a potential virtual memory allocation denial of service in pickle (GH-119204) 2025-12-05 19:17:01 +02:00
pickletools.py gh-131178: Add tests for pickletools command-line interface (#131287) 2025-11-22 19:17:06 +02:00
pkgutil.py gh-131152, pkgutil: Remove unused imports (#131149) 2025-03-12 15:03:36 +01:00
platform.py gh-141600: Fix musl version detection on Void Linux (GH-141602) 2025-11-22 12:17:40 -06:00
plistlib.py gh-119342: Fix a potential denial of service in plistlib (GH-119343) 2025-12-01 17:28:15 +02:00
poplib.py gh-130637: Add validation for numeric response data in stat() method (#130646) 2025-03-02 08:05:40 -05:00
posixpath.py gh-136065: Fix quadratic complexity in os.path.expandvars() (GH-134952) 2025-10-31 14:49:51 +01:00
pprint.py GH-90117: Check for list and tuple before MappingView in pprint (GH-135779) 2025-06-24 14:41:41 -07:00
profile.py gh-138122: Implement PEP 799 (#138142) 2025-08-27 17:52:50 +01:00
pstats.py gh-140137: Handle empty collections in profiling.sampling (#140154) 2025-10-15 14:59:12 +01:00
pty.py
py_compile.py gh-130645: Add color to stdlib argparse CLIs (gh-133380) 2025-05-05 19:46:46 +02:00
pyclbr.py
pydoc.py gh-132686: Add parameters inherit_class_doc and fallback_to_class_doc for inspect.getdoc() (GH-132691) 2025-11-12 00:01:25 +02:00
queue.py Fix Queue.shutdown docs for condition to unblock a join (gh-137088) 2025-07-25 07:56:28 -06:00
quopri.py
random.py Minor edit: Move comments closer to the code they describe (gh-136477) 2025-07-09 10:23:46 -07:00
reprlib.py gh-135487: fix reprlib.Repr.repr_int when given very large integers (#135506) 2025-06-24 11:09:46 +00:00
rlcompleter.py
runpy.py gh-135801: Add the module parameter to compile() etc (GH-139652) 2025-11-13 13:21:32 +02:00
sched.py
secrets.py
selectors.py
shelve.py Drop three unused imports (#141875) 2025-11-23 16:33:05 +00:00
shlex.py gh-138804: Check type in shlex.quote (GH-138809) 2025-09-12 14:26:21 -04:00
shutil.py gh-132983: Add missing references to Zstandard in shutil docstrings (GH-136617) 2025-07-23 18:09:53 +00:00
signal.py
site.py gh-138993: Dedent credits text (#138994) 2025-10-14 11:15:17 +03:00
smtplib.py gh-139434: Update selected RFC 2822 references to RFC 5322 (#139435) 2025-11-04 14:46:07 -05:00
socket.py gh-99813: Start using SSL_sendfile when available (#99907) 2025-07-12 12:42:35 +00:00
socketserver.py gh-76007: Deprecate __version__ attribute (#138675) 2025-09-29 12:03:23 +03:00
ssl.py gh-88046: remove impossible conditional import for _ssl.RAND_egd (#139648) 2025-10-09 11:14:36 +02:00
stat.py gh-83714: Implement os.statx() function (#139178) 2025-10-15 13:44:08 +00:00
statistics.py gh-140938: Raise ValueError for infinite inputs to stdev/pstdev (GH-141531) 2025-11-14 23:25:45 +00:00
stringprep.py
struct.py
subprocess.py gh-87512: Fix subprocess using timeout= on Windows blocking with a large input= (GH-142058) 2025-11-28 22:07:03 -08:00
symtable.py gh-135801: Add the module parameter to compile() etc (GH-139652) 2025-11-13 13:21:32 +02:00
tabnanny.py gh-76007: Deprecate __version__ attribute (#138675) 2025-09-29 12:03:23 +03:00
tarfile.py gh-57911: Sanitize symlink targets in tarfile on win32 (GH-138309) 2025-09-05 16:19:47 +02:00
tempfile.py gh-136156: Allow using linkat() with TemporaryFile (#136281) 2025-07-08 18:39:47 +02:00
textwrap.py gh-139065: Fix trailing space before long word in textwrap (GH-139070) 2025-10-10 16:29:18 +03:00
this.py
threading.py gh-114827: clarify threading.Event.wait timeout behavior (#114834) 2025-10-14 11:34:53 +03:00
timeit.py gh-139374: colorize traceback when using timeit command-line interface (#139375) 2025-09-28 11:49:18 +00:00
token.py gh-131507: Add support for syntax highlighting in PyREPL (GH-133247) 2025-05-02 20:22:31 +02:00
tokenize.py gh-63161: Fix tokenize.detect_encoding() (GH-139446) 2025-10-20 20:08:47 +03:00
trace.py gh-130645: Add color to stdlib argparse CLIs (gh-133380) 2025-05-05 19:46:46 +02:00
traceback.py gh-139707: Add mechanism for distributors to supply error messages for missing stdlib modules (GH-140783) 2025-12-01 14:36:17 +01:00
tracemalloc.py
tty.py
turtle.py gh-138772: Fix and improve documentation for turtle color functions (GH-139325) 2025-10-13 18:32:16 +03:00
types.py gh-136492: Add FrameLocalsProxyType to types (GH-136546) 2025-07-20 20:49:00 +02:00
typing.py gh-133601: Remove deprecated typing.no_type_check_decorator (#133602) 2025-10-20 21:10:44 +00:00
uuid.py gh-76760: test that uuid.uuid1() sets the version field (#139033) 2025-09-17 13:31:51 +00:00
warnings.py gh-128384: Use a context variable for warnings.catch_warnings (gh-130010) 2025-04-09 16:18:54 -07:00
wave.py gh-141968: use bytearray.take_bytes in wave._byteswap (#141973) 2025-11-26 21:15:12 +05:30
weakref.py
webbrowser.py gh-130645: Add color to stdlib argparse CLIs (gh-133380) 2025-05-05 19:46:46 +02:00
zipapp.py gh-130645: Add color to stdlib argparse CLIs (gh-133380) 2025-05-05 19:46:46 +02:00
zipimport.py gh-135801: Add the module parameter to compile() etc (GH-139652) 2025-11-13 13:21:32 +02:00