cpython

mirror of https://github.com/python/cpython.git synced 2025-12-08 06:10:17 +00:00

Author	SHA1	Message	Date
Brandt Bucher	336366fd7c	GH-140643: Add `<native>` and `<GC>` frames to the sampling profiler (#141108) - Introduce a new field in the GC state to store the frame that initiated garbage collection. - Update RemoteUnwinder to include options for including "<native>" and "<GC>" frames in the stack trace. - Modify the sampling profiler to accept parameters for controlling the inclusion of native and GC frames. - Enhance the stack collector to properly format and append these frames during profiling. - Add tests to verify the correct behavior of the profiler with respect to native and GC frames, including options to exclude them. Co-authored-by: Pablo Galindo Salgado <pablogsal@gmail.com>	2025-11-17 13:39:00 +00:00
Ken Jin	4fa80ce74c	gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) This PR changes the current JIT model from trace projection to trace recording. Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as a lot of trace too long aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md . This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunders uops were discovered to be broken as part of this work gh-140277 The optimizer stack space check is disabled, as it's no longer valid to deal with underflow. 
Pros: * Ignoring the generated tracer code as it's automatically created, this is only an additional 1k lines of code. The maintenance burden is handled by the DSL and code generator. * `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace. * The new JIT frontend is able to handle a lot more control-flow than the old one. * Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing. * Better handling of polymorphism. We leverage the specializing interpreter for this. Cons: * (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962 Design: * After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests. * The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering. * The tracing version behaves nearly identically to the normal interpreter, in fact it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory. * The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force a re-specialization when tracing a deopt. * The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as it's not tested. * Because we do not share interpreter dispatch, there should be no significant slowdown to the original specializing interpreter on tailcall and computed goto with JIT disabled. 
With JIT enabled, there might be a slowdown in the form of the JIT trying to trace. * Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.	2025-11-13 18:08:32 +00:00
Neil Schemenauer	c98c5b3449	gh-131253: free-threaded build support for pystats (gh-137189) Allow the --enable-pystats build option to be used with free-threading. The stats are now stored on a per-interpreter basis, rather than process global. For free-threaded builds, the stats structure is allocated per-thread and then periodically merged into the per-interpreter stats structure (on thread exit or when the reporting function is called). Most of the pystats related code has been moved into the file Python/pystats.c.	2025-11-03 11:36:37 -08:00
alm	1753ccb432	gh-138050: [WIP] JIT - Streamline MAKE_WARM - move coldness check to executor creation (GH-138240)	2025-10-27 16:37:37 +00:00
Kumar Aditya	58c44c2bf2	gh-140067: Fix memory leak in sub-interpreter creation (#140111) (#140261) Fix memory leak in sub-interpreter creation caused by overwriting of the previously used `_malloced` field. Now the pointer is stored in the first word of the memory block to avoid it being overwritten accidentally. Co-authored-by: Kumar Aditya <kumaraditya@python.org>	2025-10-18 16:36:58 +05:30
Peter Bierma	0bcb1c25f7	Revert "gh-140067: Fix memory leak in sub-interpreter creation (#140111)" (#140140) This reverts commit `59547a251f`.	2025-10-15 07:16:43 +05:30
Shamil	59547a251f	gh-140067: Fix memory leak in sub-interpreter creation (#140111) Fix memory leak in sub-interpreter creation caused by overwriting of the previously used `_malloced` field. Now the pointer is stored in the first word of the memory block to avoid it being overwritten accidentally. Co-authored-by: Kumar Aditya <kumaraditya@python.org>	2025-10-14 14:42:17 +00:00
Sergey Miryanov	e6e376a760	gh-132042: Remove resolve_slotdups() to speedup class creation (#132156) Co-authored-by: Victor Stinner <vstinner@python.org> Co-authored-by: sobolevn <mail@sobolevn.me> Co-authored-by: Kumar Aditya <kumaraditya@python.org>	2025-10-03 11:58:00 +02:00
Sergey Miryanov	b42af37ced	GH-138355: Remove trash_delete_later from _gc_runtime_state (#138767) Remove trash_delete_later and trash_delete_nesting from _gc_runtime_state.	2025-09-17 21:25:24 +01:00
Donghee Na	d873fb42f3	gh-137838: Move _PyUOpInstruction buffer to PyInterpreterState (gh-138918)	2025-09-17 18:50:16 +01:00
Mark Shannon	a8d9d94784	GH-137959: Replace shim code in jitted code with a single trampoline function. (GH-137961)	2025-08-21 10:40:53 +01:00
Petr Viktorin	7dfa048bbb	gh-135228: Create __dict__ and __weakref__ descriptors for object (GH-136966) This partially reverts #137047, keeping the tests for GC collectability of the original class that dataclass adds `__slots__` to. The reference leaks solved there are instead solved by having the `__dict__` & `__weakref__` descriptors not tied to (and referencing) their class. Instead, they're shared between all classes that need them (within an interpreter). The `__objclass__` of the descriptors is set to `object`, since these descriptors work with any object. (The appropriate checks were already made in the get/set code, so the `__objclass__` check was redundant.) The repr of these descriptors (and any others whose `__objclass__` is `object`) now doesn't mention the objclass. This change required adjustment of introspection code that checks `__objclass__` to determine an object's “own” (i.e. not inherited) `__dict__`. Third-party code that does similar introspection of the internals will also need adjusting. Co-authored-by: Jelle Zijlstra <jelle.zijlstra@gmail.com>	2025-08-18 14:25:51 +02:00
Sam Gross	a10152f8fd	gh-137400: Fix thread-safety issues when profiling all threads (gh-137518) There were a few thread-safety issues when profiling or tracing all threads via PyEval_SetProfileAllThreads or PyEval_SetTraceAllThreads: * The loop over thread states could crash if a thread exits concurrently (in both the free threading and default build) * The modification of `c_profilefunc` and `c_tracefunc` wasn't thread-safe on the free threading build.	2025-08-13 14:15:12 -04:00
Dino Viehland	375f484f97	gh-137291: Support perf profiler with an evaluation hook (#137292)	2025-08-07 14:54:12 -07:00
Mark Shannon	e7b55f564d	GH-136410: Faster side exits by using a cold exit stub (GH-136411)	2025-08-01 16:26:07 +01:00
Victor Stinner	2c9a8011c6	gh-135906: Test the internal C API in test_cext (#136247) Remove duplicated definition: atexit_datacallbackfunc type is already defined by Include/cpython/pylifecycle.h.	2025-07-11 16:48:43 +02:00
Pablo Galindo Salgado	236f733d8f	gh-136541: Fix several problems of perf trampolines in x86_64 and aarch64 (#136500) This commit fixes the following problems: * The x86_64 trampolines are not preserving frame pointers * The hardcoded offsets to the code segment from the FDE only worked properly for x86_64 * The CIE data was not following conventions of aarch64 * The eh_frame for aarch64 was not fully correct	2025-07-11 14:32:35 +01:00
Petr Viktorin	49d72365cd	gh-127545: Add _Py_ALIGNED_DEF(N, T) and use it for PyObject (GH-135209) * Replace _Py_ALIGN_AS(V) by _Py_ALIGNED_DEF(N, T) This is now a common façade for the various `_Alignas` alternatives, which behave in interesting ways -- see the source comment. The new macro (and MSVC's `__declspec(align)`) should not be used on a variable/member declaration that includes a struct declaration. A workaround is to separate the struct definition. Do that for `PyASCIIObject.state`. * Specify minimum PyGC_Head and PyObject alignment As documented in InternalDocs/garbage_collector.md, the garbage collector stores flags in the least significant two bits of the _gc_prev pointer in struct PyGC_Head. Consequently, this pointer is only capable of storing a location that's aligned to a 4-byte boundary. Encode this requirement using _Py_ALIGNED_DEF. This patch fixes a segfault in m68k, which was previously investigated by Adrian Glaubitz here: https://lists.debian.org/debian-68k/2024/11/msg00020.html https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1087600 Original patch (using the GCC-only Py_ALIGNED) by Finn Thain. Co-authored-by: Finn Thain <fthain@linux-m68k.org> Co-authored-by: Victor Stinner <vstinner@python.org> Co-authored-by: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>	2025-06-11 12:44:58 +02:00
Neil Schemenauer	fbbbc10055	gh-127266: avoid data races when updating type slots (gh-133177) In the free-threaded build, avoid data races caused by updating type slots or type flags after the type was initially created. For those (typically rare) cases, use the stop-the-world mechanism. Remove the use of atomics when reading or writing type flags.	2025-05-27 18:27:41 -07:00
Pablo Galindo Salgado	42b25ad4d3	gh-91048: Refactor and optimize remote debugging module (#134652) Completely refactor Modules/_remote_debugging_module.c with improved code organization, replacing scattered reference counting and error handling with centralized goto error paths. This cleanup improves maintainability and reduces code duplication throughout the module while preserving the same external API. Implement memory page caching optimization in Python/remote_debug.h to avoid repeated reads of the same memory regions during debugging operations. The cache stores previously read memory pages and reuses them for subsequent reads, significantly reducing system calls and improving performance. Add code object caching mechanism with a new code_object_generation field in the interpreter state that tracks when code object caches need invalidation. This allows efficient reuse of parsed code object metadata and eliminates redundant processing of the same code objects across debugging sessions. Optimize memory operations by replacing multiple individual structure copies with single bulk reads for the same data structures. This reduces the number of memory operations and system calls required to gather debugging information from the target process. Update Makefile.pre.in to include Python/remote_debug.h in the headers list, ensuring that changes to the remote debugging header force proper recompilation of dependent modules and maintain build consistency across the codebase. Also, make the module compatible with the free threading build as an extra :) Co-authored-by: Łukasz Langa <lukasz@langa.pl>	2025-05-25 20:19:29 +00:00
Neil Schemenauer	893034cf93	gh-132917: Use RSS + swap for estimate of process memory usage (gh-133464)	2025-05-05 14:15:05 -07:00
Neil Schemenauer	5c245ffce7	gh-132917: Check resident set size (RSS) before GC trigger. (gh-133399) For the free-threaded build, check the process resident set size (RSS) increase before triggering a full automatic garbage collection. If the RSS has not increased 10% since the last collection then it is deferred.	2025-05-05 17:17:05 +00:00
Mark Shannon	ac7d5ba96e	GH-133231: Changes to executor management to support proposed `sys._jit` module (GH-133287) * Track the current executor, not the previous one, on the thread-state. * Batch executors for deallocation to avoid having to constantly incref executors; this is an ad-hoc form of deferred reference counting.	2025-05-04 10:05:35 +01:00
Neil Schemenauer	eecafc3380	Revert gh-127266: avoid data races when updating type slots (gh-131174) (gh-133129) This is triggering deadlocks in test_opcache. See GH-133130 for stack trace.	2025-04-28 23:38:29 -07:00
Neil Schemenauer	e414a2d81c	gh-127266: avoid data races when updating type slots (gh-131174) In the free-threaded build, avoid data races caused by updating type slots or type flags after the type was initially created. For those (typically rare) cases, use the stop-the-world mechanism. Remove the use of atomics when reading or writing type flags. The use of atomics is not sufficient to avoid races (since flags are sometimes read without a lock and without atomics) and is no longer required.	2025-04-28 20:28:44 +00:00
Eric Snow	2a28b21a51	gh-132776: Revert Moving memoryview XIData Code to memoryobject.c (gh-132960) This is a partial revert of gh-132821. It resolves the refleak introduced by that PR.	2025-04-25 16:43:50 +00:00
Eric Snow	e54e828852	gh-132776: Cleanup for XIBufferViewType (gh-132821) * add notes * rename XIBufferViewObject to xibufferview * move memoryview XIData code to memoryobject.c	2025-04-24 18:25:29 -06:00
Bénédikt Tran	246ed23456	gh-127117: ensure that `_initial_thread` is the last field of `PyInterpreterState` when `Py_STACKREF_DEBUG` is defined (#132721)	2025-04-20 11:53:00 +05:30
Bénédikt Tran	427e7fc099	gh-132399: ensure correct alignment of `PyInterpreterState` (#132428)	2025-04-19 11:03:06 +02:00
Neil Schemenauer	d687900f98	gh-128384: Use a context variable for warnings.catch_warnings (gh-130010) Make `warnings.catch_warnings()` use a context variable for holding the warning filtering state if the `sys.flags.context_aware_warnings` flag is set to true. This makes using the context manager thread-safe in multi-threaded programs. Add the `sys.flags.thread_inherit_context` flag. If true, starting a new thread with `threading.Thread` will use a copy of the context from the caller of `Thread.start()`. Both these flags are set to true by default for the free-threaded build and false for the default build. Move the Python implementation of warnings.py into _py_warnings.py. Make _contextvars a builtin module. Co-authored-by: Kumar Aditya <kumaraditya@python.org>	2025-04-09 16:18:54 -07:00
Irit Katriel	2c8f329dc6	gh-131738: optimize builtin any/all/tuple calls with a generator expression arg (#131737)	2025-03-28 10:35:20 +00:00
Victor Stinner	6827c5129c	gh-131238: Move pycore_obmalloc.h include to pycore_runtime_structs.h (#131482) Move pycore_obmalloc.h include from pycore_interp_structs.h to pycore_runtime_structs.h. Add also comment explaining the purpose of each include in pycore_interp_structs.h, pycore_runtime_structs.h and pycore_structs.h. Remove <stdbool.h> and <stddef.h> from pycore_structs.h.	2025-03-19 23:32:30 +00:00
Victor Stinner	4b54031323	gh-131238: Remove pycore_runtime.h from pycore_pystate.h (#131356) * Remove includes from pycore_pystate.h: * pycore_runtime_structs.h * pycore_runtime.h * pycore_tstate.h * pycore_interp.h * Reorganize internal headers. Move _gc_thread_state from pycore_interp_structs.h to pycore_tstate.h. * Add 3 new header files to PCbuild/pythoncore.vcxproj.	2025-03-19 17:33:24 +01:00
Victor Stinner	b8367e7cf3	gh-130931: Add pycore_typedefs.h internal header (#131396) Declare _PyInterpreterFrame and _PyRuntimeState types before declaring their structure members. Break reference cycles between header files.	2025-03-19 15:23:32 +01:00
Mark Shannon	a1aeec61c4	GH-131238: Core header refactor (GH-131250) * Moves most structs in pycore_ header files into pycore_structs.h and pycore_runtime_structs.h * Removes many cross-header dependencies	2025-03-17 09:19:04 +00:00

35 commits
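
The RSS check described in commit 5c245ffce7 (defer a full automatic collection unless the resident set size has grown by 10% since the last one) can be illustrated with a small sketch. This is not CPython's implementation, which lives in C inside the free-threaded garbage collector. The function name `should_defer_full_gc` and its parameters are hypothetical, and treating a missing baseline as "do not defer" is an assumption made for the example.

```python
def should_defer_full_gc(last_rss: int, current_rss: int) -> bool:
    """Return True if a full collection should be deferred.

    Hypothetical helper: defers when RSS has grown by less than 10%
    since the last collection. Sizes are in bytes (or any consistent unit).
    """
    if last_rss <= 0:
        # Assumption: with no baseline measurement, do not defer.
        return False
    growth = current_rss - last_rss
    # Integer form of "growth / last_rss < 0.10", avoiding float rounding.
    return growth * 10 < last_rss


# RSS grew 5% since the last collection: defer.
print(should_defer_full_gc(100_000_000, 105_000_000))
# RSS grew 50%: run the collection.
print(should_defer_full_gc(100_000_000, 150_000_000))
```

The comparison is done in integers so that growth of exactly 10% is not misclassified by floating-point rounding.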