cpython

mirror of https://github.com/python/cpython.git synced 2025-12-08 06:10:17 +00:00

Author	SHA1	Message	Date
Stefano Rivera	f6dd9c12a8	GH-139914: Handle stack growth direction on HPPA (GH-140028) Adapted from a patch for Python 3.14 submitted to the Debian BTS by John https://bugs.debian.org/1105111#20 Co-authored-by: John David Anglin <dave.anglin@bell.net>	2025-11-17 14:41:22 +01:00
Ken Jin	4fa80ce74c	gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) This PR changes the current JIT model from trace projection to trace recording. Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as many traces abort for being too long https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md. This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunder uops were discovered to be broken as part of this work (gh-140277). The optimizer stack space check is disabled, as it's no longer valid to deal with underflow.
Pros: * Ignoring the generated tracer code as it's automatically created, this is only an additional 1k lines of code. The maintenance burden is handled by the DSL and code generator. * `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace. * The new JIT frontend is able to handle a lot more control-flow than the old one. * Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing. * Better handling of polymorphism. We leverage the specializing interpreter for this. Cons: * (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962 Design: * After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests. * The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering. * The tracing version behaves nearly identically to the normal interpreter; in fact, it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory. * The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force re-specialization when tracing a deopt. * The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as they are not tested. * Because we do not share interpreter dispatch, there should be no significant slowdown to the original specializing interpreter on tailcall and computed gotos with JIT disabled.
With JIT enabled, there might be a slowdown in the form of the JIT trying to trace. * Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.	2025-11-13 18:08:32 +00:00
Mikhail Efimov	d17f28fed5	gh-140373: Correctly emit `PY_UNWIND` event when generator is closed (GH-140767)	2025-10-31 10:09:22 +00:00
Dino Viehland	299de38e61	gh-131776: Expose functions called from the interpreter loop via PyAPI_FUNC (#134242)	2025-09-17 08:04:02 -07:00
Mark Shannon	a8d9d94784	GH-137959: Replace shim code in jitted code with a single trampoline function. (GH-137961)	2025-08-21 10:40:53 +01:00
Sam Gross	a10152f8fd	gh-137400: Fix thread-safety issues when profiling all threads (gh-137518) There were a few thread-safety issues when profiling or tracing all threads via PyEval_SetProfileAllThreads or PyEval_SetTraceAllThreads: * The loop over thread states could crash if a thread exits concurrently (in both the free threading and default build) * The modification of `c_profilefunc` and `c_tracefunc` wasn't thread-safe on the free threading build.	2025-08-13 14:15:12 -04:00
Petr Viktorin	1b1ae82fab	gh-135755: Move SPECIAL_ constants to a private header (GH-135922) Macros without a `Py`/`_Py` prefix should not be defined in public headers.	2025-06-25 13:03:05 +02:00
Eric Snow	a450a0ddec	gh-135443: Sometimes Fall Back to __main__.__dict__ For Globals (gh-135491) For several builtin functions, we now fall back to __main__.__dict__ for the globals when there is no current frame and _PyInterpreterState_IsRunningMain() returns true. This allows those functions to be run with Interpreter.call(). The affected builtins: * exec() * eval() * globals() * locals() * vars() * dir() We take a similar approach with "stateless" functions, which don't use any global variables.	2025-06-16 17:34:19 -06:00
Mark Shannon	b90ecea9e6	GH-132554: Fix tier2 `FOR_ITER` implementation and optimizations (GH-135137)	2025-06-05 18:53:57 +01:00
Mark Shannon	f6f4e8a662	GH-132554: "Virtual" iterators (GH-132555) * FOR_ITER now pushes either the iterator and NULL or leaves the iterable and pushes tagged zero * NEXT_ITER uses the tagged int as the index into the sequence or, if TOS is NULL, iterates as before.	2025-05-27 15:59:45 +01:00
Mark Shannon	44e4c479fb	GH-124715: Move trashcan mechanism into `Py_Dealloc` (GH-132280)	2025-04-30 11:37:53 +01:00
Pablo Galindo Salgado	99b13775da	gh-131591: Check for remote debug in PyErr_CheckSignals (#132853) For the same reasons as running the GC, this will allow sections that run in native code for long periods without executing bytecode to also run the remote debugger protocol without having to wait until bytecode is executed. Signed-off-by: Pablo Galindo <pablogsal@gmail.com>	2025-04-23 20:59:41 +01:00
Bénédikt Tran	8a9c6c4d16	gh-128398: improve error messages when incorrectly using `with` and `async with` (#132218) Improve the error message with a suggestion when an object supporting the synchronous (resp. asynchronous) context manager protocol is entered using `async with` (resp. `with`) instead of `with` (resp. `async with`).	2025-04-19 10:44:01 +02:00
Pablo Galindo Salgado	2067378e6d	gh-131591: Handle includes for iOS in remote_debugging.c (#132050)	2025-04-06 21:39:25 +01:00
Chris Eibl	20098719df	GH-131288: Use `_AddressOfReturnAddress` for MSVC in pycore_ceval.h (gh-131289) Use `_AddressOfReturnAddress` in `_Py_get_machine_stack_pointer` to silence MSVC warning in pycore_ceval.h for release builds.	2025-04-04 09:03:12 -04:00
Pablo Galindo Salgado	943cc1431e	gh-131591: Implement PEP 768 (#131937) Co-authored-by: Ivona Stojanovic <stojanovic.i@hotmail.com> Co-authored-by: Matt Wozniski <godlygeek@gmail.com>	2025-04-03 16:20:01 +01:00
Victor Stinner	20c5f969dd	gh-131238: Remove more includes from pycore_interp.h (#131480)	2025-03-19 23:01:32 +01:00
Victor Stinner	b8367e7cf3	gh-130931: Add pycore_typedefs.h internal header (#131396) Declare _PyInterpreterFrame and _PyRuntimeState types before declaring their structure members. Break reference cycles between header files.	2025-03-19 15:23:32 +01:00
Mark Shannon	2bef8ea8ea	GH-127705: Use `_PyStackRef`s in the default build. (GH-127875)	2025-03-10 14:06:56 +00:00
Mark Shannon	014223649c	GH-130396: Use computed stack limits on linux (GH-130398) * Implement C recursion protection with limit pointers for Linux, MacOS and Windows * Remove calls to PyOS_CheckStack * Add stack protection to parser * Make tests more robust to low stacks * Improve error messages for stack overflow	2025-02-25 09:24:48 +00:00
Petr Viktorin	ef29104f7d	GH-91079: Revert "GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)" for now (GH130413) Revert "GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)" for now Unfortunately, the change broke some buildbots. This reverts commit `2498c22fa0`.	2025-02-24 11:16:08 +01:00
Mark Shannon	2498c22fa0	GH-91079: Implement C stack limits using addresses, not counters. (GH-130007) * Implement C recursion protection with limit pointers * Remove calls to PyOS_CheckStack * Add stack protection to parser * Make tests more robust to low stacks * Improve error messages for stack overflow	2025-02-19 11:44:57 +00:00
Mark Shannon	72f56654d0	GH-128682: Account for escapes in `DECREF_INPUTS` (GH-129953) * Handle escapes in DECREF_INPUTS * Mark a few more functions as escaping * Replace DECREF_INPUTS with PyStackRef_CLOSE where possible	2025-02-12 17:44:59 +00:00
Irit Katriel	c39ae8922b	gh-128799: Add frame of except* to traceback when wrapping a naked exception (#128971)	2025-01-25 13:00:23 +00:00
mpage	2e95c5ba3b	gh-115999: Implement thread-local bytecode and enable specialization for `BINARY_OP` (#123926) Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Threads reserve a globally unique index identifying their copy of the bytecode in all co_tlbc arrays at thread creation and release the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads. Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization. Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.	2024-11-04 11:13:32 -08:00
Sam Gross	3c4a7fa617	gh-124218: Avoid refcount contention on builtins module (GH-125847) This replaces `_PyEval_BuiltinsFromGlobals` with `_PyDict_LoadBuiltinsFromGlobals`, which returns a new reference instead of a borrowed reference. Internally, the new function uses per-thread reference counting when possible to avoid contention on the refcount fields on the builtins module.	2024-10-24 12:44:38 -04:00
Mark Shannon	06ca33020e	GH-125323: Convert DECREF_INPUTS_AND_REUSE_FLOAT into a function that takes PyStackRefs. (GH-125439)	2024-10-14 14:18:57 +01:00
Mark Shannon	da071fa3e8	GH-119866: Spill the stack around escaping calls. (GH-124392) * Spill the evaluation around escaping calls in the generated interpreter and JIT. * The code generator tracks live, cached values so they can be saved to memory when needed. * Spills the stack pointer around escaping calls, so that the exact stack is visible to the cycle GC.	2024-10-07 14:56:39 +01:00
Savannah Ostrowski	65f1237098	GH-123516: Improve JIT memory consumption by invalidating cold executors (GH-124443) Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>	2024-09-27 00:35:42 +00:00
Ken Jin	8810e286fa	gh-121459: Deferred LOAD_GLOBAL (GH-123128) Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com> Co-authored-by: Sam Gross <655866+colesbury@users.noreply.github.com>	2024-09-14 00:23:51 +08:00
Mark Shannon	7aca84e557	GH-117224: Move the body of a few large-ish micro-ops into helper functions (GH-122601)	2024-08-02 16:31:17 +01:00
Brandt Bucher	15d4cd0967	GH-116090: Fire RAISE events from _FOR_ITER_TIER_TWO (GH-122413)	2024-07-29 12:17:47 -07:00
Brandt Bucher	7b36b67b1e	GH-118093: Add tier two support to several instructions (GH-121884)	2024-07-18 14:24:58 -07:00
Ken Jin	22b0de2755	gh-117139: Convert the evaluation stack to stack refs (#118450) This PR sets up tagged pointers for CPython. The general idea is to create a separate struct _PyStackRef for everything on the evaluation stack to store the bits. This forces the C compiler to warn us if we try to cast things or pull things out of the struct directly. Only for free threading: We tag the low bit if something is deferred; that means we skip incref and decref operations on it. This behavior may change in the future if Mark's plans to defer all objects in the interpreter loop pan out. This implies a strict stack reference discipline is required. ALL incref and decref operations on stackrefs must use the stackref variants. It is unsafe to untag something and then do normal incref/decref ops on it. The new incref and decref variants are called dup and close. They mimic a "handle" API operating on these stackrefs. Please read Include/internal/pycore_stackref.h for more information! --------- Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>	2024-06-27 03:10:43 +08:00
Mark Shannon	9cefcc0ee7	GH-120507: Lower the `BEFORE_WITH` and `BEFORE_ASYNC_WITH` instructions. (#120640) * Remove BEFORE_WITH and BEFORE_ASYNC_WITH instructions. * Add LOAD_SPECIAL instruction * Reimplement `with` and `async with` statements using LOAD_SPECIAL	2024-06-18 12:17:46 +01:00
Sam Gross	f3b89a63cb	gh-117657: Fix TSAN reported race in `_PyEval_IsGILEnabled`. (#119921) The GIL may be disabled concurrently with this call so we need to use a relaxed atomic load.	2024-06-02 10:19:02 -04:00
Brett Simmers	be1dfccdf2	gh-118727: Don't drop the GIL in `drop_gil()` unless the current thread holds it (#118745) `drop_gil()` assumes that its caller is attached, which means that the current thread holds the GIL if and only if the GIL is enabled, and the enabled-state of the GIL won't change. This isn't true, though, because `detach_thread()` calls `_PyEval_ReleaseLock()` after detaching and `_PyThreadState_DeleteCurrent()` calls it after removing the current thread from consideration for stop-the-world requests (effectively detaching it). Fix this by remembering whether or not a thread acquired the GIL when it last attached, in `PyThreadState._status.holds_gil`, and check this in `drop_gil()` instead of `gil->enabled`. This fixes a crash in `test_multiprocessing_pool_circular_import()`, so I've reenabled it.	2024-05-23 16:59:35 -04:00
Brett Simmers	853163d3b5	gh-116322: Enable the GIL while loading C extension modules (#118560) Add the ability to enable/disable the GIL at runtime, and use that in the C module loading code. We can't know before running a module init function if it supports free-threading, so the GIL is temporarily enabled before doing so. If the module declares support for running without the GIL, the GIL is later disabled. Otherwise, the GIL is permanently enabled, and will never be disabled again for the life of the current interpreter.	2024-05-06 23:07:23 -04:00
Pablo Galindo Salgado	1b22d801b8	gh-118518: Allow perf to work without frame pointers (#112254)	2024-05-05 03:07:29 +02:00
Eric Snow	09c2947581	gh-110693: Pending Calls Machinery Cleanups (gh-118296) This does some cleanup in preparation for later changes.	2024-04-26 01:05:51 +00:00
Mark Shannon	f180b31e76	GH-118095: Handle `RETURN_GENERATOR` in tier 2 (GH-118180)	2024-04-25 11:32:47 +01:00
Mark Shannon	2cf18a4430	GH-116422: Modify a few uops so that they can be supported by tier 2 with hot/cold splitting (GH-116832)	2024-03-15 10:48:00 +00:00
Brandt Bucher	f0df35eeca	GH-115802: JIT "small" code for Windows (GH-115964)	2024-02-29 08:11:28 -08:00
Brett Simmers	0749244d13	gh-112175: Add `eval_breaker` to `PyThreadState` (#115194) This change adds an `eval_breaker` field to `PyThreadState`. The primary motivation is for performance in free-threaded builds: with thread-local eval breakers, we can stop a specific thread (e.g., for an async exception) without interrupting other threads. The source of truth for the global instrumentation version is stored in the `instrumentation_version` field in PyInterpreterState. Threads usually read the version from their local `eval_breaker`, where it continues to be colocated with the eval breaker bits.	2024-02-20 09:57:48 -05:00
Sam Gross	a3af3cb4f4	gh-110481: Implement inter-thread queue for biased reference counting (#114824) Biased reference counting maintains two refcount fields in each object: `ob_ref_local` and `ob_ref_shared`. The true refcount is the sum of these two fields. In some cases, when refcounting operations are split across threads, the ob_ref_shared field can be negative (although the total refcount must be at least zero). In this case, the thread that decremented the refcount requests that the owning thread give up ownership and merge the refcount fields.	2024-02-09 17:08:32 -05:00
Sam Gross	441affc9e7	gh-111964: Implement stop-the-world pauses (gh-112471) The `--disable-gil` builds occasionally need to pause all but one thread. Some examples include: * Cyclic garbage collection, where this is often called a "stop the world event" * Before calling `fork()`, to ensure a consistent state for internal data structures * During interpreter shutdown, to ensure that daemon threads aren't accessing Python objects This adds the following functions to implement global and per-interpreter pauses: * `_PyEval_StopTheWorldAll()` and `_PyEval_StartTheWorldAll()` (for the global runtime) * `_PyEval_StopTheWorld()` and `_PyEval_StartTheWorld()` (per-interpreter) (The function names may change.) These functions are no-ops outside of the `--disable-gil` build.	2024-01-23 11:08:23 -07:00
Sam Gross	a3c031884d	gh-112723: Call `PyThreadState_Clear()` from the correct interpreter (#112776) The `PyThreadState_Clear()` function must only be called with the GIL held and must be called from the same interpreter as the passed-in thread state. Otherwise, any Python objects on the thread state may be destroyed using the wrong interpreter, leading to memory corruption. This is also important for `Py_GIL_DISABLED` builds because free lists will be associated with PyThreadStates and cleared in `PyThreadState_Clear()`. This fixes two places that called `PyThreadState_Clear()` from the wrong interpreter and adds an assertion to `PyThreadState_Clear()`.	2023-12-12 17:20:21 -07:00
Sam Gross	cf6110ba13	gh-111924: Use PyMutex for Runtime-global Locks. (gh-112207) This replaces some usages of PyThread_type_lock with PyMutex, which does not require memory allocation to initialize. This simplifies some of the runtime initialization and is also one step towards avoiding changing the default raw memory allocator during initialize/finalization, which can be non-thread-safe in some circumstances.	2023-12-07 12:33:40 -07:00
Pablo Galindo Salgado	a73aa48e6b	gh-112367: Only free perf trampoline arenas at shutdown (#112368) Signed-off-by: Pablo Galindo <pablogsal@gmail.com>	2023-12-01 13:20:51 +00:00
Tian Gao	e0afed7e27	gh-103615: Use local events for opcode tracing (GH-109472) * Use local monitoring for opcode trace * Remove f_opcode_trace_set * Add test for setting f_trace_opcodes after settrace	2023-11-03 16:39:50 +00:00
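The biased reference counting scheme described in gh-110481 (a3af3cb4f4) keeps two counters whose sum is the true refcount, and lets the shared counter go negative until the owner merges. The single-threaded C sketch below illustrates only that bookkeeping. The struct, the thread-id convention (0 means "no owner"), and every `toy_` function are hypothetical simplifications, not CPython's `PyObject` layout or API; in CPython the shared field is updated atomically and the merge request travels through the inter-thread queue that the commit adds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified object header: NOT CPython's real PyObject. */
typedef struct {
    uint64_t owner_tid;     /* thread allowed to touch ob_ref_local (0 = none) */
    uint32_t ob_ref_local;  /* owner-only counter, updated without atomics */
    int64_t  ob_ref_shared; /* other threads' counter; atomic in real code */
} toy_object;

/* The true refcount is the sum of the two fields. */
static int64_t toy_refcount(const toy_object *op) {
    return (int64_t)op->ob_ref_local + op->ob_ref_shared;
}

static void toy_incref(toy_object *op, uint64_t tid) {
    if (tid != 0 && tid == op->owner_tid)
        op->ob_ref_local++;   /* fast path: owning thread */
    else
        op->ob_ref_shared++;  /* slow path: any other thread */
}

/* Returns 1 when a non-owner decref drives ob_ref_shared negative, i.e.
   the owner should be asked to merge the two fields. */
static int toy_decref(toy_object *op, uint64_t tid) {
    if (tid != 0 && tid == op->owner_tid) {
        assert(op->ob_ref_local > 0);
        op->ob_ref_local--;
        return 0;
    }
    op->ob_ref_shared--;
    return op->ob_ref_shared < 0;
}

/* Run by the owner: give up ownership and fold the local count into the
   shared field, so every later operation takes the shared path. */
static void toy_merge(toy_object *op) {
    op->ob_ref_shared += op->ob_ref_local;
    op->ob_ref_local = 0;
    op->owner_tid = 0;
    assert(op->ob_ref_shared >= 0);  /* the total refcount is never negative */
}
```

For example, if the owner (thread 1) holds two local references and thread 2 drops one, `ob_ref_shared` becomes -1 while the sum is still 1; after `toy_merge`, the whole count lives in `ob_ref_shared` and later increfs from either thread go there.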
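gh-117139 (22b0de2755) wraps evaluation-stack entries in a struct and, in free-threaded builds, sets the low pointer bit to mark a reference as deferred, so that the "dup" and "close" operations skip refcounting for it. The sketch below shows that tagging idea in isolation. All names (`toy_obj`, `toy_stackref`, `TOY_TAG_DEFERRED`, the `toy_ref_` functions) are hypothetical and are not the real `_PyStackRef` API in `Include/internal/pycore_stackref.h`; it also assumes objects are at least 2-byte aligned so the low bit is free.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { long refcnt; } toy_obj;  /* hypothetical refcounted object */

/* Wrapping the bits in a struct means the compiler rejects accidental
   implicit conversions between stack references and object pointers. */
typedef struct { uintptr_t bits; } toy_stackref;

#define TOY_TAG_DEFERRED ((uintptr_t)1)  /* low bit set: skip refcounting */

static toy_stackref toy_ref_new(toy_obj *op) {       /* owns a real reference */
    assert(((uintptr_t)op & TOY_TAG_DEFERRED) == 0); /* needs aligned pointers */
    return (toy_stackref){ (uintptr_t)op };
}

static toy_stackref toy_ref_deferred(toy_obj *op) {  /* refcounting deferred */
    assert(((uintptr_t)op & TOY_TAG_DEFERRED) == 0);
    return (toy_stackref){ (uintptr_t)op | TOY_TAG_DEFERRED };
}

static toy_obj *toy_ref_get(toy_stackref ref) {      /* strip the tag bit */
    return (toy_obj *)(ref.bits & ~TOY_TAG_DEFERRED);
}

static int toy_ref_is_deferred(toy_stackref ref) {
    return (ref.bits & TOY_TAG_DEFERRED) != 0;
}

/* "dup": the stackref counterpart of incref; a no-op for deferred refs. */
static toy_stackref toy_ref_dup(toy_stackref ref) {
    if (!toy_ref_is_deferred(ref))
        toy_ref_get(ref)->refcnt++;
    return ref;
}

/* "close": the stackref counterpart of decref; a no-op for deferred refs. */
static void toy_ref_close(toy_stackref ref) {
    if (!toy_ref_is_deferred(ref))
        toy_ref_get(ref)->refcnt--;
}
```

The discipline the commit message insists on follows directly: calling a plain decref on `toy_ref_get(ref)` for a deferred reference would decrement a count that was never incremented, which is why all increments and decrements must go through the dup/close variants.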
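GH-130396 (014223649c), like the reverted GH-130007, replaces counter-based C recursion protection with limit pointers, and GH-139914 (f6dd9c12a8) handles HPPA, where the stack grows toward higher addresses. A minimal sketch of a direction-aware limit check follows. It uses plain integer addresses so it can be exercised without touching a real stack; the struct, its fields, the safety `margin`, and the `toy_` functions are hypothetical illustrations, not CPython's internals.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical description of one thread's C stack. */
typedef struct {
    uintptr_t base;     /* address where the stack starts */
    uintptr_t limit;    /* soft limit: report overflow before reaching it */
    bool      grows_up; /* true on e.g. HPPA, false on most platforms */
} toy_stack_info;

/* Places the limit `margin` bytes short of the far end of a `size`-byte
   stack, measured in the stack's direction of growth. */
static toy_stack_info toy_stack_init(uintptr_t base, uintptr_t size,
                                     uintptr_t margin, bool grows_up) {
    assert(margin < size);
    assert(grows_up || size <= base);  /* no wrap-around below address 0 */
    toy_stack_info info = { base, 0, grows_up };
    info.limit = grows_up ? base + size - margin : base - size + margin;
    return info;
}

/* True if the stack address `sp` has moved past the soft limit; the
   comparison flips with the growth direction. */
static bool toy_stack_exhausted(const toy_stack_info *info, uintptr_t sp) {
    return info->grows_up ? sp > info->limit : sp < info->limit;
}
```

In a real interpreter, `sp` would be approximated by the address of a local variable or a compiler intrinsic (the MSVC commit above uses `_AddressOfReturnAddress`). Comparing against a precomputed pointer turns each check into a single comparison, and the growth-direction flag is the only part that differs on an upward-growing stack.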


141 commits