cpython

mirror of https://github.com/python/cpython.git synced 2025-12-08 06:10:17 +00:00

Author	SHA1	Message	Date
Ken Jin	4fa80ce74c	gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) This PR changes the current JIT model from trace projection to trace recording. Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as there are a lot of "trace too long" aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md . This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunder uops were discovered to be broken as part of this work (gh-140277). The optimizer stack space check is disabled, as it's no longer valid to deal with underflow. 
Pros: * Ignoring the generated tracer code as it's automatically created, this is only an additional 1k lines of code. The maintenance burden is handled by the DSL and code generator. * `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace. * The new JIT frontend is able to handle a lot more control flow than the old one. * Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing. * Better handling of polymorphism. We leverage the specializing interpreter for this. Cons: * (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962 Design: * After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests. * The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering. * The tracing version behaves nearly identically to the normal interpreter; in fact, it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory. * The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force re-specialization when tracing a deopt. * The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as this is not tested. * Because we do not share interpreter dispatch, there should be no significant slowdown to the original specializing interpreter on tail-calling and computed-goto builds with the JIT disabled. 
With JIT enabled, there might be a slowdown in the form of the JIT trying to trace. * Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.	2025-11-13 18:08:32 +00:00
Mark Shannon	3b83257366	GH-138378: Move globals-to-consts pass into main optimizer pass (GH-138379)	2025-09-18 10:09:59 +01:00
Mark Shannon	af15e1d13e	GH-132532: Add new DSL macros to better declare semantics of exits at ends of instructions/uops. (GH-137098)	2025-08-09 15:41:28 +01:00
Mark Shannon	e7b55f564d	GH-136410: Faster side exits by using a cold exit stub (GH-136411)	2025-08-01 16:26:07 +01:00
Ken Jin	569fc6870f	gh-134584: Specialize POP_TOP by reference and type in JIT (GH-135761)	2025-06-24 00:57:14 +08:00
Mark Shannon	9731dd2c8d	GH-135379: Specialize int operations for compact ints only (GH-135668)	2025-06-19 11:10:29 +01:00
Ken Jin	fba5dded6d	gh-134584: Decref elimination for float ops in the JIT (GH-134588) This PR adds a PyJitRef API to the JIT's optimizer that mimics the _PyStackRef API. This allows it to track references and their stack lifetimes properly, opening the door to refcount elimination in the JIT.	2025-06-17 23:25:53 +08:00
Mark Shannon	8dd8b5c2f0	GH-135379: Support limited scalar replacement for replicated uops in the JIT code generator. (GH-135563) * Use it to support efficient specializations of COPY and SWAP in the JIT.	2025-06-17 13:43:09 +01:00
Mark Shannon	f6f4e8a662	GH-132554: "Virtual" iterators (GH-132555) * FOR_ITER now pushes either the iterator and NULL or leaves the iterable and pushes tagged zero * NEXT_ITER uses the tagged int as the index into the sequence or, if TOS is NULL, iterates as before.	2025-05-27 15:59:45 +01:00
Tomas R.	484e00379b	GH-131798: Optimize away isinstance calls in the JIT (GH-134369)	2025-05-22 12:52:47 -04:00
Brandt Bucher	ec736e7dae	GH-131798: Optimize cached class attributes and methods in the JIT (GH-134403)	2025-05-22 11:15:03 -04:00
Brandt Bucher	2f0570caf4	GH-131798: Narrow types more aggressively in the JIT (GH-134373)	2025-05-20 18:09:51 -04:00
Mark Shannon	6dcb0fdfe0	GH-134282: Always borrow references LOAD_CONST (GH-134284)	2025-05-20 11:24:11 -04:00
Tomas R.	a7f317d730	GH-131798: Add _POP_CALL_TWO_LOAD_CONST_INLINE_BORROW (GH-134268)	2025-05-19 18:00:53 -04:00
Diego Russo	42d03f3933	GH-131798: Split CALL_LIST_APPEND into several uops (GH-134240)	2025-05-19 15:48:55 -04:00
Tomas R.	c492ac7252	GH-131798: Split up and optimize CALL_ISINSTANCE (GH-133339)	2025-05-08 14:26:30 -07:00
Diego Russo	9cc77aaf9d	GH-131798: Split CALL_LEN into several uops (GH-133180)	2025-05-05 14:31:48 -07:00
Ken Jin	ddac7ac59a	gh-132744: Check recursion limit in CALL_PY_GENERAL (GH-132746)	2025-05-02 17:36:29 +01:00
Irit Katriel	5529213d4e	gh-100239: specialize BINARY_OP/SUBSCR for list-slice (#132626)	2025-05-01 10:28:52 +00:00
Lysandros Nikolaou	60202609a2	gh-132661: Implement PEP 750 (#132662) Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com> Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> Co-authored-by: Wingy <git@wingysam.xyz> Co-authored-by: Koudai Aono <koxudaxi@gmail.com> Co-authored-by: Dave Peck <davepeck@gmail.com> Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu> Co-authored-by: Paul Everitt <pauleveritt@me.com> Co-authored-by: sobolevn <mail@sobolevn.me>	2025-04-30 11:46:41 +02:00
Tomas R.	08e3389e8c	GH-131798: Split up and optimize CALL_TUPLE_1 in the JIT (GH-132851)	2025-04-24 15:55:03 -07:00
Tomas R.	0a387b311e	GH-131798: Split up and optimize CALL_STR_1 in the JIT (GH-132849)	2025-04-24 12:54:46 -07:00
Tomas R.	a6a3dbb7db	GH-131798: JIT: Split CALL_TYPE_1 into several uops (GH-132419)	2025-04-22 09:30:38 -07:00
Sam Gross	da53660f35	gh-131586: Avoid refcount contention in context managers (gh-131851) This avoids reference count contention in the free threading build when calling special methods like `__enter__` and `__exit__`.	2025-04-21 15:54:25 -04:00
Brandt Bucher	20926c73b5	GH-131798: Remove JIT guards for dict, frozenset, list, set, and tuple (GH-132289)	2025-04-09 14:32:21 -07:00
Brandt Bucher	3a8cefba0b	GH-131726: Split up _CHECK_VALIDITY_AND_SET_IP (GH-131810)	2025-04-01 16:55:05 -07:00
Brandt Bucher	1a9d4a1fb3	GH-131798: Allow the JIT to remove more int/float/str guards (GH-131800)	2025-04-01 15:10:15 -07:00
mpage	053c285f6b	gh-130704: Strength reduce `LOAD_FAST{_LOAD_FAST}` (#130708) Optimize `LOAD_FAST` opcodes into faster versions that load borrowed references onto the operand stack when we can prove that the lifetime of the local outlives the lifetime of the temporary that is loaded onto the stack.	2025-04-01 10:18:42 -07:00
Amit Lavon	685fd74f81	GH-131798: Remove type checks for _TO_BOOL_STR (GH-131816)	2025-03-30 16:07:25 -07:00
Savannah Ostrowski	b92ee14b80	GH-130415: Optimize constant comparison in JIT builds (GH-131489)	2025-03-21 11:23:12 -07:00
T. Wouters	de2f7da77d	gh-115999: Add free-threaded specialization for FOR_ITER (#128798) Add free-threaded versions of existing specialization for FOR_ITER (list, tuples, fast range iterators and generators), without significantly affecting their thread-safety. (Iterating over shared lists/tuples/ranges should be fine like before. Reusing iterators between threads is not fine, like before. Sharing generators between threads is a recipe for significant crashes, like before.)	2025-03-12 16:21:46 +01:00
Mark Shannon	54965f3fb2	GH-130296: Avoid stack transients in four instructions. (GH-130310) * Combine _GUARD_GLOBALS_VERSION_PUSH_KEYS and _LOAD_GLOBAL_MODULE_FROM_KEYS into _LOAD_GLOBAL_MODULE * Combine _GUARD_BUILTINS_VERSION_PUSH_KEYS and _LOAD_GLOBAL_BUILTINS_FROM_KEYS into _LOAD_GLOBAL_BUILTINS * Combine _CHECK_ATTR_MODULE_PUSH_KEYS and _LOAD_ATTR_MODULE_FROM_KEYS into _LOAD_ATTR_MODULE * Remove stack transient in LOAD_ATTR_WITH_HINT	2025-02-28 18:00:38 +00:00
Irit Katriel	a1417b211f	gh-100239: replace BINARY_SUBSCR & family by BINARY_OP with oparg NB_SUBSCR (#129700)	2025-02-07 22:39:54 +00:00
Brandt Bucher	5fa7e1b7fd	GH-129715: Remove _DYNAMIC_EXIT (GH-129716)	2025-02-07 11:41:17 -08:00
Mark Shannon	75b628adeb	GH-128563: Generate `opcode = ...` in instructions that need `opcode` (GH-129608) * Remove support for GO_TO_INSTRUCTION	2025-02-03 15:09:21 +00:00
Mark Shannon	75b4962157	GH-128914: Remove all but one conditional stack effects (GH-129226) * Remove all 'if (0)' and 'if (1)' conditional stack effects * Use array instead of conditional for BUILD_SLICE args * Refactor LOAD_GLOBAL to use a common conditional uop * Remove conditional stack effects from LOAD_ATTR specializations * Replace conditional stack effects in LOAD_ATTR with a 0 or 1 sized array. * Remove conditional stack effects from CALL_FUNCTION_EX	2025-01-27 16:24:48 +00:00
Sam Gross	a10f99375e	Revert "GH-128914: Remove conditional stack effects from `bytecodes.c` and the code generators (GH-128918)" (GH-129202) The commit introduced a ~2.5-3% regression in the free threading build. This reverts commit `ab61d3f430`.	2025-01-23 09:26:25 +00:00
Mark Shannon	ab61d3f430	GH-128914: Remove conditional stack effects from `bytecodes.c` and the code generators (GH-128918)	2025-01-20 17:09:23 +00:00
Xuanteng Huang	b44ff6d0df	GH-126599: Remove the "counter" optimizer/executor (GH-126853)	2025-01-16 15:57:04 -08:00
Irit Katriel	3893a92d95	gh-100239: specialize long tail of binary operations (#128722)	2025-01-16 15:22:13 +00:00
Mark Shannon	ddd959987c	GH-128685: Specialize (rather than quicken) LOAD_CONST into LOAD_CONST_[IM]MORTAL (GH-128708)	2025-01-13 10:30:28 +00:00
Mark Shannon	f826beca0c	GH-128375: Better instrument for `FOR_ITER` (GH-128445)	2025-01-06 17:54:47 +00:00
Neil Schemenauer	1b15c89a17	gh-115999: Specialize `STORE_ATTR` in free-threaded builds. (gh-127838) * Add `_PyDictKeys_StringLookupSplit` which does locking on dict keys and use it in place of `_PyDictKeys_StringLookup`. * Change `_PyObject_TryGetInstanceAttribute` to use that function in the case of split keys. * Add `unicodekeys_lookup_split` helper which allows code sharing between `_Py_dict_lookup` and `_PyDictKeys_StringLookupSplit`. * Fix locking for `STORE_ATTR_INSTANCE_VALUE`. Create `_GUARD_TYPE_VERSION_AND_LOCK` uop so that object stays locked and `tp_version_tag` cannot change. * Pass `tp_version_tag` to `specialize_dict_access()`, ensuring the version we store on the cache is the correct one (in case it changes during the specialization analysis). * Split `analyze_descriptor` into `analyze_descriptor_load` and `analyze_descriptor_store` since those don't share much logic. Add `descriptor_is_class` helper function. * In `specialize_dict_access`, double check `_PyObject_GetManagedDict()` in case we race and dict was materialized before the lock. * Avoid borrowed references in `_Py_Specialize_StoreAttr()`. * Use `specialize()` and `unspecialize()` helpers. * Add unit tests to ensure specializing happens as expected in FT builds. * Add unit tests to attempt to trigger data races (useful for running under TSAN). * Add `has_split_table` function to `_testinternalcapi`.	2024-12-19 10:21:17 -08:00
Mark Shannon	d2f1d917e8	GH-122548: Implement branch taken and not taken events for sys.monitoring (GH-122564)	2024-12-19 16:59:51 +00:00
mpage	2de048ce79	gh-115999: Specialize loading attributes from modules in free-threaded builds (#127711) We use the same approach that was used for specialization of LOAD_GLOBAL in free-threaded builds: _CHECK_ATTR_MODULE is renamed to _CHECK_ATTR_MODULE_PUSH_KEYS; it pushes the keys object for the following _LOAD_ATTR_MODULE_FROM_KEYS (nee _LOAD_ATTR_MODULE). This arrangement avoids having to recheck the keys version. _LOAD_ATTR_MODULE is renamed to _LOAD_ATTR_MODULE_FROM_KEYS; it loads the value from the keys object pushed by the preceding _CHECK_ATTR_MODULE_PUSH_KEYS at the cached index.	2024-12-13 10:17:16 -08:00
Ken Jin	6293d00e72	gh-120619: Strength reduce function guards, support 2-operand uop forms (GH-124846) Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>	2024-11-09 11:35:33 +08:00
mpage	2e95c5ba3b	gh-115999: Implement thread-local bytecode and enable specialization for `BINARY_OP` (#123926) Each thread specializes a thread-local copy of the bytecode, created on the first RESUME, in free-threaded builds. All copies of the bytecode for a code object are stored in the co_tlbc array on the code object. Each thread reserves a globally unique index identifying its copy of the bytecode in all co_tlbc arrays at thread creation and releases the index at thread destruction. The first entry in every co_tlbc array always points to the "main" copy of the bytecode that is stored at the end of the code object. This ensures that no bytecode is copied for programs that do not use threads. Thread-local bytecode can be disabled at runtime by providing either -X tlbc=0 or PYTHON_TLBC=0. Disabling thread-local bytecode also disables specialization. Concurrent modifications to the bytecode made by the specializing interpreter and instrumentation use atomics, with specialization taking care not to overwrite an instruction that was instrumented concurrently.	2024-11-04 11:13:32 -08:00
Mark Shannon	faa3272fb8	GH-125837: Split `LOAD_CONST` into three. (GH-125972) * Add LOAD_CONST_IMMORTAL opcode * Add LOAD_SMALL_INT opcode * Remove RETURN_CONST opcode	2024-10-29 11:15:42 +00:00
mpage	f978fb4f8d	gh-115999: Refactor `LOAD_GLOBAL` specializations to avoid reloading {globals, builtins} keys (gh-124953) Each of the `LOAD_GLOBAL` specializations is implemented roughly as: 1. Load keys version. 2. Load cached keys version. 3. Deopt if (1) and (2) don't match. 4. Load keys. 5. Load cached index into keys. 6. Load object from (4) at offset from (5). This is not thread-safe in free-threaded builds; the keys object may be replaced in between steps (3) and (4). This change refactors the specializations to avoid reloading the keys object and instead pass the keys object from guards to be consumed by downstream uops.	2024-10-09 15:18:25 +00:00
Mark Shannon	da071fa3e8	GH-119866: Spill the stack around escaping calls. (GH-124392) * Spill the evaluation stack around escaping calls in the generated interpreter and JIT. * The code generator tracks live, cached values so they can be saved to memory when needed. * Spills the stack pointer around escaping calls, so that the exact stack is visible to the cycle GC.	2024-10-07 14:56:39 +01:00
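
The "dual dispatch" idea described in 4fa80ce74c (two dispatch tables that hand control to each other, with a recording step run for the previous instruction while tracing) can be illustrated with a toy model. This is only a sketch under stated assumptions, not CPython's implementation: `ToyVM`, the `PUSH`/`ADD`/`HOT`/`HALT` opcodes, and every method name are hypothetical, and the real interpreter does this in generated C with tail calls or computed gotos rather than Python dictionaries.

```python
# Toy model of dual dispatch. All names here are hypothetical illustrations;
# nothing below is taken from the CPython sources.

# Mini bytecode: (opname, argument) pairs. HOT stands in for whatever event
# makes the interpreter decide to start recording a trace.
PROGRAM = [("PUSH", 2), ("HOT", None), ("PUSH", 3), ("ADD", None), ("HALT", None)]


class ToyVM:
    def __init__(self, program):
        self.program = program
        self.stack = []
        self.pc = 0
        self.trace = []   # instructions recorded while in tracing mode
        self.prev = None  # most recent instruction seen while tracing
        # Dispatch table for normal (non-tracing) mode.
        self.normal = {
            "PUSH": self.op_push,
            "ADD": self.op_add,
            "HOT": self.op_hot,
            "HALT": self.op_halt,
        }
        # Dispatch table for tracing mode: the same handlers, each preceded
        # by a step that records the previously executed instruction.
        self.tracing = {name: self.recording(h) for name, h in self.normal.items()}
        self.table = self.normal  # the currently active table

    def recording(self, handler):
        def traced(op, arg):
            if self.prev is not None:
                self.trace.append(self.prev)  # record the previous instruction
            self.prev = (op, arg)
            return handler(op, arg)
        return traced

    def op_push(self, op, arg):
        self.stack.append(arg)
        return True

    def op_add(self, op, arg):
        right = self.stack.pop()
        left = self.stack.pop()
        self.stack.append(left + right)
        return True

    def op_hot(self, op, arg):
        self.table = self.tracing  # normal table hands over to tracing table
        return True

    def op_halt(self, op, arg):
        self.table = self.normal   # tracing table hands back to normal table
        self.prev = None
        return False

    def run(self):
        running = True
        while running:
            op, arg = self.program[self.pc]
            self.pc += 1
            running = self.table[op](op, arg)
        return self.stack


if __name__ == "__main__":
    vm = ToyVM(PROGRAM)
    print(vm.run(), vm.trace)
```

In this sketch, switching modes is just a matter of swapping which table the dispatch loop consults, so the normal handlers carry no tracing checks of their own. That mirrors the claim in the commit message that the non-tracing interpreter should not slow down, though the toy says nothing about real-world costs.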

95 commits