cpython

mirror of https://github.com/python/cpython.git synced 2025-12-08 06:10:17 +00:00

Author	SHA1	Message	Date
Ken Jin	4fa80ce74c	gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) This PR changes the current JIT model from trace projection to trace recording. Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as a lot of trace too long aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md . This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunders uops were discovered to be broken as part of this work gh-140277 The optimizer stack space check is disabled, as it's no longer valid to deal with underflow. 
Pros: * Ignoring the generated tracer code as it's automatically created, this is only an additional 1k lines of code. The maintenance burden is handled by the DSL and code generator. * `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace. * The new JIT frontend is able to handle a lot more control-flow than the old one. * Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing. * Better handling of polymorphism. We leverage the specializing interpreter for this. Cons: * (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962 Design: * After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests. * The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering. * The tracing version behaves nearly identically to the normal interpreter; in fact, it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory. * The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force a re-specialization when tracing a deopt. * The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as they are not tested. * Because we do not share interpreter dispatch, there should be no significant slowdown to the original specializing interpreter on tailcall and computed goto with JIT disabled. 
With JIT enabled, there might be a slowdown in the form of the JIT trying to trace. * Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.	2025-11-13 18:08:32 +00:00
Ken Jin	a269e691de	gh-139109: Dynamic opcode targets (GH-139111) Make opcode targets table dynamic	2025-09-18 14:12:07 +01:00
Victor Stinner	6504f20cce	gh-135755: Make Py_TAIL_CALL_INTERP macro private (#138981) Rename Py_TAIL_CALL_INTERP to _Py_TAIL_CALL_INTERP.	2025-09-18 14:33:07 +02:00
Mark Shannon	6dcb0fdfe0	GH-134282: Always borrow references LOAD_CONST (GH-134284)	2025-05-20 11:24:11 -04:00
Irit Katriel	5529213d4e	gh-100239: specialize BINARY_OP/SUBSCR for list-slice (#132626)	2025-05-01 10:28:52 +00:00
Russell Keith-Magee	6c522debc2	GH-125515: Remove two unused error branches. (#133181) Remove two unused error branches in the generated bytecode handling.	2025-05-01 06:21:57 +08:00
Lysandros Nikolaou	60202609a2	gh-132661: Implement PEP 750 (#132662 ) Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com> Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com> Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com> Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com> Co-authored-by: Wingy <git@wingysam.xyz> Co-authored-by: Koudai Aono <koxudaxi@gmail.com> Co-authored-by: Dave Peck <davepeck@gmail.com> Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu> Co-authored-by: Paul Everitt <pauleveritt@me.com> Co-authored-by: sobolevn <mail@sobolevn.me>	2025-04-30 11:46:41 +02:00
mpage	053c285f6b	gh-130704: Strength reduce `LOAD_FAST{_LOAD_FAST}` (#130708) Optimize `LOAD_FAST` opcodes into faster versions that load borrowed references onto the operand stack when we can prove that the lifetime of the local outlives the lifetime of the temporary that is loaded onto the stack.	2025-04-01 10:18:42 -07:00
Mark Shannon	89df62c120	GH-128534: Fix behavior of branch monitoring for `async for` (GH-130847) * Both branches in a pair now have a common source and are included in co_branches	2025-03-07 14:30:31 +00:00
Tomasz Pytel	aeb2327386	gh-130574: renumber RESUME opcode from 149 to 128 (GH-130685)	2025-03-06 08:59:36 +00:00
Mark Shannon	2a18e80695	GH-128534: Instrument branches for `async for` loops. (GH-130569)	2025-02-27 09:36:41 +00:00
Ken Jin	359c7dde3b	gh-129989: Properly disable tailcall interp in configure (GH-129991) Co-authored-by: Zanie Blue <contact@zanie.dev>	2025-02-16 03:01:24 +08:00
Irit Katriel	a1417b211f	gh-100239: replace BINARY_SUBSCR & family by BINARY_OP with oparg NB_SUBSCR (#129700)	2025-02-07 22:39:54 +00:00
Ken Jin	cb640b659e	gh-128563: A new tail-calling interpreter (GH-128718) Co-authored-by: Garrett Gu <garrettgu777@gmail.com> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>	2025-02-06 23:21:57 +08:00
Mark Shannon	75b628adeb	GH-128563: Generate `opcode = ...` in instructions that need `opcode` (GH-129608) * Remove support for GO_TO_INSTRUCTION	2025-02-03 15:09:21 +00:00
Brandt Bucher	828b27680f	GH-126599: Remove the PyOptimizer API (GH-129194)	2025-01-28 16:10:51 -08:00
Mark Shannon	75b4962157	GH-128914: Remove all but one conditional stack effects (GH-129226) * Remove all 'if (0)' and 'if (1)' conditional stack effects * Use array instead of conditional for BUILD_SLICE args * Refactor LOAD_GLOBAL to use a common conditional uop * Remove conditional stack effects from LOAD_ATTR specializations * Replace conditional stack effects in LOAD_ATTR with a 0 or 1 sized array. * Remove conditional stack effects from CALL_FUNCTION_EX	2025-01-27 16:24:48 +00:00
Sam Gross	a10f99375e	Revert "GH-128914: Remove conditional stack effects from `bytecodes.c` and the code generators (GH-128918)" (GH-129202) The commit introduced a ~2.5-3% regression in the free threading build. This reverts commit `ab61d3f430`.	2025-01-23 09:26:25 +00:00
Mark Shannon	ab61d3f430	GH-128914: Remove conditional stack effects from `bytecodes.c` and the code generators (GH-128918)	2025-01-20 17:09:23 +00:00
Irit Katriel	3893a92d95	gh-100239: specialize long tail of binary operations (#128722)	2025-01-16 15:22:13 +00:00
Mark Shannon	ddd959987c	GH-128685: Specialize (rather than quicken) LOAD_CONST into LOAD_CONST_[IM]MORTAL (GH-128708)	2025-01-13 10:30:28 +00:00
Mark Shannon	f826beca0c	GH-128375: Better instrument for `FOR_ITER` (GH-128445)	2025-01-06 17:54:47 +00:00
Mark Shannon	d2f1d917e8	GH-122548: Implement branch taken and not taken events for sys.monitoring (GH-122564)	2024-12-19 16:59:51 +00:00
Mark Shannon	faa3272fb8	GH-125837: Split `LOAD_CONST` into three. (GH-125972) * Add LOAD_CONST_IMMORTAL opcode * Add LOAD_SMALL_INT opcode * Remove RETURN_CONST opcode	2024-10-29 11:15:42 +00:00
Mark Shannon	da071fa3e8	GH-119866: Spill the stack around escaping calls. (GH-124392) * Spill the evaluation around escaping calls in the generated interpreter and JIT. * The code generator tracks live, cached values so they can be saved to memory when needed. * Spills the stack pointer around escaping calls, so that the exact stack is visible to the cycle GC.	2024-10-07 14:56:39 +01:00
Mark Shannon	5d3201fe3f	GH-123040: Specialize shadowed `LOAD_ATTR`. (GH-123219)	2024-08-23 10:22:35 +01:00
Mark Shannon	c13e7d98fb	GH-118093: Specialize `CALL_KW` (GH-123006)	2024-08-16 17:11:24 +01:00
Mark Shannon	eec7bdaf01	GH-120024: Remove `CHECK_EVAL_BREAKER` macro. (GH-122968) * Factor some instructions into micro-ops to isolate CHECK_EVAL_BREAKER for escape analysis * Eliminate CHECK_EVAL_BREAKER macro	2024-08-14 12:04:05 +01:00
Mark Shannon	7a65439b93	GH-122390: Replace `_Py_GetbaseOpcode` with `_Py_GetBaseCodeUnit` (GH-122942)	2024-08-13 14:22:57 +01:00
Mark Shannon	95a73917cd	GH-122029: Break INSTRUMENTED_CALL into micro-ops, so that its behavior is consistent with CALL (GH-122177)	2024-07-26 14:35:57 +01:00
Mark Shannon	afb0aa6ed2	GH-121131: Clean up and fix some instrumented instructions. (GH-121132) * Add support for 'prev_instr' to code generator and refactor some INSTRUMENTED instructions	2024-07-26 12:24:12 +01:00
Mark Shannon	2e14a52cce	GH-122160: Remove BUILD_CONST_KEY_MAP opcode. (GH-122164)	2024-07-25 16:24:29 +01:00
Mark Shannon	9cefcc0ee7	GH-120507: Lower the `BEFORE_WITH` and `BEFORE_ASYNC_WITH` instructions. (#120640) * Remove BEFORE_WITH and BEFORE_ASYNC_WITH instructions. * Add LOAD_SPECIAL instruction * Reimplement `with` and `async with` statements using LOAD_SPECIAL	2024-06-18 12:17:46 +01:00
Jelle Zijlstra	98e855fcc1	gh-119180: Add LOAD_COMMON_CONSTANT opcode (#119321 ) The PEP 649 implementation will require a way to load NotImplementedError from the bytecode. @markshannon suggested implementing this by converting LOAD_ASSERTION_ERROR into a more general mechanism for loading constants. This PR adds this new opcode. I will work on the rest of the implementation of the PEP separately. Co-authored-by: Irit Katriel <1055913+iritkatriel@users.noreply.github.com>	2024-05-22 00:46:39 +00:00
Mark Shannon	1ab6356ebe	GH-118095: Use broader specializations of CALL in tier 1, for better tier 2 support of calls. (GH-118322) * Add CALL_PY_GENERAL, CALL_BOUND_METHOD_GENERAL and call CALL_NON_PY_GENERAL specializations. * Remove CALL_PY_WITH_DEFAULTS specialization * Use CALL_NON_PY_GENERAL in more cases when otherwise failing to specialize	2024-05-04 12:11:11 +01:00
Ken Jin	41457c7fdb	gh-116381: Remove bad specializations, add fail stats (GH-116464) * Remove bad specializations, add fail stats	2024-03-08 00:21:21 +08:00
Ken Jin	7114cf20c0	gh-116381: Specialize CONTAINS_OP (GH-116385) * Specialize CONTAINS_OP * 📜🤖 Added by blurb_it. * Add PyAPI_FUNC for JIT --------- Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>	2024-03-07 03:30:11 +08:00
Mark Shannon	de8a4e52a5	GH-111485: Generate `TARGET` table for computed goto dispatch. (GH-113319)	2023-12-20 15:09:12 +00:00
Irit Katriel	d49aba5a7a	gh-111354: Simplify _PyGen_yf by moving some of its work to the compiler and frame state (#111648)	2023-11-03 10:01:36 +00:00
Irit Katriel	52cc4af6ae	gh-111354: simplify detection of RESUME after YIELD_VALUE at except-depth 1 (#111459)	2023-11-02 10:18:43 +00:00
Brandt Bucher	22e65eecaa	GH-105848: Replace KW_NAMES + CALL with LOAD_CONST + CALL_KW (GH-109300)	2023-09-13 10:25:45 -07:00
Irit Katriel	8b55adfa8f	gh-109256: allocate opcode IDs for internal opcodes in their own range (#109269)	2023-09-12 10:36:17 +00:00
Mark Shannon	0858328ca2	GH-108614: Add `RESUME_CHECK` instruction (GH-108630)	2023-09-07 14:39:03 +01:00
Irit Katriel	665a4391e1	gh-105481: generate op IDs from bytecode.c instead of hard coding them in opcode.py (#107971)	2023-08-16 22:25:18 +00:00
Brandt Bucher	ea72c6fe3b	GH-107596: Specialize str[int] (GH-107597)	2023-08-08 13:42:43 -07:00
Mark Shannon	0c90e75610	GH-100288: Specialize LOAD_ATTR for simple class attributes. (#105990) * Add two more specializations of LOAD_ATTR.	2023-07-10 11:40:35 +01:00
Brandt Bucher	7b2d94d875	GH-106008: Make implicit boolean conversions explicit (GH-106003)	2023-06-29 13:49:54 -07:00
hms	8bff940ad6	gh-105775: Convert LOAD_CLOSURE to a pseudo-op (#106059) This enables super-instruction formation, removal of checks for uninitialized variables, and frees up an instruction.	2023-06-29 09:34:00 -07:00
Mark Shannon	04492cbc9a	GH-91095: Specialize calls to normal Python classes. (GH-99331)	2023-06-22 09:48:19 +01:00
Irit Katriel	33f0a8578b	gh-105481: generate _specializations and _specialized_instructions from bytecodes.c (#105913)	2023-06-19 23:47:04 +01:00

180 commits