cpython

mirror of https://github.com/python/cpython.git synced 2025-12-08 06:10:17 +00:00

Author	SHA1	Message	Date
Ken Jin	4fa80ce74c	gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) This PR changes the current JIT model from trace projection to trace recording. Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as there are a lot of "trace too long" aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md . This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunder uops were discovered to be broken as part of this work (gh-140277). The optimizer stack space check is disabled, as it's no longer valid to deal with underflow. 
Pros: * Ignoring the generated tracer code as it's automatically created, this is only an additional 1k lines of code. The maintenance burden is handled by the DSL and code generator. * `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace. * The new JIT frontend is able to handle a lot more control flow than the old one. * Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing. * Better handling of polymorphism. We leverage the specializing interpreter for this. Cons: * (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962 Design: * After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests. * The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering. * The tracing version behaves nearly identically to the normal interpreter; in fact it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory. * The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force a re-specialization when tracing a deopt. * The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as they are not tested. * Because we do not share interpreter dispatch, there should be no significant slowdown to the original specializing interpreter on tailcall and computed goto builds with JIT disabled. 
With JIT enabled, there might be a slowdown in the form of the JIT trying to trace. * Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.	2025-11-13 18:08:32 +00:00
Victor Stinner	166cdaa6fb	gh-111489: Remove _PyTuple_FromArray() alias (#139973) Replace _PyTuple_FromArray() with PyTuple_FromArray(). Remove pycore_tuple.h includes.	2025-10-11 22:58:14 +02:00
Mark Shannon	3b83257366	GH-138378: Move globals-to-consts pass into main optimizer pass (GH-138379)	2025-09-18 10:09:59 +01:00
AN Long	1ff2cbbac8	gh-137136: Suppress build warnings when building on Windows with --experimental-jit-interpreter (GH-137137)	2025-09-03 15:42:26 +01:00
Ken Jin	7fda8b66de	gh-137728 gh-137762: Fix bugs in the JIT with many local variables (GH-137764)	2025-08-20 22:53:54 +08:00
Ken Jin	ff7b5d44a0	gh-132732: Fix up pure types in JIT (GH-136050) Fix up pure types in JIT	2025-06-28 18:30:30 +08:00
Ken Jin	c419af9e27	gh-132732: JIT: Only allow compact ints in pure evaluation (GH-136040)	2025-06-28 00:18:44 +08:00
Ken Jin	695ab61351	gh-132732: Automatically constant evaluate pure operations (GH-132733) This adds a "macro" to the optimizer DSL called "REPLACE_OPCODE_IF_EVALUATES_PURE", which allows automatically constant evaluating a bytecode body if certain inputs have no side effects upon evaluation (such as ints, strings, and floats). Co-authored-by: Tomas R. <tomas.roun8@gmail.com>	2025-06-27 19:37:44 +08:00
Ken Jin	569fc6870f	gh-134584: Specialize POP_TOP by reference and type in JIT (GH-135761)	2025-06-24 00:57:14 +08:00
Ken Jin	0243260284	gh-135379: Move PyLong_CheckCompact to private header and rename it (GH-135707)	2025-06-19 13:09:09 +00:00
Mark Shannon	9731dd2c8d	GH-135379: Specialize int operations for compact ints only (GH-135668)	2025-06-19 11:10:29 +01:00
Ken Jin	fba5dded6d	gh-134584: Decref elimination for float ops in the JIT (GH-134588) This PR adds a PyJitRef API to the JIT's optimizer that mimics the _PyStackRef API. This allows it to track references and their stack lifetimes properly, opening the door to refcount elimination in the JIT.	2025-06-17 23:25:53 +08:00
Brandt Bucher	ec736e7dae	GH-131798: Optimize cached class attributes and methods in the JIT (GH-134403)	2025-05-22 11:15:03 -04:00
Brandt Bucher	2f0570caf4	GH-131798: Narrow types more aggressively in the JIT (GH-134373)	2025-05-20 18:09:51 -04:00
Bénédikt Tran	883c2f682b	GH-131331: Rename "not" to "invert" (GH-131334)	2025-03-20 16:59:41 -07:00
Mark Shannon	7ebd71ee14	GH-131498: Remove conditional stack effects (GH-131499) * Adds some missing #includes	2025-03-20 15:39:38 +00:00
Mark Shannon	a45f25361d	GH-131238: More refactoring of core header files (GH-131351) Adds new pycore_stats.h header file to help break dependencies involving the pycore_code.h header.	2025-03-17 14:41:05 +00:00
Amit Lavon	691354ccb0	GH-130415: Narrow str to "" based on boolean tests (GH-130476)	2025-03-04 13:20:17 -08:00
Klaus117	c989e74446	GH-130415: Narrow int to 0 based on boolean tests (GH-130772)	2025-03-04 12:44:09 -08:00
Brandt Bucher	7afa476874	GH-130415: Use boolean guards to narrow types to values in the JIT (GH-130659)	2025-03-02 13:21:34 -08:00
Mark Shannon	f0f7b978be	GH-128939: Refactor JIT optimize structs (GH-128940)	2025-01-20 15:49:15 +00:00
Victor Stinner	9e4a81f00f	gh-120642: Move private PyCode APIs to the internal C API (#120643) * Move _Py_CODEUNIT and related functions to pycore_code.h. * Move _Py_BackoffCounter to pycore_backoff.h. * Move Include/cpython/optimizer.h content to pycore_optimizer.h. * Remove Include/cpython/optimizer.h. * Remove PyUnstable_Replace_Executor(). Rename functions: * PyUnstable_GetExecutor() => _Py_GetExecutor() * PyUnstable_GetOptimizer() => _Py_GetOptimizer() * PyUnstable_SetOptimizer() => _Py_SetTier2Optimizer() * PyUnstable_Optimizer_NewCounter() => _PyOptimizer_NewCounter() * PyUnstable_Optimizer_NewUOpOptimizer() => _PyOptimizer_NewUOpOptimizer()	2024-06-26 13:54:03 +02:00
Saul Shanabrook	55402d3232	gh-119258: Eliminate Type Guards in Tier 2 Optimizer with Watcher (GH-119365) Co-authored-by: parmeggiani <parmeggiani@spaziodati.eu> Co-authored-by: dpdani <git@danieleparmeggiani.me> Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com> Co-authored-by: Brandt Bucher <brandtbucher@microsoft.com> Co-authored-by: Ken Jin <kenjin@python.org>	2024-06-08 17:41:45 +08:00
Mark Shannon	f5c6b9977a	GH-118910: Less boilerplate in the tier 2 optimizer (#118913)	2024-05-10 17:43:23 +01:00
Mark Shannon	72867c962c	GH-118095: Unify the behavior of tier 2 FOR_ITER branch micro-ops (GH-118420) * Target _FOR_ITER_TIER_TWO at POP_TOP following the matching END_FOR * Modify _GUARD_NOT_EXHAUSTED_RANGE, _GUARD_NOT_EXHAUSTED_LIST and _GUARD_NOT_EXHAUSTED_TUPLE so that they also target the POP_TOP following the matching END_FOR	2024-05-02 16:17:59 +01:00
Guido van Rossum	7d83f7bcc4	gh-118335: Configure Tier 2 interpreter at build time (#118339) The code for Tier 2 is now only compiled when configured with `--enable-experimental-jit[=yes|interpreter]`. We drop support for `PYTHON_UOPS` and `-Xuops`, but you can disable the interpreter or JIT at runtime by setting `PYTHON_JIT=0`. You can also build it without enabling it by default using `--enable-experimental-jit=yes-off`; enable with `PYTHON_JIT=1`. On Windows, the `build.bat` script supports `--experimental-jit`, `--experimental-jit-off`, `--experimental-interpreter`. In the C code, `_Py_JIT` is defined as before when the JIT is enabled; the new variable `_Py_TIER2` is defined when the JIT or the interpreter is enabled. It is actually a bitmask: 1: JIT; 2: default-off; 4: interpreter.	2024-04-30 18:26:34 -07:00
Mark Shannon	a6647d16ab	GH-115480: Reduce guard strength for binary ops when type of one operand is known already (GH-118050)	2024-04-22 13:34:06 +01:00
Mark Shannon	0c81ce1360	GH-115819: Eliminate Boolean guards when value is known (GH-116355)	2024-03-05 15:06:00 +00:00
Mark Shannon	cbf3d38cbe	GH-115685: Optimize `TO_BOOL` and variants based on truthiness of input. (GH-116311)	2024-03-05 11:23:46 +00:00
Guido van Rossum	0656509033	gh-116088: Insert bottom checks after all sym_set_...() calls (#116089) This changes the `sym_set_...()` functions to return a `bool` which is `false` when the symbol is `bottom` after the operation. All calls to such functions now check this result and go to `hit_bottom`, a special error label that prints a different message and then reports that it wasn't able to optimize the trace. No executor will be produced in this case.	2024-02-29 18:55:29 +00:00
Guido van Rossum	3409bc29c9	gh-115859: Re-enable T2 optimizer pass by default (#116062) This undoes the temporary default disabling of the T2 optimizer pass in gh-115860. - Add a new test that reproduces Brandt's example from gh-115859; it indeed crashes before gh-116028 with PYTHONUOPSOPTIMIZE=1 - Re-enable the optimizer pass in T2, stop checking PYTHONUOPSOPTIMIZE - Rename the env var to disable T2 entirely to PYTHON_UOPS_OPTIMIZE (must be explicitly set to 0 to disable) - Fix skipIf conditions on tests in test_opt.py accordingly - Export sym_is_bottom() (for debugging) - Fix various things in the `_BINARY_OP_` specializations in the abstract interpreter: - DECREF(temp) - out-of-space check after sym_new_const() - add sym_matches_type() checks, so even if we somehow reach a binary op with symbolic constants of the wrong type on the stack we won't trigger the type assert	2024-02-28 22:38:01 +00:00
Guido van Rossum	e2a3e4b748	gh-115816: Improve internal symbols API in optimizer (#116028) - Any `sym_set_...` call that attempts to set conflicting information causes the symbol to become `bottom` (contradiction). - All `sym_is...` and similar calls return false or NULL for `bottom`. - Everything's tested. - The tests still pass with `PYTHONUOPSOPTIMIZE=1`.	2024-02-28 17:55:56 +00:00
Mark Shannon	6ecfcfe894	GH-115816: Assorted naming and formatting changes to improve maintainability. (GH-115987) * Rename _Py_UOpsAbstractInterpContext to _Py_UOpsContext and _Py_UOpsSymType to _Py_UopsSymbol. * #define shortened form of _Py_uop_... names for improved readability.	2024-02-27 13:25:02 +00:00
Mark Shannon	10fbcd6c5d	GH-115816: Make tier2 optimizer symbols testable, and add a few tests. (GH-115953)	2024-02-27 10:51:26 +00:00
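
The `bottom` behaviour described in gh-115816 and gh-116088 above can be illustrated with a small model. This is only a sketch in Python, not the optimizer's C code: the `Symbol` class, its method names, and `optimize_trace` are hypothetical stand-ins for the `sym_set_...()` and `sym_is...()` families and for the `hit_bottom` bail-out. It assumes a symbol tracks a single type fact.

```python
class Symbol:
    """Toy stand-in for an optimizer symbol: unknown, one known type, or bottom."""

    def __init__(self):
        self.known_type = None  # None means "nothing known yet"
        self.bottom = False     # True once contradictory facts were recorded

    def set_type(self, typ):
        """Record a type fact; return False if the symbol is bottom afterwards."""
        if self.bottom:
            return False
        if self.known_type is None:
            self.known_type = typ
        elif self.known_type is not typ:
            # Conflicting information: the symbol becomes bottom (contradiction).
            self.known_type = None
            self.bottom = True
        return not self.bottom

    def is_bottom(self):
        return self.bottom

    def matches_type(self, typ):
        # Queries answer False for bottom, mirroring the described API contract.
        return not self.bottom and self.known_type is typ


def optimize_trace(facts):
    """Apply (symbol, type) facts in order; give up as soon as one hits bottom.

    Returning False models the "go to hit_bottom, produce no executor" path.
    """
    for sym, typ in facts:
        if not sym.set_type(typ):
            return False
    return True
```

In this model, checking the return value of every `set_type` call is what lets the caller stop at the first contradiction instead of continuing to reason about a symbol that can never exist at runtime.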

34 commits
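
The `_Py_TIER2` bitmask from gh-118335 above (1: JIT; 2: default-off; 4: interpreter) can be decoded as in the following sketch. The `Tier2Flags` enum and `describe_tier2` helper are hypothetical names introduced for illustration; only the three bit values come from the commit message, and which combinations a given build option actually produces is not stated there.

```python
from enum import IntFlag


class Tier2Flags(IntFlag):
    """Hypothetical mirror of the bit values listed in the commit message."""
    JIT = 1
    DEFAULT_OFF = 2
    INTERPRETER = 4


def describe_tier2(value):
    """Decode a _Py_TIER2-style bitmask into named booleans."""
    flags = Tier2Flags(value)
    return {
        "jit": bool(flags & Tier2Flags.JIT),
        "interpreter": bool(flags & Tier2Flags.INTERPRETER),
        # The default-off bit means the feature is built but must be enabled
        # explicitly at runtime (the message mentions PYTHON_JIT=1 for this).
        "enabled_by_default": not (flags & Tier2Flags.DEFAULT_OFF),
    }
```

For example, under these assumptions `describe_tier2(3)` reports a JIT build that is off by default, while `describe_tier2(4)` reports an interpreter-only build that is on by default.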