Commit graph

34 commits

Author SHA1 Message Date
Ken Jin
4fa80ce74c
gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310)
This PR changes the current JIT model from trace projection to trace recording. Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. **Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR**. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg

Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as a lot of trace too long aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md .

This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunders uops were discovered to be broken as part of this work gh-140277

The optimizer stack space check is disabled, as it's no longer valid to deal with underflow.

Pros:
* Ignoring the generated tracer code as it's automatically created, this is only additional 1k lines of code. The maintenance burden is handled by the DSL and code generator.
* `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace.
* The new JIT frontend is able to handle a lot more control-flow than the old one.
* Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing.
* Better handling of polymorphism. We leverage the specializing interpreter for this.

Cons:
* (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962

Design:
* After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests.
* The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering.
* The tracing version behaves nearly identical to the normal interpreter, in fact it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory.
* The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force a re-specializing when tracing a deopt.
* The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as it's not tested.
* Because we do not share interpreter dispatch, there is should be no significant slowdown to the original specializing interpreter on tailcall and computed got with JIT disabled. With JIT enabled, there might be a slowdown in the form of the JIT trying to trace.
* Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.
2025-11-13 18:08:32 +00:00
Victor Stinner
166cdaa6fb
gh-111489: Remove _PyTuple_FromArray() alias (#139973)
Replace _PyTuple_FromArray() with PyTuple_FromArray().
Remove pycore_tuple.h includes.
2025-10-11 22:58:14 +02:00
Mark Shannon
3b83257366
GH-138378: Move globals-to-consts pass into main optimizer pass (GH-138379) 2025-09-18 10:09:59 +01:00
AN Long
1ff2cbbac8
gh-137136: Suppress build warnings when build on Windows with --experimental-jit-interpreter (GH-137137) 2025-09-03 15:42:26 +01:00
Ken Jin
7fda8b66de
gh-137728 gh-137762: Fix bugs in the JIT with many local variables (GH-137764) 2025-08-20 22:53:54 +08:00
Ken Jin
ff7b5d44a0
gh-132732: Fix up pure types in JIT (GH-136050)
Fix up pure types in JIT
2025-06-28 18:30:30 +08:00
Ken Jin
c419af9e27
gh-132732: JIT: Only allow compact ints in pure evaluation (GH-136040) 2025-06-28 00:18:44 +08:00
Ken Jin
695ab61351
gh-132732: Automatically constant evaluate pure operations (GH-132733)
This adds a "macro" to the optimizer DSL called "REPLACE_OPCODE_IF_EVALUATES_PURE", which allows automatically constant evaluating a bytecode body if certain inputs have no side effects upon evaluations (such as ints, strings, and floats).


Co-authored-by: Tomas R. <tomas.roun8@gmail.com>
2025-06-27 19:37:44 +08:00
Ken Jin
569fc6870f
gh-134584: Specialize POP_TOP by reference and type in JIT (GH-135761) 2025-06-24 00:57:14 +08:00
Ken Jin
0243260284
gh-135379: Move PyLong_CheckCompact to private header and rename it (GH-135707) 2025-06-19 13:09:09 +00:00
Mark Shannon
9731dd2c8d
GH-135379: Specialize int operations for compact ints only (GH-135668) 2025-06-19 11:10:29 +01:00
Ken Jin
fba5dded6d
gh-134584: Decref elimination for float ops in the JIT (GH-134588)
This PR adds a PyJitRef API to the JIT's optimizer that mimics the _PyStackRef API. This allows it to track references and their stack lifetimes properly. Thus opening up the doorway to refcount elimination in the JIT.
2025-06-17 23:25:53 +08:00
Brandt Bucher
ec736e7dae
GH-131798: Optimize cached class attributes and methods in the JIT (GH-134403) 2025-05-22 11:15:03 -04:00
Brandt Bucher
2f0570caf4
GH-131798: Narrow types more aggressively in the JIT (GH-134373) 2025-05-20 18:09:51 -04:00
Bénédikt Tran
883c2f682b
GH-131331: Rename "not" to "invert" (GH-131334) 2025-03-20 16:59:41 -07:00
Mark Shannon
7ebd71ee14
GH-131498: Remove conditional stack effects (GH-131499)
* Adds some missing #includes
2025-03-20 15:39:38 +00:00
Mark Shannon
a45f25361d
GH-131238: More refactoring of core header files (GH-131351)
Adds new pycore_stats.h header file to help break dependencies involving the pycore_code.h header.
2025-03-17 14:41:05 +00:00
Amit Lavon
691354ccb0
GH-130415: Narrow str to "" based on boolean tests (GH-130476) 2025-03-04 13:20:17 -08:00
Klaus117
c989e74446
GH-130415: Narrow int to 0 based on boolean tests (GH-130772) 2025-03-04 12:44:09 -08:00
Brandt Bucher
7afa476874
GH-130415: Use boolean guards to narrow types to values in the JIT (GH-130659) 2025-03-02 13:21:34 -08:00
Mark Shannon
f0f7b978be
GH-128939: Refactor JIT optimize structs (GH-128940) 2025-01-20 15:49:15 +00:00
Victor Stinner
9e4a81f00f
gh-120642: Move private PyCode APIs to the internal C API (#120643)
* Move _Py_CODEUNIT and related functions to pycore_code.h.
* Move _Py_BackoffCounter to pycore_backoff.h.
* Move Include/cpython/optimizer.h content to pycore_optimizer.h.
* Remove Include/cpython/optimizer.h.
* Remove PyUnstable_Replace_Executor().

Rename functions:

* PyUnstable_GetExecutor() => _Py_GetExecutor()
* PyUnstable_GetOptimizer() => _Py_GetOptimizer()
* PyUnstable_SetOptimizer() => _Py_SetTier2Optimizer()
* PyUnstable_Optimizer_NewCounter() => _PyOptimizer_NewCounter()
* PyUnstable_Optimizer_NewUOpOptimizer() => _PyOptimizer_NewUOpOptimizer()
2024-06-26 13:54:03 +02:00
Saul Shanabrook
55402d3232
gh-119258: Eliminate Type Guards in Tier 2 Optimizer with Watcher (GH-119365)
Co-authored-by: parmeggiani <parmeggiani@spaziodati.eu>
Co-authored-by: dpdani <git@danieleparmeggiani.me>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Brandt Bucher <brandtbucher@microsoft.com>
Co-authored-by: Ken Jin <kenjin@python.org>
2024-06-08 17:41:45 +08:00
Mark Shannon
f5c6b9977a
GH-118910: Less boilerplate in the tier 2 optimizer (#118913) 2024-05-10 17:43:23 +01:00
Mark Shannon
72867c962c
GH-118095: Unify the behavior of tier 2 FOR_ITER branch micro-ops (GH-118420)
* Target _FOR_ITER_TIER_TWO at POP_TOP following the matching END_FOR

* Modify _GUARD_NOT_EXHAUSTED_RANGE, _GUARD_NOT_EXHAUSTED_LIST and _GUARD_NOT_EXHAUSTED_TUPLE so that they also target the POP_TOP following the matching END_FOR
2024-05-02 16:17:59 +01:00
Guido van Rossum
7d83f7bcc4
gh-118335: Configure Tier 2 interpreter at build time (#118339)
The code for Tier 2 is now only compiled when configured
with `--enable-experimental-jit[=yes|interpreter]`.

We drop support for `PYTHON_UOPS` and -`Xuops`,
but you can disable the interpreter or JIT
at runtime by setting `PYTHON_JIT=0`.
You can also build it without enabling it by default
using `--enable-experimental-jit=yes-off`;
enable with `PYTHON_JIT=1`.

On Windows, the `build.bat` script supports
`--experimental-jit`, `--experimental-jit-off`,
`--experimental-interpreter`.

In the C code, `_Py_JIT` is defined as before
when the JIT is enabled; the new variable
`_Py_TIER2` is defined when the JIT *or* the
interpreter is enabled. It is actually a bitmask:
1: JIT; 2: default-off; 4: interpreter.
2024-04-30 18:26:34 -07:00
Mark Shannon
a6647d16ab
GH-115480: Reduce guard strength for binary ops when type of one operand is known already (GH-118050) 2024-04-22 13:34:06 +01:00
Mark Shannon
0c81ce1360
GH-115819: Eliminate Boolean guards when value is known (GH-116355) 2024-03-05 15:06:00 +00:00
Mark Shannon
cbf3d38cbe
GH-115685: Optimize TO_BOOL and variants based on truthiness of input. (GH-116311) 2024-03-05 11:23:46 +00:00
Guido van Rossum
0656509033
gh-116088: Insert bottom checks after all sym_set_...() calls (#116089)
This changes the `sym_set_...()` functions to return a `bool` which is `false`
when the symbol is `bottom` after the operation.

All calls to such functions now check this result and go to `hit_bottom`,
a special error label that prints a different message and then reports
that it wasn't able to optimize the trace. No executor will be produced
in this case.
2024-02-29 18:55:29 +00:00
Guido van Rossum
3409bc29c9
gh-115859: Re-enable T2 optimizer pass by default (#116062)
This undoes the *temporary* default disabling of the T2 optimizer pass in gh-115860.

- Add a new test that reproduces Brandt's example from gh-115859; it indeed crashes before gh-116028 with PYTHONUOPSOPTIMIZE=1
- Re-enable the optimizer pass in T2, stop checking PYTHONUOPSOPTIMIZE
- Rename the env var to disable T2 entirely to PYTHON_UOPS_OPTIMIZE (must be explicitly set to 0 to disable)
- Fix skipIf conditions on tests in test_opt.py accordingly
- Export sym_is_bottom() (for debugging)
- Fix various things in the `_BINARY_OP_` specializations in the abstract interpreter:
  - DECREF(temp)
  - out-of-space check after sym_new_const()
  - add sym_matches_type() checks, so even if we somehow reach a binary op with symbolic constants of the wrong type on the stack we won't trigger the type assert
2024-02-28 22:38:01 +00:00
Guido van Rossum
e2a3e4b748
gh-115816: Improve internal symbols API in optimizer (#116028)
- Any `sym_set_...` call that attempts to set conflicting information
  cause the symbol to become `bottom` (contradiction).
- All `sym_is...` and similar calls return false or NULL for `bottom`.
- Everything's tested.
- The tests still pass with `PYTHONUOPSOPTIMIZE=1`.
2024-02-28 17:55:56 +00:00
Mark Shannon
6ecfcfe894
GH-115816: Assorted naming and formatting changes to improve maintainability. (GH-115987)
* Rename _Py_UOpsAbstractInterpContext to _Py_UOpsContext and _Py_UOpsSymType to _Py_UopsSymbol.

* #define shortened form of _Py_uop_... names for improved readability.
2024-02-27 13:25:02 +00:00
Mark Shannon
10fbcd6c5d
GH-115816: Make tier2 optimizer symbols testable, and add a few tests. (GH-115953) 2024-02-27 10:51:26 +00:00