Commit graph

33 commits

Author SHA1 Message Date
Ken Jin
4fa80ce74c
gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310)
This PR changes the current JIT model from trace projection to trace recording. Benchmarking: better pyperformance (about 1.7% overall) geomean versus current https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg, 100% faster Richards on the most improved benchmark versus the current JIT. Slowdown of about 10-15% on the worst benchmark versus the current JIT. **Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR**. The speedup in the merged version is about 1.1%. https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg

Stats: 50% more uops executed, 30% more traces entered the last time we ran them. It also suggests our trace lengths for a real trace recording JIT are too short, as a lot of trace too long aborts https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md .

This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunders uops were discovered to be broken as part of this work gh-140277

The optimizer stack space check is disabled, as it's no longer valid to deal with underflow.

Pros:
* Ignoring the generated tracer code as it's automatically created, this is only additional 1k lines of code. The maintenance burden is handled by the DSL and code generator.
* `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace.
* The new JIT frontend is able to handle a lot more control-flow than the old one.
* Tracing is very low overhead. We use the tail calling interpreter/computed goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing.
* Better handling of polymorphism. We leverage the specializing interpreter for this.

Cons:
* (For now) requires tail calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, tail calling is coming soon to Windows though https://github.com/python/cpython/pull/139962

Design:
* After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests.
* The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering.
* The tracing version behaves nearly identical to the normal interpreter, in fact it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory.
* The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to side exit chain effectively, without repeating much code. We force a re-specializing when tracing a deopt.
* The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as it's not tested.
* Because we do not share interpreter dispatch, there is should be no significant slowdown to the original specializing interpreter on tailcall and computed got with JIT disabled. With JIT enabled, there might be a slowdown in the form of the JIT trying to trace.
* Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction --- `_DYNAMIC_EXIT`.
2025-11-13 18:08:32 +00:00
Ken Jin
f701f98052
gh-140312: Set lltrace on JIT debug builds (GH-140313)
Co-authored-by: Mark Shannon <mark@hotpy.org>
2025-11-01 16:22:59 +00:00
Mark Shannon
a8d9d94784
GH-137959: Replace shim code in jitted code with a single trampoline function. (GH-137961) 2025-08-21 10:40:53 +01:00
Mark Shannon
e7b55f564d
GH-136410: Faster side exits by using a cold exit stub (GH-136411) 2025-08-01 16:26:07 +01:00
Brandt Bucher
3d8c38f6db
GH-135904: Improve the JIT's performance on macOS (GH-136528) 2025-07-14 10:14:20 -07:00
Mark Shannon
ac7d5ba96e
GH-133231: Changes to executor management to support proposed sys._jit module (GH-133287)
* Track the current executor, not the previous one, on the thread-state. 

* Batch executors for deallocation to avoid having to constantly incref executors; this is an ad-hoc form of deferred reference counting.
2025-05-04 10:05:35 +01:00
Lysandros Nikolaou
60202609a2
gh-132661: Implement PEP 750 (#132662)
Co-authored-by: Lysandros Nikolaou <lisandrosnik@gmail.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
Co-authored-by: Wingy <git@wingysam.xyz>
Co-authored-by: Koudai Aono <koxudaxi@gmail.com>
Co-authored-by: Dave Peck <davepeck@gmail.com>
Co-authored-by: Terry Jan Reedy <tjreedy@udel.edu>
Co-authored-by: Paul Everitt <pauleveritt@me.com>
Co-authored-by: sobolevn <mail@sobolevn.me>
2025-04-30 11:46:41 +02:00
Victor Stinner
49fb75c676
gh-131238: Add missing pycore_function.h includes for JIT compiler (#131571) 2025-03-21 23:37:49 +00:00
Mark Shannon
7ebd71ee14
GH-131498: Remove conditional stack effects (GH-131499)
* Adds some missing #includes
2025-03-20 15:39:38 +00:00
Brandt Bucher
70e387c990
GH-129709: Clean up tier two (GH-129710) 2025-02-07 09:52:49 -08:00
Brandt Bucher
fbaa6c8ff0
GH-129763: Remove the LLTRACE macro (GH-129764) 2025-02-07 08:49:51 -08:00
Ken Jin
cb640b659e
gh-128563: A new tail-calling interpreter (GH-128718)
Co-authored-by: Garrett Gu <garrettgu777@gmail.com>
Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
Co-authored-by: Hugo van Kemenade <1324225+hugovk@users.noreply.github.com>
2025-02-06 23:21:57 +08:00
Ken Jin
6293d00e72
gh-120619: Strength reduce function guards, support 2-operand uop forms (GH-124846)
Co-authored-by: Brandt Bucher <brandtbucher@gmail.com>
2024-11-09 11:35:33 +08:00
Savannah Ostrowski
c29bbe2101
GH-125498: Update JIT builds to use LLVM 19 and preserve_none (GH-125499) 2024-10-30 12:03:31 -07:00
Brandt Bucher
51185923a8
GH-113464: Speed up JIT builds (GH-122839) 2024-08-14 07:53:46 -07:00
Brandt Bucher
33903c53db
GH-116017: Get rid of _COLD_EXITs (GH-120960) 2024-07-01 13:17:40 -07:00
Ken Jin
22b0de2755
gh-117139: Convert the evaluation stack to stack refs (#118450)
This PR sets up tagged pointers for CPython.

The general idea is to create a separate struct _PyStackRef for everything on the evaluation stack to store the bits. This forces the C compiler to warn us if we try to cast things or pull things out of the struct directly.

Only for free threading: We tag the low bit if something is deferred - that means we skip incref and decref operations on it. This behavior may change in the future if Mark's plans to defer all objects in the interpreter loop pans out.

This implies a strict stack reference discipline is required. ALL incref and decref operations on stackrefs must use the stackref variants. It is unsafe to untag something then do normal incref/decref ops on it.

The new incref and decref variants are called dup and close. They mimic a "handle" API operating on these stackrefs.

Please read Include/internal/pycore_stackref.h for more information!

---------

Co-authored-by: Mark Shannon <9448417+markshannon@users.noreply.github.com>
2024-06-27 03:10:43 +08:00
Mark Shannon
8f5a01707f
GH-120982: Add stack check assertions to generated interpreter code (GH-120992) 2024-06-25 16:42:29 +01:00
Brandt Bucher
a47abdb45d
GH-117062: Make _JUMP_TO_TOP a general absolute jump (GH-120854) 2024-06-24 08:35:10 -07:00
Mark Shannon
67bba9dd0f
GH-117442: Check eval-breaker at start (rather than end) of tier 2 loops (GH-118482) 2024-05-02 13:10:31 +01:00
Brandt Bucher
49baa656cb
GH-115802: Use the GHC calling convention in JIT code (GH-118287) 2024-05-01 08:05:53 -07:00
Mark Shannon
3e06c7f719
GH-118095: Add dynamic exit support and FOR_ITER_GEN support to tier 2 (GH-118279) 2024-04-26 18:08:50 +01:00
Dino Viehland
07525c9a85
gh-116818: Make sys.settrace, sys.setprofile, and monitoring thread-safe (#116775)
Makes sys.settrace, sys.setprofile, and monitoring generally thread-safe.

Mostly uses a stop-the-world approach and synchronization around the code object's _co_instrumentation_version.  There may be a little bit of extra synchronization around the monitoring data that's required to be TSAN clean.
2024-04-19 14:47:42 -07:00
Brandt Bucher
62aeb0ee69
GH-117512: Allow 64-bit JIT operands on 32-bit platforms (GH-117527) 2024-04-06 08:26:43 -07:00
Michael Droettboom
0edde64a41
GH-117457: Correct pystats uop "miss" counts (GH-117477) 2024-04-04 15:49:18 -07:00
Guido van Rossum
060a96f1a9
gh-116968: Reimplement Tier 2 counters (#117144)
Introduce a unified 16-bit backoff counter type (``_Py_BackoffCounter``),
shared between the Tier 1 adaptive specializer and the Tier 2 optimizer. The
API used for adaptive specialization counters is changed but the behavior is
(supposed to be) identical.

The behavior of the Tier 2 counters is changed:
- There are no longer dynamic thresholds (we never varied these).
- All counters now use the same exponential backoff.
- The counter for ``JUMP_BACKWARD`` starts counting down from 16.
- The ``temperature`` in side exits starts counting down from 64.
2024-04-04 15:03:27 +00:00
Sam Gross
19c1dd60c5
gh-117323: Make cell thread-safe in free-threaded builds (#117330)
Use critical sections to lock around accesses to cell contents. The critical sections are no-ops in the default (with GIL) build.
2024-03-29 13:35:43 -04:00
Michael Droettboom
26d328b2ba
GH-117121: Add pystats to JIT builds (GH-117346) 2024-03-28 15:23:08 -07:00
Mark Shannon
bf82f77957
GH-116422: Tier2 hot/cold splitting (GH-116813)
Splits the "cold" path, deopts and exits, from the "hot" path, reducing the size of most jitted instructions, at the cost of slower exits.
2024-03-26 09:35:11 +00:00
Ken Jin
41457c7fdb
gh-116381: Remove bad specializations, add fail stats (GH-116464)
* Remove bad specializations, add fail stats
2024-03-08 00:21:21 +08:00
Brandt Bucher
f0df35eeca
GH-115802: JIT "small" code for Windows (GH-115964) 2024-02-29 08:11:28 -08:00
Mark Shannon
7b21403ccd
GH-112354: Initial implementation of warm up on exits and trace-stitching (GH-114142) 2024-02-20 09:39:55 +00:00
Brandt Bucher
f6d9e5926b
GH-113464: Add a JIT backend for tier 2 (GH-113465)
Add an option (--enable-experimental-jit for configure-based builds
or --experimental-jit for PCbuild-based ones) to build an
*experimental* just-in-time compiler, based on copy-and-patch (https://fredrikbk.com/publications/copy-and-patch.pdf).

See Tools/jit/README.md for more information on how to install the required build-time tooling.
2024-01-28 18:48:48 -08:00