cpython

mirror of https://github.com/python/cpython.git synced 2025-12-08 06:10:17 +00:00

Author	SHA1	Message	Date
Ken Jin	4fa80ce74c	gh-139109: A new tracing JIT compiler frontend for CPython (GH-140310) This PR changes the current JIT model from trace projection to trace recording. Benchmarking: about 1.7% better pyperformance geomean overall versus the current JIT (https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251108-3.15.0a1%2B-7e2bc1d-JIT/bm-20251108-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-7e2bc1d-vs-base.svg). Richards, the most improved benchmark, is 100% faster than with the current JIT; the worst benchmark is about 10-15% slower. Note: the fastest version isn't the one merged, as it relies on fixing bugs in the specializing interpreter, which is left to another PR. The speedup in the merged version is about 1.1% (https://raw.githubusercontent.com/facebookexperimental/free-threading-benchmarking/refs/heads/main/results/bm-20251112-3.15.0a1%2B-f8a764a-JIT/bm-20251112-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-f8a764a-vs-base.svg). Stats: 50% more uops executed, 30% more traces entered the last time we ran them. The stats also suggest that our trace lengths are too short for a real trace-recording JIT, as there are a lot of "trace too long" aborts (https://github.com/facebookexperimental/free-threading-benchmarking/blob/main/results/bm-20251023-3.15.0a1%2B-eb73378-CLANG%2CJIT/bm-20251023-vultr-x86_64-Fidget%252dSpinner-tracing_jit-3.15.0a1%2B-eb73378-pystats-vs-base.md). This new JIT frontend is already able to record/execute significantly more instructions than the previous JIT frontend. In this PR, we are now able to record through custom dunders, simple object creation, generators, etc. None of these were done by the old JIT frontend. Some custom dunder uops were discovered to be broken as part of this work (gh-140277). The optimizer stack space check is disabled, as it's no longer valid to deal with underflow.
Pros: * Ignoring the generated tracer code, since it's automatically created, this is only an additional 1k lines of code. The maintenance burden is handled by the DSL and code generator. * `optimizer.c` is now significantly simpler, as we don't have to do strange things to recover the bytecode from a trace. * The new JIT frontend is able to handle a lot more control flow than the old one. * Tracing is very low overhead. We use the tail-calling interpreter/computed-goto interpreter to switch between tracing mode and non-tracing mode. I call this mechanism dual dispatch, as we have two dispatch tables dispatching to each other. Specialization is still enabled while tracing. * Better handling of polymorphism. We leverage the specializing interpreter for this. Cons: * (For now) requires the tail-calling interpreter or computed gotos. This means no Windows JIT for now :(. Not to fret, though: tail calling is coming to Windows soon (https://github.com/python/cpython/pull/139962).
Design: * After each instruction, the `record_previous_inst` function/label is executed. This does as the name suggests. * The tracing interpreter lowers bytecode to uops directly so that it can obtain "fresh" values at the point of lowering. * The tracing version behaves nearly identically to the normal interpreter; in fact, it even has specialization! This allows it to run without much of a slowdown when tracing. The actual cost of tracing is only a function call and writes to memory. * The tracing interpreter uses the specializing interpreter's deopt to naturally form the side exit chains. This allows it to chain side exits effectively, without repeating much code. We force re-specialization when tracing a deopt. * The tracing interpreter can even handle goto errors/exceptions, but I chose to disable them for now as it's not tested. * Because we do not share interpreter dispatch, there should be no significant slowdown to the original specializing interpreter on tail-calling and computed-goto builds with the JIT disabled. With the JIT enabled, there might be a slowdown in the form of the JIT trying to trace. * Things that could have dynamic instruction pointer effects are guarded on. The guard deopts to a new instruction: `_DYNAMIC_EXIT`.	2025-11-13 18:08:32 +00:00
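The "dual dispatch" idea in the entry above can be illustrated with a toy interpreter. This is a minimal sketch, not CPython's implementation: it uses plain function-pointer tables instead of the tail-calling or computed-goto interpreters, it records each opcode just before executing it (CPython's `record_previous_inst` runs after each instruction), and every opcode and function name here (`OP_PUSH1`, `record_and_execute`, `vm_run`, ...) is hypothetical. Entering or leaving tracing mode only swaps which table the dispatch loop reads, and the tracing handlers reuse the normal handlers, so behavior stays identical while tracing.

```c
#include <assert.h>

/* Hypothetical toy VM illustrating "dual dispatch": two dispatch tables
   whose handlers switch execution between them. Opcode names are made up. */
enum { OP_PUSH1, OP_ADD, OP_START_TRACE, OP_STOP_TRACE, OP_HALT, NUM_OPS };

#define TRACE_MAX 64

typedef struct VM VM;
typedef void (*handler_t)(VM *);

struct VM {
    const int *ip;           /* points at the next opcode */
    long stack[16];
    int sp;
    const handler_t *table;  /* dispatch table currently in use */
    int trace[TRACE_MAX];    /* opcodes recorded while tracing */
    int trace_len;
    int halted;
};

static void op_push1(VM *vm) { vm->stack[vm->sp++] = 1; }
static void op_add(VM *vm) { vm->sp--; vm->stack[vm->sp - 1] += vm->stack[vm->sp]; }
static void op_halt(VM *vm) { vm->halted = 1; }
static void op_start_trace(VM *vm);
static void op_stop_trace(VM *vm);

/* Normal table: handlers only execute. */
static const handler_t normal_table[NUM_OPS] = {
    [OP_PUSH1] = op_push1, [OP_ADD] = op_add,
    [OP_START_TRACE] = op_start_trace, [OP_STOP_TRACE] = op_stop_trace,
    [OP_HALT] = op_halt,
};

/* Tracing handler: record the opcode just fetched, then run the normal
   handler, so the only extra cost is a call and a store. */
static void record_and_execute(VM *vm) {
    int op = vm->ip[-1];
    assert(vm->trace_len < TRACE_MAX);
    vm->trace[vm->trace_len++] = op;
    normal_table[op](vm);
}

/* Tracing table: every entry records, except the one that leaves tracing. */
static const handler_t tracing_table[NUM_OPS] = {
    [OP_PUSH1] = record_and_execute, [OP_ADD] = record_and_execute,
    [OP_START_TRACE] = record_and_execute, [OP_STOP_TRACE] = op_stop_trace,
    [OP_HALT] = record_and_execute,
};

/* Switching modes is just swapping the active table. */
static void op_start_trace(VM *vm) { vm->table = tracing_table; }
static void op_stop_trace(VM *vm) { vm->table = normal_table; }

/* Runs code until OP_HALT and returns the value on top of the stack. */
static long vm_run(VM *vm, const int *code) {
    vm->ip = code;
    vm->sp = 0;
    vm->table = normal_table;
    vm->trace_len = 0;
    vm->halted = 0;
    while (!vm->halted) {
        int op = *vm->ip++;
        vm->table[op](vm);
    }
    return vm->stack[vm->sp - 1];
}
```

Running `vm_run` on `PUSH1, START_TRACE, PUSH1, ADD, PUSH1, ADD, STOP_TRACE, PUSH1, ADD, HALT` leaves 4 on the stack and records exactly the four opcodes executed between the two trace markers; the markers themselves go through the non-recording entries.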
Victor Stinner	b99db92dde	gh-139653: Add PyUnstable_ThreadState_SetStackProtection() (#139668) Add PyUnstable_ThreadState_SetStackProtection() and PyUnstable_ThreadState_ResetStackProtection() functions to set the stack base address and stack size of a Python thread state. Co-authored-by: Petr Viktorin <encukou@gmail.com>	2025-11-13 17:30:50 +01:00
Petr Viktorin	589a03a8ce	gh-140550: Initial implementation of PEP 793 – PyModExport (GH-140556) Co-authored-by: Victor Stinner <vstinner@python.org> Co-authored-by: Kumar Aditya <kumaraditya@python.org>	2025-11-05 12:31:42 +01:00
Dino Viehland	ff0cf0af10	gh-139525: Don't specialize functions which have a modified vectorcall (#139524) Don't specialize functions which have a modified vectorcall	2025-10-03 09:58:32 -07:00
Bénédikt Tran	3779f2b95e	gh-139393: fix `_CALL_LEN` JIT tests for tuples (#139394) Fix a regression introduced in `7ce25edb8f` where `_PY_NSMALLPOSINTS` was changed from 257 to 1025.	2025-09-28 19:30:44 +02:00
Mark Shannon	16eae6d90d	GH-137573: Add test to check that the margin used for overflow protection is larger than the stack space used by the interpreter (GH-137724)	2025-09-23 15:47:27 +02:00
Peter Bierma	2191497933	gh-136003: Execute pre-finalization callbacks in a loop (GH-136004)	2025-09-18 08:29:12 -04:00
Hood Chatham	7ae4749d06	gh-124621: Emscripten: Add support for async input devices (GH-136822) This is useful for implementing proper `input()`. It requires the JavaScript engine to support the wasm JSPI spec, which is now at stage 4. It has been supported in Chrome since version 137 and is available in Firefox and Node.js behind a flag. We override the `__wasi_fd_read()` syscall with our own variant that checks for a readAsync operation. If there is one, we use our own async variant of `fd_read()`; otherwise, we use the original `fd_read()`. We also add a variant of `FS.createDevice()` called `FS.createAsyncInputDevice()`. Finally, if JSPI is available, we wrap the `main()` symbol with `WebAssembly.promising()` so that we can stack-switch from `fd_read()`. If JSPI is not available, attempting to read from an AsyncInputDevice will raise an `OSError`.	2025-07-19 17:14:29 +02:00
William S Fulton	7de8ea7be6	gh-136300: Modify C tests to conform to PEP-737 (GH-136301) - Use %T format specifier instead of %s and Py_TYPE(x)->tp_name. - Remove legacy %.200s format specifier for truncating type names. Co-authored-by: Victor Stinner <vstinner@python.org>	2025-07-11 15:18:35 +02:00
Victor Stinner	28940e8e48	gh-130396: Move PYOS_LOG2_STACK_MARGIN to internal headers (#135928 ) Move PYOS_LOG2_STACK_MARGIN, PYOS_STACK_MARGIN, PYOS_STACK_MARGIN_BYTES and PYOS_STACK_MARGIN_SHIFT macros to pycore_pythonrun.h internal header. Add underscore (_) prefix to the names to make them private. Rename _PYOS to _PyOS.	2025-07-01 15:18:17 +02:00
Peter Bierma	10a3d43188	gh-135755: Move `PyFunction_GET_BUILTINS` to the private API (GH-135938)	2025-06-26 11:43:08 +02:00
Eric Snow	62143736b6	gh-134939: Add the concurrent.interpreters Module (gh-133958) PEP-734 has been accepted (for 3.14). (FTR, I'm opposed to putting this under the concurrent package, but doing so is the SC condition under which the module can land in 3.14.)	2025-06-11 17:35:48 -06:00
sobolevn	cebae977a6	gh-133891: Add missing error check to `SET_COUNT` macro in `_testinternalcapi.c` (#133892)	2025-06-01 00:33:02 +03:00
Eric Snow	88f8102a8f	gh-132775: Support Fallbacks in _PyObject_GetXIData() (gh-133482) It now supports a "full" fallback to _PyFunction_GetXIData() and then `_PyPickle_GetXIData()`. There's also room for other fallback modes if that later makes sense.	2025-05-21 07:23:48 -06:00
b-pass	f2de1e6861	gh-134144: Fix use-after-free in zapthreads() (#134145)	2025-05-18 20:32:29 +05:30
Eric Snow	8cf4947b0f	gh-132775: Add _PyFunction_GetXIData() (gh-133481)	2025-05-12 22:10:56 +00:00
Eric Snow	c81fa2b9cd	gh-132775: Add _PyCode_GetScriptXIData() (gh-133480) This converts functions, code, str, bytes, bytearray, and memoryview objects to PyCodeObject, and ensures that the object looks like a script. That means no args, no return, and no closure. _PyCode_GetPureScriptXIData() takes it a step further and ensures there are no globals. We also add _PyObject_SupportedAsScript() to the internal C-API.	2025-05-08 15:07:46 +00:00
Eric Snow	27128e4fa8	gh-132775: Unrevert "Add _PyCode_VerifyStateless()" (gh-133528) This reverts commit `3c73cf5` (gh-133497), which itself reverted the original commit `d270bb5` (gh-133221). We reverted the original change due to failing android tests. The checks in _PyCode_CheckNoInternalState() were too strict, so we've relaxed them.	2025-05-08 00:00:33 +00:00
Petr Viktorin	3c73cf51df	gh-132775: Revert "gh-132775: Add _PyCode_VerifyStateless() (gh-133221)" (#133497)	2025-05-06 13:09:41 +03:00
Eric Snow	ea598730ef	gh-132775: Add _PyCode_GetXIData() (gh-133475)	2025-05-05 23:46:03 +00:00
Brandt Bucher	b1aa515bd6	GH-133231: Add JIT utilities in sys._jit (GH-133233)	2025-05-05 15:25:22 -07:00
Eric Snow	d270bb5792	gh-132775: Add _PyCode_VerifyStateless() (gh-133221) "Stateless" code is a function or code object which does not rely on external state or internal state. It may rely on arguments and builtins, but not globals or a closure. I've left a comment in pycore_code.h that provides more detail. We also add _PyFunction_VerifyStateless(). The new functions will be used in several later changes that facilitate "sharing" functions and code objects between interpreters.	2025-05-05 21:48:58 +00:00
Eric Snow	24ebb9ccfd	gh-132775: Unrevert "Add _PyCode_GetVarCounts()" (gh-133265) This reverts commit `811edcf` (gh-133232), which itself reverted the original commit `94b4fcd` (gh-133128). We reverted the original change due to failing s390 builds (a big-endian architecture). It turned out that I had not accommodated op caches.	2025-05-05 13:24:29 -06:00
Eric Snow	811edcf9cd	Revert "gh-132775: Add _PyCode_GetVarCounts() (gh-133128)" (gh-133232) The change broke the s390 builds, so I'm reverting it while I investigate. This reverts commit `94b4fcd806`.	2025-05-01 02:35:20 +00:00
Eric Snow	cb35c11d82	gh-132775: Add _PyPickle_GetXIData() (gh-133107) There's some extra complexity due to making sure we get things right when handling functions and classes defined in the __main__ module. This is also reflected in the tests, including the addition of extra functions in test.support.import_helper.	2025-04-30 17:34:05 -06:00
Eric Snow	94b4fcd806	gh-132775: Add _PyCode_GetVarCounts() (gh-133128) This helper is useful in a variety of ways, including in demonstrating how the different counts relate to one another. It will be used in a later change to help identify if a function is "stateless", meaning it doesn't have any free vars or globals. Note that a majority of this change is tests.	2025-04-30 18:19:20 +00:00
Eric Snow	219d8d24b5	gh-87859: Track Code Object Local Kinds For Arguments (gh-132980) Doing this was always the intention. I was finally motivated to find the time to do it. See #87859 (comment).	2025-04-29 02:21:47 +00:00
Eric Snow	96a7fb93a8	gh-132775: Add _PyCode_ReturnsOnlyNone() (gh-132981) The function indicates whether or not the function has a return statement. This is used by a later change related to treating some functions like scripts.	2025-04-28 20:12:52 -06:00
Eric Snow	bdd23c0bb9	gh-132775: Add _PyMarshal_GetXIData() (gh-133108) Note that the bulk of this change is tests.	2025-04-28 17:23:46 -06:00
Neil Schemenauer	22f0730d40	gh-122320: Limit dict key versions used by test_opcache. (gh-132961) The `test_load_global_module()` test consumes a lot of dict key versions. Skip the test if we have consumed half of the available versions that can be used for the "load global" cache.	2025-04-28 12:54:55 -07:00
Eric Snow	6f04325992	gh-132775: Cleanup Related to crossinterp.c Before Further Changes (gh-132974) This change consists of adding tests and moving code around, with some renaming thrown in.	2025-04-28 11:55:15 -06:00
Eric Snow	cd9536a087	gh-132781: Cleanup Code Related to NotShareableError (gh-132782) The following are added to the internal C-API: * _PyErr_FormatV() * _PyErr_SetModuleNotFoundError() * _PyXIData_GetNotShareableErrorType() * _PyXIData_FormatNotShareableError() We also drop _PyXIData_lookup_context_t and _PyXIData_GetLookupContext().	2025-04-25 14:43:38 -06:00
Sergey B Kirpichev	79f7c67bf6	gh-128813: hide mixed-mode functions for complex arithmetic from C-API (#131703)	2025-04-22 14:18:18 +02:00
Bénédikt Tran	edbf7fb129	gh-111178: remove redundant casts for functions with correct signatures (#131673)	2025-04-01 17:18:11 +02:00
Mark Shannon	7ebd71ee14	GH-131498: Remove conditional stack effects (GH-131499) * Adds some missing #includes	2025-03-20 15:39:38 +00:00
Victor Stinner	b69da006a4	gh-131238: Remove includes from pycore_interp.h (#131495) Also remove now-unused includes from C files.	2025-03-20 11:35:23 +00:00
Victor Stinner	20c5f969dd	gh-131238: Remove more includes from pycore_interp.h (#131480)	2025-03-19 23:01:32 +01:00
Sam Gross	45bc120d45	gh-130519: Fix crash in QSBR when destructor reenters QSBR (gh-130553) The `free_work_item()` function in QSBR may call arbitrary code via Python object destructors, which may reenter the QSBR code. Reorder the processing of work items to be robust to reentrancy. Also fix the TODO for the out of memory situation.	2025-02-26 14:55:15 -05:00
Mark Shannon	014223649c	GH-130396: Use computed stack limits on linux (GH-130398) * Implement C recursion protection with limit pointers for Linux, MacOS and Windows * Remove calls to PyOS_CheckStack * Add stack protection to parser * Make tests more robust to low stacks * Improve error messages for stack overflow	2025-02-25 09:24:48 +00:00
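The limit-pointer approach named in the entry above (C recursion protection with limit pointers rather than counters) can be sketched as follows. This is a hypothetical illustration, not CPython's code: `ThreadState`, `set_stack_limit`, `stack_overflowed`, `recurse`, and the byte budgets are made up, the sketch assumes a downward-growing stack, and comparing addresses of unrelated objects through `uintptr_t` is implementation-defined rather than portable C. Each check compares the address of a local variable against a precomputed limit, so it costs one comparison and reflects the bytes actually used rather than a call count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-thread state holding a stack limit address.
   Assumes the stack grows downward, as on common x86-64 and AArch64 ABIs. */
typedef struct {
    uintptr_t stack_limit;
} ThreadState;

/* Places the limit `budget` bytes below the current frame. A real runtime
   would derive it from the thread's stack base and size minus a margin. */
static void set_stack_limit(ThreadState *ts, size_t budget) {
    char here;
    assert(budget > 0);
    ts->stack_limit = (uintptr_t)&here - budget;
}

/* Nonzero once the current frame lies below the limit: one comparison,
   no depth counter to increment and decrement. */
static int stack_overflowed(const ThreadState *ts) {
    char here;
    return (uintptr_t)&here < ts->stack_limit;
}

/* Recurses until the check fires and returns the depth reached. Passing
   `frame` to the callee keeps every frame alive, so the compiler cannot
   turn the recursion into a loop. */
static int recurse(const ThreadState *ts, int depth, volatile char *parent) {
    volatile char frame[256]; /* use real stack space in every frame */
    frame[0] = parent[0];
    if (stack_overflowed(ts)) {
        return depth;
    }
    return recurse(ts, depth + 1, frame);
}
```

Because the limit is measured in bytes, frames of different sizes consume the budget proportionally: a larger budget lets `recurse` go deeper before the check fires, and no per-call counter has to be kept in sync.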
Petr Viktorin	ef29104f7d	GH-91079: Revert "GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)" for now (GH-130413) Revert "GH-91079: Implement C stack limits using addresses, not counters. (GH-130007)" for now Unfortunately, the change broke some buildbots. This reverts commit `2498c22fa0`.	2025-02-24 11:16:08 +01:00
Mark Shannon	2498c22fa0	GH-91079: Implement C stack limits using addresses, not counters. (GH-130007) * Implement C recursion protection with limit pointers * Remove calls to PyOS_CheckStack * Add stack protection to parser * Make tests more robust to low stacks * Improve error messages for stack overflow	2025-02-19 11:44:57 +00:00
Brandt Bucher	002c4e2982	GH-129386: Use symbolic constants for specialization tests (GH-129415)	2025-01-29 10:49:58 -08:00
Brandt Bucher	828b27680f	GH-126599: Remove the PyOptimizer API (GH-129194)	2025-01-28 16:10:51 -08:00
Victor Stinner	1d485db953	gh-128863: Deprecate _PyLong_Sign() function (#129176) Replace _PyLong_Sign() with PyLong_GetSign().	2025-01-23 03:11:53 +01:00
Mark Shannon	f0f7b978be	GH-128939: Refactor JIT optimize structs (GH-128940)	2025-01-20 15:49:15 +00:00
Victor Stinner	8ceb6cb117	gh-129033: Remove _PyInterpreterState_SetConfig() function (#129048) Remove _PyInterpreterState_GetConfigCopy() and _PyInterpreterState_SetConfig() private functions. PEP 741 "Python Configuration C API" added a better public C API: PyConfig_Get() and PyConfig_Set().	2025-01-20 16:31:33 +01:00
Xuanteng Huang	b44ff6d0df	GH-126599: Remove the "counter" optimizer/executor (GH-126853)	2025-01-16 15:57:04 -08:00
Neil Schemenauer	1b15c89a17	gh-115999: Specialize `STORE_ATTR` in free-threaded builds. (gh-127838) * Add `_PyDictKeys_StringLookupSplit` which does locking on dict keys and use in place of `_PyDictKeys_StringLookup`. * Change `_PyObject_TryGetInstanceAttribute` to use that function in the case of split keys. * Add `unicodekeys_lookup_split` helper which allows code sharing between `_Py_dict_lookup` and `_PyDictKeys_StringLookupSplit`. * Fix locking for `STORE_ATTR_INSTANCE_VALUE`. Create `_GUARD_TYPE_VERSION_AND_LOCK` uop so that object stays locked and `tp_version_tag` cannot change. * Pass `tp_version_tag` to `specialize_dict_access()`, ensuring the version we store on the cache is the correct one (in case it changes during the specialization analysis). * Split `analyze_descriptor` into `analyze_descriptor_load` and `analyze_descriptor_store` since those don't share much logic. Add `descriptor_is_class` helper function. * In `specialize_dict_access`, double check `_PyObject_GetManagedDict()` in case we race and dict was materialized before the lock. * Avoid borrowed references in `_Py_Specialize_StoreAttr()`. * Use `specialize()` and `unspecialize()` helpers. * Add unit tests to ensure specializing happens as expected in FT builds. * Add unit tests to attempt to trigger data races (useful for running under TSAN). * Add `has_split_table` function to `_testinternalcapi`.	2024-12-19 10:21:17 -08:00
Mark Shannon	bc262de06b	GH-125174: Mark objects as statically allocated. (#127797) * Set a bit in the unused part of the refcount on 64 bit machines and the free-threaded build. * Use the top of the refcount range on 32 bit machines	2024-12-11 17:37:38 +00:00
Peter Bierma	d5d84c3f13	gh-127791: Fix, document, and test `PyUnstable_AtExit` (#127793)	2024-12-11 12:14:04 +01:00
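Storing a flag in otherwise unused refcount bits, as the 64-bit case of the entry above describes, can be sketched like this. It is a hypothetical illustration only: the `Obj` type, the helper names, and the choice of bit 62 are assumptions, not CPython's actual object layout or constants. The sketch assumes real reference counts never grow into the top bits, so a flag stored there survives ordinary increments and is simply masked off when the count is read.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 64-bit object header. Assumption: real reference counts never
   reach the top bits, so bit 62 is free to flag statically allocated objects.
   The bit position is illustrative, not CPython's actual constant. */
#define STATIC_FLAG ((uint64_t)1 << 62)

typedef struct {
    uint64_t refcnt;
} Obj;

static void mark_static(Obj *o) { o->refcnt |= STATIC_FLAG; }

static int is_static(const Obj *o) { return (o->refcnt & STATIC_FLAG) != 0; }

/* Reads the count with the flag masked off. */
static uint64_t refcount_of(const Obj *o) { return o->refcnt & ~STATIC_FLAG; }

/* An ordinary increment leaves the flag intact as long as the count
   stays below the flag bit. */
static void incref(Obj *o) {
    assert(refcount_of(o) < STATIC_FLAG - 1);
    o->refcnt++;
}
```

Keeping the flag inside the refcount word avoids growing every object header by a separate field; the trade-off is that all code reading the count must mask the flag off.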


184 commits