cpython/Python
Gregory P. Smith ad4ee7cb0f
gh-144015: Add portable SIMD optimization for bytes.hex() et. al. (GH-143991)
Add SIMD optimization for `bytes.hex()`, `bytearray.hex()`, and `binascii.hexlify()`, as well as the `hashlib` `.hexdigest()` methods. The implementation uses platform-agnostic GCC/Clang vector extensions that compile to native SIMD instructions on our [PEP-11 Tier 1 Linux and macOS](https://peps.python.org/pep-0011/#tier-1) platforms.

- 1.1-3x faster for common small data (16-64 bytes, covering md5 through sha512 digest sizes)
- Up to 11x faster for large data (1KB+)
- Retains the existing scalar code for short inputs (<16 bytes) and for platforms lacking SIMD instructions, so there are no observable performance regressions in those cases.
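
The entry points listed above should all yield the same lowercase hex text regardless of which code path runs underneath. A minimal sketch for checking that and for measuring the speedup locally is below. `hex_reference`, `check_equivalence`, and `time_hex` are hypothetical helper names introduced here for illustration and are not part of CPython. Timings depend on the build and the machine.

```python
import binascii
import hashlib
import timeit

HEX_DIGITS = "0123456789abcdef"


def hex_reference(data: bytes) -> str:
    """Hypothetical scalar reference: two lowercase hex digits per byte."""
    return "".join(HEX_DIGITS[b >> 4] + HEX_DIGITS[b & 0x0F] for b in data)


def check_equivalence(data: bytes) -> bool:
    """Return True if every affected API agrees with the reference."""
    expected = hex_reference(data)
    digest = hashlib.sha256(data)
    return (
        data.hex() == expected
        and bytearray(data).hex() == expected
        and binascii.hexlify(data).decode("ascii") == expected
        and digest.hexdigest() == hex_reference(digest.digest())
    )


def time_hex(size: int, number: int = 10_000) -> float:
    """Average seconds per bytes.hex() call on `size` bytes."""
    data = (bytes(range(256)) * (size // 256 + 1))[:size]
    return timeit.timeit(data.hex, number=number) / number


if __name__ == "__main__":
    # Sizes chosen to straddle the 16-byte threshold named in the message.
    for size in (8, 16, 64, 1024, 65536):
        assert check_equivalence(bytes(range(256)) * (size // 256 + 1))
        print(f"{size:>6} bytes: {time_hex(size) * 1e9:10.1f} ns per call")
```

Running the script on builds before and after this change gives a rough per-size comparison. A dedicated benchmark harness would give more stable numbers.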

## Supported platforms:

- x86-64: the compiler generates SSE2 - always available, no flags or CPU feature checks needed
- ARM64: NEON is always available, no flags or CPU feature checks needed
- ARM32: Requires NEON support, enabled through appropriate compiler flags (e.g., `-march=native` on a Raspberry Pi 3+). We _could_ use runtime detection to allow NEON when compiled without a recent enough `-march=` flag (`cortex-a53` and later IIRC), but there are diminishing returns in doing so: anyone using 32-bit ARM in a situation where performance matters will already be compiling with such flags. By contrast, 32-bit Raspbian compilation defaults to aiming primarily for compatibility with the Raspberry Pi 1 and Zero (armv6, arch=armhf), which lack NEON.
- Windows/MSVC: Not supported. MSVC lacks `__builtin_shufflevector`, so the existing scalar path is used. This is left as a future opportunity for someone to figure out how to express the intent to that compiler.

This is compile-time detection of features that are always available on the target architectures, so there is no need for runtime feature inspection.
2026-02-22 19:19:03 -08:00
clinic gh-142349: Implement PEP 810 - Explicit lazy imports (#142351) 2026-02-12 00:15:33 +00:00
frozen_modules gh-97669: Create Tools/build/ directory (#97963) 2022-10-17 12:01:00 +02:00
_contextvars.c gh-128384: Use a context variable for warnings.catch_warnings (gh-130010) 2025-04-09 16:18:54 -07:00
_warnings.c gh-135801: Improve filtering by module in warn_explicit() without module argument (GH-140151) 2025-10-30 15:55:39 +02:00
asdl.c
asm_trampoline.S gh-136459: Add perf trampoline support for macOS (#136461) 2025-07-22 16:47:24 +01:00
assemble.c gh-87859: Track Code Object Local Kinds For Arguments (gh-132980) 2025-04-29 02:21:47 +00:00
ast.c gh-143055: Implementation of PEP 798 (#143056) 2026-01-30 20:37:52 -08:00
ast_preprocess.c gh-143055: Implementation of PEP 798 (#143056) 2026-01-30 20:37:52 -08:00
ast_unparse.c gh-132661: Disallow Template/str concatenation after PEP 750 spec update (#135996) 2025-07-21 08:44:26 +02:00
bltinmodule.c gh-145076: Check globals type in __lazy_import__() (#145086) 2026-02-21 22:06:59 +01:00
bootstrap_hash.c GH-131238: Core header refactor (GH-131250) 2025-03-17 09:19:04 +00:00
brc.c Fix typos in documentation and comments (#119763) 2024-06-04 10:22:22 +00:00
bytecodes.c GH-144651: Optimize the new uops added when recording values during tracing. (GH-144948) 2026-02-19 11:52:57 +00:00
ceval.c gh-144981: Make PyUnstable_Code_SetExtra/GetExtra thread-safe (#144980) 2026-02-20 10:52:18 -08:00
ceval.h gh-142349: Implement PEP 810 - Explicit lazy imports (#142351) 2026-02-12 00:15:33 +00:00
ceval_gil.c GH-142513: Reimplement executor management (GH-142931) 2025-12-18 16:43:44 +00:00
ceval_macros.h gh-120321: Make gi_yieldfrom thread-safe in free-threading build (#144292) 2026-01-30 12:20:27 -05:00
codecs.c Python/codecs.c: Remove unused forward declaration (#139511) 2025-10-03 13:33:49 +02:00
codegen.c gh-144822: remove redundant decref in codegen.c (#144823) 2026-02-14 19:20:33 +00:00
compile.c gh-142349: Implement PEP 810 - Explicit lazy imports (#142351) 2026-02-12 00:15:33 +00:00
condvar.h gh-104530: Enable native Win32 condition variables by default (GH-104531) 2024-02-02 13:50:51 +00:00
config_common.h gh-76785: Add PyInterpreterConfig Helpers (gh-117170) 2024-04-02 20:35:52 +00:00
context.c gh-116738: make entering of contextvars.Context thread safe (#143074) 2026-01-06 12:24:02 +05:30
critical_section.c gh-144513: Skip critical section locking during stop-the-world (gh-144524) 2026-02-06 15:14:08 +00:00
crossinterp.c gh-143377: fix crashes in _interpreters.capture_exception (#143418) 2026-01-10 12:37:54 +01:00
crossinterp_data_lookup.h gh-135443: Sometimes Fall Back to __main__.__dict__ For Globals (gh-135491) 2025-06-16 17:34:19 -06:00
crossinterp_exceptions.h gh-132775: Clean Up Cross-Interpreter Error Handling (gh-135369) 2025-06-13 16:45:21 -06:00
dtoa.c gh-131238: Add explicit includes to pycore headers (#131257) 2025-03-17 12:32:43 +01:00
dup2.c gh-108765: Python.h no longer includes <unistd.h> (#108783) 2023-09-02 16:50:18 +02:00
dynamic_annotations.c
dynload_hpux.c gh-88402: Add new sysconfig variables on Windows (GH-110049) 2023-10-04 22:50:29 +00:00
dynload_shlib.c gh-131238: Remove more includes from pycore_interp.h (#131480) 2025-03-19 23:01:32 +01:00
dynload_stub.c gh-88402: Add new sysconfig variables on Windows (GH-110049) 2023-10-04 22:50:29 +00:00
dynload_win.c gh-131942: Use the Python-specific Py_DEBUG macro rather than _DEBUG in Windows-related C code (GH-131944) 2025-05-08 15:01:25 +00:00
emscripten_signal.c GH-108614: Unbreak emscripten build (GH-109132) 2023-09-08 17:54:45 +01:00
emscripten_syscalls.c gh-124621: Emscripten: Fix __syscall_ioctl patch (GH-136993) 2025-07-22 15:05:26 +02:00
emscripten_trampoline.c gh-128627: Use __builtin_wasm_test_function_pointer_signature for Emscripten trampoline (#137470) 2025-09-17 15:33:55 +01:00
emscripten_trampoline_inner.c gh-128627: Use __builtin_wasm_test_function_pointer_signature for Emscripten trampoline (#137470) 2025-09-17 15:33:55 +01:00
errors.c gh-143547: Fix PyErr_FormatUnraisable() fallback (#143557) 2026-01-09 13:16:22 +01:00
executor_cases.c.h GH-144651: Optimize the new uops added when recording values during tracing. (GH-144948) 2026-02-19 11:52:57 +00:00
fileutils.c gh-42400: Fix buffer overflow in _Py_wrealpath() for very long paths (#141529) 2025-11-18 17:34:58 +01:00
flowgraph.c GH-143493: Conform to spec for generator expressions while supporting virtual iterators (GH-143569) 2026-01-16 09:11:58 +00:00
frame.c gh-144446: Fix some frame object thread-safety issues (gh-144479) 2026-02-06 09:43:36 -05:00
frozen.c GH-89435: os.path should not be a frozen module (#126924) 2024-11-22 18:50:30 +00:00
frozenmain.c Use PyConfig_Get() in frozenmain.c (#137421) 2025-08-06 14:33:28 +02:00
future.c gh-126139: Improve error message location for future statement with unknown feature (#126140) 2024-10-29 23:57:59 +00:00
gc.c gh-141070: Rename PyUnstable_Object_Dump to PyObject_Dump (GH-142848) 2026-01-16 09:19:43 -05:00
gc_free_threading.c gh-144054: no deferred refcount for untracked (gh-144081) 2026-01-20 10:01:09 -08:00
gc_gil.c gh-100240: Use a consistent implementation for freelists (#121934) 2024-07-22 12:08:27 -04:00
generated_cases.c.h GH-144651: Optimize the new uops added when recording values during tracing. (GH-144948) 2026-02-19 11:52:57 +00:00
getargs.c Revert "gh-112068: C API: Add support of nullable arguments in PyArg_Parse (GH-121303)" (#136991) 2025-07-22 16:39:50 +03:00
getcompiler.c gh-141341: Rename COMPILER macro to _Py_COMPILER on Windows (#141342) 2025-11-10 15:50:51 +01:00
getcopyright.c gh-126133: Only use start year in PSF copyright, remove end years (#126236) 2024-11-12 15:59:19 +02:00
getopt.c GH-133336: Remove reserved `-J` flag for Jython (#133444) 2025-05-05 15:09:19 +00:00
getplatform.c
getversion.c gh-119132: Remove "experimental" tag from the CPython free-threading. (gh-135550) 2025-06-16 23:32:52 +09:00
hamt.c gh-142829: Fix use-after-free in Context.__eq__ via re-entrant ContextVar.set (#142905) 2026-01-09 17:57:34 +05:30
hashtable.c gh-111545: Add Py_HashPointer() function (#112096) 2023-12-06 15:09:22 +01:00
import.c gh-145058: Add input validation to _PyImport_LazyImportModuleLevelObject (#145068) 2026-02-21 12:52:40 +00:00
importdl.c gh-140011: Delete importdl assertion that prevents importing embedded modules from packages (GH-141605) 2025-11-26 14:12:49 +01:00
index_pool.c gh-91048: Refactor and optimize remote debugging module (#134652) 2025-05-25 20:19:29 +00:00
initconfig.c gh-142349: Implement PEP 810 - Explicit lazy imports (#142351) 2026-02-12 00:15:33 +00:00
instruction_sequence.c GH-143493: Conform to spec for generator expressions while supporting virtual iterators (GH-143569) 2026-01-16 09:11:58 +00:00
instrumentation.c GH-144651: Optimize the new uops added when recording values during tracing. (GH-144948) 2026-02-19 11:52:57 +00:00
interpconfig.c GH-131238: Core header refactor (GH-131250) 2025-03-17 09:19:04 +00:00
intrinsics.c gh-111489: Remove _PyTuple_FromArray() alias (#139973) 2025-10-11 22:58:14 +02:00
jit.c fix warnings in jit builds (GH-144817) 2026-02-14 17:39:10 +00:00
legacy_tracing.c gh-137400: Fix thread-safety issues when profiling all threads (gh-137518) 2025-08-13 14:15:12 -04:00
lock.c gh-120321: Make gi_yieldfrom thread-safe in free-threading build (#144292) 2026-01-30 12:20:27 -05:00
marshal.c gh-141510, PEP 814: Add built-in frozendict type (#144757) 2026-02-17 10:54:41 +01:00
modsupport.c gh-137210: Add a struct, slot & function for checking an extension's ABI (GH-137212) 2025-09-05 16:23:18 +02:00
mysnprintf.c Add a warning message about PyOS_snprintf (#95993) 2022-10-07 11:49:53 -07:00
mystrtoul.c gh-108765: Python.h no longer includes <ctype.h> (#108831) 2023-09-03 18:54:27 +02:00
object_stack.c gh-100240: Use a consistent implementation for freelists (#121934) 2024-07-22 12:08:27 -04:00
opcode_targets.h gh-142982: Specialize CALL_FUNCTION_EX (GH-143391) 2026-01-06 20:34:08 +00:00
optimizer.c gh-145064: Fix JIT assertion failure during CALL_ALLOC_AND_ENTER_INIT side exit (GH-145100) 2026-02-22 18:46:03 +00:00
optimizer_analysis.c GH-144651: Optimize the new uops added when recording values during tracing. (GH-144948) 2026-02-19 11:52:57 +00:00
optimizer_bytecodes.c Fix warnings on main (GH-145104) 2026-02-22 19:02:15 +08:00
optimizer_cases.c.h Fix warnings on main (GH-145104) 2026-02-22 19:02:15 +08:00
optimizer_symbols.c GH-144651: Optimize the new uops added when recording values during tracing. (GH-144948) 2026-02-19 11:52:57 +00:00
parking_lot.c gh-137433: Fix deadlock with stop-the-world and daemon threads (gh-137735) 2025-09-16 09:21:58 +01:00
pathconfig.c gh-133644: Remove deprecated Python initialization getter functions (#133661) 2025-05-09 11:39:23 +00:00
perf_jit_trampoline.c gh-144194: Fix mmap failure check in perf_jit_trampoline.c (#143713) 2026-01-28 13:30:17 +00:00
perf_trampoline.c gh-144766: Fix a crash in fork child process when perf support is enabled. (#144795) 2026-02-14 11:41:28 +00:00
preconfig.c gh-145092: Fix compiler warning for memchr() and wcschr() returning const pointer (GH-145093) 2026-02-22 10:01:27 +02:00
pyarena.c Chore: Fix typo in pyarena.c (#126527) 2024-11-07 16:37:41 +01:00
pyctype.c
pyfpe.c bpo-46315: Add ifdef HAVE_ feature checks for WASI compatibility (GH-30507) 2022-01-13 09:46:04 +01:00
pyhash.c gh-141226: Deprecate PEP-456 support for embedders (#141287) 2026-02-21 12:42:13 +01:00
pylifecycle.c gh-142349: Implement PEP 810 - Explicit lazy imports (#142351) 2026-02-12 00:15:33 +00:00
pymath.c
pystate.c gh-144068: fix JIT tracer memory leak when daemon thread exits (GH-144077) 2026-01-24 09:43:01 +00:00
pystats.c GH-135379: Top of stack caching for the JIT. (GH-135465) 2025-12-11 10:32:52 +00:00
pystrcmp.c gh-108767: Replace ctype.h functions with pyctype.h functions (#108772) 2023-09-01 18:36:53 +02:00
pystrhex.c gh-144015: Add portable SIMD optimization for bytes.hex() et. al. (GH-143991) 2026-02-22 19:19:03 -08:00
pystrtod.c gh-141004: soft-deprecate Py_INFINITY macro (#141033) 2025-11-12 13:44:49 +01:00
Python-ast.c gh-142349: Implement PEP 810 - Explicit lazy imports (#142351) 2026-02-12 00:15:33 +00:00
Python-tokenize.c gh-111178: Fix function signatures for test_types (#131455) 2025-03-19 13:46:17 +00:00
pythonrun.c gh-141070: Rename PyUnstable_Object_Dump to PyObject_Dump (GH-142848) 2026-01-16 09:19:43 -05:00
pytime.c gh-80620: Support negative timestamps on windows in time.gmtime, time.localtime, and datetime module (#143463) 2026-01-15 10:51:11 +01:00
qsbr.c gh-144438: Fix false sharing between QSBR and tlbc_index (gh-144554) 2026-02-17 11:12:25 -05:00
README
record_functions.c.h GH-144179: Use recorded values to make optimizer more robust (GH-144437) 2026-02-05 08:58:41 +00:00
remote_debug.h gh-144563: Fix remote debugging with duplicate libpython mappings from ctypes (#144595) 2026-02-10 10:04:50 +00:00
remote_debugging.c gh-138122: Implement frame caching in RemoteUnwinder to reduce memory reads (#142137) 2025-12-06 22:37:34 +00:00
specialize.c gh-100239: Use `PyFloat_AS_DOUBLE` and `_PyLong_IsZero` in the float / compactlong specializations (#144826) 2026-02-19 21:45:59 +02:00
stackrefs.c gh-131527: Stackref debug borrow checker (#140599) 2025-11-05 11:12:56 -08:00
stdlib_module_names.h gh-81313: Add the math.integer module (PEP-791) (GH-133909) 2025-10-31 16:13:43 +02:00
structmember.c gh-41779: Allow defining any __slots__ for a class derived from tuple (GH-141763) 2026-01-06 11:36:00 +02:00
suggestions.c GH-131238: Core header refactor (GH-131250) 2025-03-17 09:19:04 +00:00
symtable.c gh-142349: Implement PEP 810 - Explicit lazy imports (#142351) 2026-02-12 00:15:33 +00:00
sysmodule.c gh-142349: Implement PEP 810 - Explicit lazy imports (#142351) 2026-02-12 00:15:33 +00:00
thread.c gh-134745: Use "pymutex" for sys.thread_info on Windows (#141140) 2025-11-06 16:10:39 +01:00
thread_nt.h gh-134745: Change PyThread_allocate_lock() implementation to PyMutex (#134747) 2025-05-30 10:15:47 +00:00
thread_pthread.h gh-137884: Added threading.get_native_id() on Illumos/Solaris (GH-137927) 2025-08-20 17:10:44 +00:00
thread_pthread_stubs.h gh-125161: return non zero value in pthread_self on wasi (#125303) 2024-10-13 20:59:41 +05:30
tier2_engine.md Docs: fix spelling of the word 'transferring' (#116641) 2024-03-13 23:53:32 +01:00
traceback.c gh-143108: Don't instrument some faulthandler related functions for TSan (#143450) 2026-01-05 22:13:29 +01:00
tracemalloc.c gh-144763: Don't detach the GIL in tracemalloc (#144779) 2026-02-18 15:57:48 +00:00
uniqueid.c gh-128923: Use zero to indicate unassigned unique id (#128925) 2025-01-17 16:42:27 +01:00
vm-state.md gh-133079: Remove Py_C_RECURSION_LIMIT & PyThreadState.c_recursion_remaining (GH-133080) 2025-04-29 12:56:20 +02:00

Miscellaneous source files for the main Python shared library