This PR implements frame caching in the RemoteUnwinder class to significantly reduce memory reads when profiling remote processes with deep call stacks.
When cache_frames=True, the unwinder stores the frame chain from each sample and reuses unchanged portions in subsequent samples. Since most profiling samples capture similar call stacks (especially the parent frames), this optimization avoids repeatedly reading the same frame data from the target process.
The implementation adds a last_profiled_frame field to the thread state that tracks where the previous sample stopped. On the next sample, if the current frame chain reaches this marker, the cached frames from that point onward are reused instead of being re-read from remote memory.
The sampling profiler now enables frame caching by default.
- Introduce a new field in the GC state to store the frame that initiated garbage collection.
- Update RemoteUnwinder to include options for including "<native>" and "<GC>" frames in the stack trace.
- Modify the sampling profiler to accept parameters for controlling the inclusion of native and GC frames.
- Enhance the stack collector to properly format and append these frames during profiling.
- Add tests to verify the correct behavior of the profiler with respect to native and GC frames, including options to exclude them.
Co-authored-by: Pablo Galindo Salgado <pablogsal@gmail.com>
Follow-up refactoring after GH-133143, where we use `_Py_ID` for "big" and "little"
in `abi_info.byteorder`.
This uses `_Py_ID` for `sys.byteorder`, but also `float_repr_style` and a module name.
Add support for getting and setting groups used for key agreement.
* `ssl.SSLSocket.group()` returns the name of the group used
for the key agreement of the current session establishment.
This feature requires Python to be built with OpenSSL 3.2 or later.
* `ssl.SSLContext.get_groups()` returns the list of names of groups
that are compatible with the TLS version of the current context.
This feature requires Python to be built with OpenSSL 3.5 or later.
* `ssl.SSLContext.set_groups()` sets the groups allowed for key agreement
for sockets created with this context. This feature is always supported.
Change the names of the symbol tables for lambda expressions and generator
expressions to "<lambda>" and "<genexpr>" respectively to avoid conflicts
with user-defined names.
Completely refactor Modules/_remote_debugging_module.c with improved
code organization, replacing scattered reference counting and error
handling with centralized goto error paths. This cleanup improves
maintainability and reduces code duplication throughout the module while
preserving the same external API.
Implement memory page caching optimization in Python/remote_debug.h to
avoid repeated reads of the same memory regions during debugging
operations. The cache stores previously read memory pages and reuses
them for subsequent reads, significantly reducing system calls and
improving performance.
Add code object caching mechanism with a new code_object_generation
field in the interpreter state that tracks when code object caches need
invalidation. This allows efficient reuse of parsed code object metadata
and eliminates redundant processing of the same code objects across
debugging sessions.
Optimize memory operations by replacing multiple individual structure
copies with single bulk reads for the same data structures. This reduces
the number of memory operations and system calls required to gather
debugging information from the target process.
Update Makefile.pre.in to include Python/remote_debug.h in the headers
list, ensuring that changes to the remote debugging header force proper
recompilation of dependent modules and maintain build consistency across
the codebase.
Also, make the module compatible with the free threading build as an extra :)
Co-authored-by: Łukasz Langa <lukasz@langa.pl>
* All parameters of sqlite3.connect() except "database" are now keyword-only.
* The first three parameters of methods create_function() and
create_aggregate() are now positional-only.
* The first parameter of methods set_authorizer(), set_progress_handler()
and set_trace_callback() is now positional-only.
* Add _zstd module for https://peps.python.org/pep-0784/
This commit introduces the `_zstd` module, with bindings to libzstd from
the pyzstd project. It also includes the unix build system configuration.
Windows build system support will be integrated independently as it
depends on integration with cpython-source-deps.
* Add _zstd to modules
* Fix path for compression.zstd module
* Ignore _zstd module like _io
* Expand module state macros to improve code quality
Also removes module state references from the classes in the _zstd
module and instead uses PyType_GetModuleState()
* Remove backticks suggested in review
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
* Use critical sections to lock object state
This should avoid races and deadlocks.
* Remove compress/decompress and mark module as not reliant on the GIL
The `compress`/`decompress` functions will be moved to Python code for simplicity.
C implementations can always be re-added in the future.
Also, mark _zstd as not requiring the GIL.
* Lift critical section to avoid clang warning
* Respond to comments by picnixz
* Call out pyzstd explicitly in license description
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
* Use a much more robust implementation...
... for `get_zstd_state_from_type`
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
* Use PyList_GetItemRef for thread safety purposes
* Use a macro for the minimum supported version
* remove const from primivite types
* Use PyMem_New in another spot
* Simplify error handling in _get_frame_size
* Another simplification of error handling in get_frame_info
* Rename _module_state to mod_state
* Rewrite comment explaining the context of the code
* Add link to pyzstd
* Add TODO about refactoring dict training code
* Use PyModule_AddObjectRef over PyModule_AddObject
PyModule_AddObject is soft-deprecated, so we should use PyModule_AddObjectRef
* Check result of OutputBufferGrow
* Simplify return logic in `add_constant_to_type`
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
* Ignore return value of _zstd_clear()
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
* Remove redundant comments
* Remove __reduce__ from ZstdDict
We should instead document that to pickle a dictionary a user should use
the `.dict_content` attribute.
* Use PyUnicode_FromFormat instead of a buffer
* Don't use C constants/types in error messages
* Make error messages easier to understand for Python users
* Lower minimum required version 1.4.0
* Use casts and make slot function signatures correct
* Be consistent with CPython on const usage
* Make else clauses in line with PEP 7
* Fix over-indented blocks in argument clinic
* Add critical section around ZSTD_DCtx_setParameter
* Add a TODO about refactoring critical sections
* Use Py_UNREACHABLE
* Move bytes operations out of Py_BEGIN_ALLOW_THREADS
* Add TODO about ensuring a lock is held
* Remove asserts that may not be correct
* Add TODO to make ZstdDict and others GC objects
* Make objects GC tracked
* Remove unused include
* Fix some memory issues
* Fix refleaks on module and in ZstdDict
* Update configure to check for ZDICT_finalizeDictionary
* Properly check version in configure
* exit(1) if check fails
* Use AC_RUN_IFELSE
* Use a define() to re-use version check
* Actually properly set _zstd module status based on version
---------
Co-authored-by: Stan Ulbrych <89152624+StanFromIreland@users.noreply.github.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Bénédikt Tran <10796600+picnixz@users.noreply.github.com>
Make `warnings.catch_warnings()` use a context variable for holding
the warning filtering state if the `sys.flags.context_aware_warnings`
flag is set to true. This makes using the context manager thread-safe in
multi-threaded programs.
Add the `sys.flags.thread_inherit_context` flag. If true, starting a new
thread with `threading.Thread` will use a copy of the context
from the caller of `Thread.start()`.
Both these flags are set to true by default for the free-threaded build
and false for the default build.
Move the Python implementation of warnings.py into _py_warnings.py.
Make _contextvars a builtin module.
Co-authored-by: Kumar Aditya <kumaraditya@python.org>
In the free threaded build, the `_PyObject_LookupSpecial()` call can lead to
reference count contention on the returned function object becuase it
doesn't use stackrefs. Refactor some of the callers to use
`_PyObject_MaybeCallSpecialNoArgs`, which uses stackrefs internally.
This fixes the scaling bottleneck in the "lookup_special" microbenchmark
in `ftscalingbench.py`. However, the are still some uses of
`_PyObject_LookupSpecial()` that need to be addressed in future PRs.
- Restore max field size to sys.maxsize, as in Python 3.13 & below
- PyCField: Split out bit/byte sizes/offsets.
- Expose CField's size/offset data to Python code
- Add generic checks for all the test structs/unions, using the newly exposed attrs
When formatting the AST as a string, infinite values are replaced by
1e309, which evaluates to infinity. The initialization of this string
replacement was not thread-safe in the free threading build.