gh-138122: Improve the profiling section in the 3.15 what's new document (#140156)

Pablo Galindo Salgado 2025-12-07 02:46:33 +00:00 committed by GitHub
parent edff5aaa48
commit c5b37228af

@ -66,7 +66,7 @@ Summary -- Release highlights
.. PEP-sized items next.
* :pep:`799`: :ref:`A dedicated profiling package for organizing Python
profiling tools <whatsnew315-profiling-package>`
* :pep:`686`: :ref:`Python now uses UTF-8 as the default encoding
<whatsnew315-utf8-default>`
* :pep:`782`: :ref:`A new PyBytesWriter C API to create a Python bytes object
@ -77,12 +77,32 @@ Summary -- Release highlights
New features
============
.. _whatsnew315-profiling-package:

:pep:`799`: A dedicated profiling package
-----------------------------------------

A new :mod:`!profiling` module has been added to organize Python's built-in
profiling tools under a single, coherent namespace. This module contains:

* :mod:`!profiling.tracing`: deterministic function-call tracing (relocated from
  :mod:`cProfile`).
* :mod:`!profiling.sampling`: a new statistical sampling profiler (named Tachyon).

The :mod:`cProfile` module remains as an alias for backwards compatibility.
The :mod:`profile` module is deprecated and will be removed in Python 3.17.

.. seealso:: :pep:`799` for further details.

(Contributed by Pablo Galindo and László Kiss Kollár in :gh:`138122`.)
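Code that must run on both old and new interpreters can adopt the new name defensively. A minimal sketch, assuming only that ``profiling.tracing`` exposes the same ``Profile`` class as :mod:`cProfile` (per the relocation described above):

```python
import io
import pstats

# Hedged compatibility shim: prefer the new location on Python 3.15+,
# fall back to cProfile (which remains as an alias) on older versions.
try:
    from profiling import tracing as tracing_profiler  # Python 3.15+
except ImportError:
    import cProfile as tracing_profiler  # older interpreters

prof = tracing_profiler.Profile()
prof.enable()
total = sum(i * i for i in range(10_000))  # some work to profile
prof.disable()

# Summarize with the standard pstats tooling.
out = io.StringIO()
pstats.Stats(prof, stream=out).sort_stats("cumulative").print_stats(3)
print("function calls" in out.getvalue())
```

Because :mod:`cProfile` remains as an alias, the fallback branch behaves identically on current interpreters.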
.. _whatsnew315-sampling-profiler:
Tachyon: High frequency statistical sampling profiler
-----------------------------------------------------

A new statistical sampling profiler (Tachyon) has been added as
:mod:`!profiling.sampling`. This profiler enables low-overhead performance analysis of
running Python processes without requiring code modification or process restart.
@ -91,101 +111,64 @@ every function call, the sampling profiler periodically captures stack traces fr
running processes. This approach provides virtually zero overhead while achieving
sampling rates of **up to 1,000,000 Hz**, making it the fastest sampling profiler
available for Python (at the time of its contribution) and ideal for debugging
performance issues in production environments. This capability is particularly
valuable for debugging performance issues in production systems where traditional
profiling approaches would be too intrusive.
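The core idea of statistical sampling can be shown with a toy in-process sampler built from stdlib pieces. Note the contrast with Tachyon, which samples *other* processes from the outside; this sketch (names like ``busy`` are illustrative) only samples a thread in the current process:

```python
# Toy sampler: periodically capture the worker thread's topmost frame
# instead of tracing every call, then report where samples landed.
import collections
import sys
import threading
import time

samples = collections.Counter()

def busy():
    # CPU-bound work for ~0.5 seconds.
    end = time.time() + 0.5
    total = 0
    while time.time() < end:
        for i in range(1000):
            total += i * i

worker = threading.Thread(target=busy)
worker.start()

while worker.is_alive():
    # Snapshot every thread's current frame; keep the worker's.
    frame = sys._current_frames().get(worker.ident)
    if frame is not None:
        samples[frame.f_code.co_name] += 1
    time.sleep(0.005)  # ~200 Hz sampling interval

worker.join()
print(samples.most_common(1)[0][0])
```

Most samples land in ``busy``, so the sampler attributes the time there without ever instrumenting the function, which is the same statistical estimate a real sampling profiler produces at far higher rates.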
Key features include:

* **Zero-overhead profiling**: Attach to any running Python process without
  affecting its performance. Ideal for production debugging where you can't afford
  to restart or slow down your application.
* **No code modification required**: Profile existing applications without restart.
  Simply point the profiler at a running process by PID and start collecting data.

* **Flexible target modes**:

  * Profile running processes by PID (``attach``) - attach to already-running applications
  * Run and profile scripts directly (``run``) - profile from the very start of execution
  * Execute and profile modules (``run -m``) - profile packages run as ``python -m module``
* **Multiple profiling modes**: Choose what to measure based on your performance investigation:

  * **Wall-clock time** (``--mode wall``, default): Measures real elapsed time including I/O,
    network waits, and blocking operations. Use this to understand where your program spends
    calendar time, including when waiting for external resources.
  * **CPU time** (``--mode cpu``): Measures only active CPU execution time, excluding I/O
    waits and blocking. Use this to identify CPU-bound bottlenecks and optimize
    computational work.
  * **GIL-holding time** (``--mode gil``): Measures time spent holding Python's Global
    Interpreter Lock. Use this to identify which threads dominate GIL usage in
    multi-threaded applications.

* **Thread-aware profiling**: Option to profile all threads (``-a``) or just the main thread,
  essential for understanding multi-threaded application behavior.
* **Multiple output formats**: Choose the visualization that best fits your workflow:

  * ``--pstats``: Detailed tabular statistics compatible with :mod:`pstats`. Shows
    function-level timing with direct and cumulative samples. Best for detailed analysis
    and integration with existing Python profiling tools.
  * ``--collapsed``: Generates collapsed stack traces (one line per stack). This format is
    specifically designed for creating flamegraphs with external tools like Brendan Gregg's
    FlameGraph scripts or speedscope.
  * ``--flamegraph``: Generates a self-contained interactive HTML flamegraph using D3.js.
    Opens directly in your browser for immediate visual analysis. Flamegraphs show the call
    hierarchy where width represents time spent, making it easy to spot bottlenecks at a
    glance.
  * ``--gecko``: Generates Gecko Profiler format compatible with Firefox Profiler
    (https://profiler.firefox.com). Upload the output to Firefox Profiler for advanced
    timeline-based analysis with features like stack charts, markers, and network activity.
  * ``--heatmap``: Generates an interactive HTML heatmap visualization with line-level
    sample counts. Creates a directory with per-file heatmaps showing exactly where time
    is spent at the source code level.

* **Live interactive mode**: Real-time TUI profiler with a top-like interface (``--live``).
  Monitor performance as your application runs with interactive sorting and filtering.
* **Async-aware profiling**: Profile async/await code with task-based stack reconstruction
  (``--async-aware``). See which coroutines are consuming time, with options to show only
  running tasks or all tasks including those waiting.

For example, to profile all threads of process 1234 and sort by total time:

.. code-block:: shell

   python -m profiling.sampling -a --sort-tottime 1234
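Other common invocation patterns look like this (``1234`` is a placeholder PID; the flags follow the options described in the feature list above):

```shell
# Attach to process 1234 and profile with default settings
python -m profiling.sampling 1234

# Use a custom sampling interval (-i) and duration (-d), saving to a file
python -m profiling.sampling -i 50 -d 30 -o profile.stats 1234

# Emit collapsed stacks for flamegraph tooling
python -m profiling.sampling --collapsed 1234
```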
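Since the ``--pstats`` format is described as :mod:`pstats`-compatible, sampler output saved with ``-o`` can be post-processed with the standard library. A minimal sketch, using :mod:`cProfile` to stand in for the sampler so the snippet runs on current Pythons (the ``profile.stats`` filename and the assumption of an identical on-disk format are illustrative):

```python
import cProfile
import io
import os
import pstats
import tempfile

# Produce a pstats-format file. A sampler run such as
# ``python -m profiling.sampling --pstats -o profile.stats 1234`` is
# described as emitting a file loadable the same way (assumption).
stats_path = os.path.join(tempfile.mkdtemp(), "profile.stats")
cProfile.run("sum(i * i for i in range(50_000))", stats_path)

# Load and summarize with the standard pstats tooling.
out = io.StringIO()
pstats.Stats(stats_path, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print("function calls" in report)
```

Anything that already consumes :mod:`pstats` data (IDEs, snakeviz-style viewers, custom scripts) should therefore work with the sampler's ``--pstats`` output unchanged.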
.. seealso:: :pep:`799` for further details.
(Contributed by Pablo Galindo and László Kiss Kollár in :gh:`135953` and :gh:`138122`.)
.. _whatsnew315-improved-error-messages: