mirror of
https://github.com/python/cpython.git
synced 2025-12-07 13:50:06 +00:00
gh-138122: Improve the profiling section in the 3.15 what's new document (#140156)
This commit is contained in:
parent edff5aaa48
commit c5b37228af
1 changed file with 67 additions and 84 deletions
@@ -66,7 +66,7 @@ Summary -- Release highlights
 .. PEP-sized items next.
 
 * :pep:`799`: :ref:`A dedicated profiling package for organizing Python
-  profiling tools <whatsnew315-sampling-profiler>`
+  profiling tools <whatsnew315-profiling-package>`
 * :pep:`686`: :ref:`Python now uses UTF-8 as the default encoding
   <whatsnew315-utf8-default>`
 * :pep:`782`: :ref:`A new PyBytesWriter C API to create a Python bytes object
@@ -77,12 +77,32 @@ Summary -- Release highlights
 New features
 ============
 
+.. _whatsnew315-profiling-package:
+
+:pep:`799`: A dedicated profiling package
+-----------------------------------------
+
+A new :mod:`!profiling` module has been added to organize Python's built-in
+profiling tools under a single, coherent namespace. This module contains:
+
+* :mod:`!profiling.tracing`: deterministic function-call tracing (relocated from
+  :mod:`cProfile`).
+* :mod:`!profiling.sampling`: a new statistical sampling profiler (named Tachyon).
+
+The :mod:`cProfile` module remains as an alias for backwards compatibility.
+The :mod:`profile` module is deprecated and will be removed in Python 3.17.
+
+.. seealso:: :pep:`799` for further details.
+
+(Contributed by Pablo Galindo and László Kiss Kollár in :gh:`138122`.)
+
+
 .. _whatsnew315-sampling-profiler:
 
-:pep:`799`: High frequency statistical sampling profiler
---------------------------------------------------------
+Tachyon: High frequency statistical sampling profiler
+-----------------------------------------------------
 
-A new statistical sampling profiler has been added to the new :mod:`!profiling` module as
+A new statistical sampling profiler (Tachyon) has been added as
 :mod:`!profiling.sampling`. This profiler enables low-overhead performance analysis of
 running Python processes without requiring code modification or process restart.
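The hunk above states that deterministic function-call tracing moves to :mod:`!profiling.tracing` while :mod:`cProfile` remains as a backwards-compatible alias. As a minimal sketch of that deterministic-tracing workflow, the following uses today's ``cProfile``/``pstats`` APIs, which the changelog says stay available (the ``profiling.tracing`` name itself only exists in 3.15):

```python
import cProfile
import io
import pstats

def busy():
    # A small CPU-bound workload to trace.
    return sum(i * i for i in range(100_000))

# Deterministic tracing: every function call is recorded, unlike
# the statistical sampling done by profiling.sampling (Tachyon).
profiler = cProfile.Profile()
profiler.enable()
busy()
profiler.disable()

# Render the familiar tabular report, sorted by cumulative time.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats()
report = stream.getvalue()
print(report.splitlines()[0].strip())
```

In 3.15 the same object should be importable under the new namespace, with ``cProfile`` continuing to work unchanged.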
@@ -91,101 +111,64 @@ every function call, the sampling profiler periodically captures stack traces fr
 running processes. This approach provides virtually zero overhead while achieving
 sampling rates of **up to 1,000,000 Hz**, making it the fastest sampling profiler
 available for Python (at the time of its contribution) and ideal for debugging
-performance issues in production environments.
+performance issues in production environments. This capability is particularly
+valuable for debugging performance issues in production systems where traditional
+profiling approaches would be too intrusive.
 
 Key features include:
 
 * **Zero-overhead profiling**: Attach to any running Python process without
-  affecting its performance
-* **No code modification required**: Profile existing applications without restart
-* **Real-time statistics**: Monitor sampling quality during data collection
-* **Multiple output formats**: Generate both detailed statistics and flamegraph data
-* **Thread-aware profiling**: Option to profile all threads or just the main thread
-
-Profile process 1234 for 10 seconds with default settings:
-
-.. code-block:: shell
-
-   python -m profiling.sampling 1234
-
-Profile with custom interval and duration, save to file:
-
-.. code-block:: shell
-
-   python -m profiling.sampling -i 50 -d 30 -o profile.stats 1234
-
-Generate collapsed stacks for flamegraph:
-
-.. code-block:: shell
-
-   python -m profiling.sampling --collapsed 1234
-
-Profile all threads and sort by total time:
-
-.. code-block:: shell
-
-   python -m profiling.sampling -a --sort-tottime 1234
-
-The profiler generates statistical estimates of where time is spent:
-
-.. code-block:: text
-
-   Real-time sampling stats: Mean: 100261.5Hz (9.97µs) Min: 86333.4Hz (11.58µs) Max: 118807.2Hz (8.42µs) Samples: 400001
-   Captured 498841 samples in 5.00 seconds
-   Sample rate: 99768.04 samples/sec
-   Error rate: 0.72%
-   Profile Stats:
-     nsamples  sample%  tottime (s)  cumul%  cumtime (s)  filename:lineno(function)
-     43/418858  0.0  0.000  87.9  4.189  case.py:667(TestCase.run)
-     3293/418812  0.7  0.033  87.9  4.188  case.py:613(TestCase._callTestMethod)
-     158562/158562  33.3  1.586  33.3  1.586  test_compile.py:725(TestSpecifics.test_compiler_recursion_limit.<locals>.check_limit)
-     129553/129553  27.2  1.296  27.2  1.296  ast.py:46(parse)
-     0/128129  0.0  0.000  26.9  1.281  test_ast.py:884(AST_Tests.test_ast_recursion_limit.<locals>.check_limit)
-     7/67446  0.0  0.000  14.2  0.674  test_compile.py:729(TestSpecifics.test_compiler_recursion_limit)
-     6/60380  0.0  0.000  12.7  0.604  test_ast.py:888(AST_Tests.test_ast_recursion_limit)
-     3/50020  0.0  0.000  10.5  0.500  test_compile.py:727(TestSpecifics.test_compiler_recursion_limit)
-     1/38011  0.0  0.000  8.0  0.380  test_ast.py:886(AST_Tests.test_ast_recursion_limit)
-     1/25076  0.0  0.000  5.3  0.251  test_compile.py:728(TestSpecifics.test_compiler_recursion_limit)
-     22361/22362  4.7  0.224  4.7  0.224  test_compile.py:1368(TestSpecifics.test_big_dict_literal)
-     4/18008  0.0  0.000  3.8  0.180  test_ast.py:889(AST_Tests.test_ast_recursion_limit)
-     11/17696  0.0  0.000  3.7  0.177  subprocess.py:1038(Popen.__init__)
-     16968/16968  3.6  0.170  3.6  0.170  subprocess.py:1900(Popen._execute_child)
-     2/16941  0.0  0.000  3.6  0.169  test_compile.py:730(TestSpecifics.test_compiler_recursion_limit)
-
-   Legend:
-     nsamples: Direct/Cumulative samples (direct executing / on call stack)
-     sample%: Percentage of total samples this function was directly executing
-     tottime: Estimated total time spent directly in this function
-     cumul%: Percentage of total samples when this function was on the call stack
-     cumtime: Estimated cumulative time (including time in called functions)
-     filename:lineno(function): Function location and name
-
-   Summary of Interesting Functions:
-
-   Functions with Highest Direct/Cumulative Ratio (Hot Spots):
-     1.000 direct/cumulative ratio, 33.3% direct samples: test_compile.py:(TestSpecifics.test_compiler_recursion_limit.<locals>.check_limit)
-     1.000 direct/cumulative ratio, 27.2% direct samples: ast.py:(parse)
-     1.000 direct/cumulative ratio, 3.6% direct samples: subprocess.py:(Popen._execute_child)
-
-   Functions with Highest Call Frequency (Indirect Calls):
-     418815 indirect calls, 87.9% total stack presence: case.py:(TestCase.run)
-     415519 indirect calls, 87.9% total stack presence: case.py:(TestCase._callTestMethod)
-     159470 indirect calls, 33.5% total stack presence: test_compile.py:(TestSpecifics.test_compiler_recursion_limit)
-
-   Functions with Highest Call Magnification (Cumulative/Direct):
-     12267.9x call magnification, 159470 indirect calls from 13 direct: test_compile.py:(TestSpecifics.test_compiler_recursion_limit)
-     10581.7x call magnification, 116388 indirect calls from 11 direct: test_ast.py:(AST_Tests.test_ast_recursion_limit)
-     9740.9x call magnification, 418815 indirect calls from 43 direct: case.py:(TestCase.run)
-
-The profiler automatically identifies performance bottlenecks through statistical
-analysis, highlighting functions with high CPU usage and call frequency patterns.
-
-This capability is particularly valuable for debugging performance issues in
-production systems where traditional profiling approaches would be too intrusive.
-
-.. seealso:: :pep:`799` for further details.
-
-(Contributed by Pablo Galindo and László Kiss Kollár in :gh:`135953`.)
+  affecting its performance. Ideal for production debugging where you can't afford
+  to restart or slow down your application.
+
+* **No code modification required**: Profile existing applications without restart.
+  Simply point the profiler at a running process by PID and start collecting data.
+
+* **Flexible target modes**:
+
+  * Profile running processes by PID (``attach``) - attach to already-running applications
+  * Run and profile scripts directly (``run``) - profile from the very start of execution
+  * Execute and profile modules (``run -m``) - profile packages run as ``python -m module``
+
+* **Multiple profiling modes**: Choose what to measure based on your performance investigation:
+
+  * **Wall-clock time** (``--mode wall``, default): Measures real elapsed time including I/O,
+    network waits, and blocking operations. Use this to understand where your program spends
+    calendar time, including when waiting for external resources.
+  * **CPU time** (``--mode cpu``): Measures only active CPU execution time, excluding I/O waits
+    and blocking. Use this to identify CPU-bound bottlenecks and optimize computational work.
+  * **GIL-holding time** (``--mode gil``): Measures time spent holding Python's Global Interpreter
+    Lock. Use this to identify which threads dominate GIL usage in multi-threaded applications.
+
+* **Thread-aware profiling**: Option to profile all threads (``-a``) or just the main thread,
+  essential for understanding multi-threaded application behavior.
+
+* **Multiple output formats**: Choose the visualization that best fits your workflow:
+
+  * ``--pstats``: Detailed tabular statistics compatible with :mod:`pstats`. Shows function-level
+    timing with direct and cumulative samples. Best for detailed analysis and integration with
+    existing Python profiling tools.
+  * ``--collapsed``: Generates collapsed stack traces (one line per stack). This format is
+    specifically designed for creating flamegraphs with external tools like Brendan Gregg's
+    FlameGraph scripts or speedscope.
+  * ``--flamegraph``: Generates a self-contained interactive HTML flamegraph using D3.js.
+    Opens directly in your browser for immediate visual analysis. Flamegraphs show the call
+    hierarchy where width represents time spent, making it easy to spot bottlenecks at a glance.
+  * ``--gecko``: Generates Gecko Profiler format compatible with Firefox Profiler
+    (https://profiler.firefox.com). Upload the output to Firefox Profiler for advanced
+    timeline-based analysis with features like stack charts, markers, and network activity.
+  * ``--heatmap``: Generates an interactive HTML heatmap visualization with line-level sample
+    counts. Creates a directory with per-file heatmaps showing exactly where time is spent
+    at the source code level.
+
+* **Live interactive mode**: Real-time TUI profiler with a top-like interface (``--live``).
+  Monitor performance as your application runs with interactive sorting and filtering.
+
+* **Async-aware profiling**: Profile async/await code with task-based stack reconstruction
+  (``--async-aware``). See which coroutines are consuming time, with options to show only
+  running tasks or all tasks including those waiting.
+
+(Contributed by Pablo Galindo and László Kiss Kollár in :gh:`135953` and :gh:`138122`.)
 
 
 .. _whatsnew315-improved-error-messages:
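The ``--collapsed`` output mentioned in the diff is a simple text format: each sampled stack becomes one line of semicolon-joined frame names followed by its sample count, which is exactly what FlameGraph's scripts and speedscope consume. A hand-rolled sketch of that aggregation over hypothetical samples (the real profiler captures these stacks from a live process; the frame names below are made up for illustration):

```python
from collections import Counter

# Hypothetical stack samples, outermost frame first, standing in for the
# stacks Tachyon would periodically capture from a running process.
samples = [
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "parse_json"),
    ("main", "handle_request", "query_db"),
    ("main", "idle_wait"),
]

# Collapsed-stack format: "frame;frame;frame count", one line per unique stack.
counts = Counter(";".join(stack) for stack in samples)
collapsed = "\n".join(f"{stack} {n}" for stack, n in sorted(counts.items()))
print(collapsed)
# → main;handle_request;parse_json 2
#   main;handle_request;query_db 1
#   main;idle_wait 1
```

Identical stacks collapse into a single weighted line, so file size stays proportional to the number of distinct stacks rather than the number of samples taken.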