| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | .. highlight:: shell-session
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | .. _perf_profiling:
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | ==============================================
 | 
					
						
							|  |  |  | Python support for the Linux ``perf`` profiler
 | 
					
						
							|  |  |  | ==============================================
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | :author: Pablo Galindo
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  | `The Linux perf profiler <https://perf.wiki.kernel.org>`_
 | 
					
						
							|  |  |  | is a very powerful tool that allows you to profile and obtain
 | 
					
						
							|  |  |  | information about the performance of your application.
 | 
					
						
							|  |  |  | ``perf`` also has a very vibrant ecosystem of tools
 | 
					
						
							|  |  |  | that aid with the analysis of the data that it produces.
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							|  |  |  | The main problem with using the ``perf`` profiler with Python applications is that
 | 
					
						
							| 
									
										
										
										
											2023-04-14 04:07:49 +01:00
										 |  |  | ``perf`` only gets information about native symbols, that is, the names of
 | 
					
						
							|  |  |  | functions and procedures written in C. This means that the names and file names
 | 
					
						
							|  |  |  | of Python functions in your code will not appear in the output of ``perf``.
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							|  |  |  | Since Python 3.12, the interpreter can run in a special mode that allows Python
 | 
					
						
							|  |  |  | functions to appear in the output of the ``perf`` profiler. When this mode is
 | 
					
						
							|  |  |  | enabled, the interpreter will interpose a small piece of code compiled on the
 | 
					
						
							|  |  |  | fly before the execution of every Python function and it will teach ``perf`` the
 | 
					
						
							|  |  |  | relationship between this piece of code and the associated Python function using
 | 
					
						
							| 
									
										
										
										
											2023-05-21 03:12:24 -07:00
										 |  |  | :doc:`perf map files <../c-api/perfmaps>`.
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  | .. note::
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2023-04-14 04:07:49 +01:00
										 |  |  |     Support for the ``perf`` profiler is currently only available for Linux on
 | 
					
						
							|  |  |  |     select architectures. Check the output of the ``configure`` build step or
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  |     check the output of ``python -m sysconfig | grep HAVE_PERF_TRAMPOLINE``
 | 
					
						
							|  |  |  |     to see if your system is supported.
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | For example, consider the following script:
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | .. code-block:: python
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     def foo(n):
 | 
					
						
							|  |  |  |         result = 0
 | 
					
						
							|  |  |  |         for _ in range(n):
 | 
					
						
							|  |  |  |             result += 1
 | 
					
						
							|  |  |  |         return result
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     def bar(n):
 | 
					
						
							|  |  |  |         foo(n)
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     def baz(n):
 | 
					
						
							|  |  |  |         bar(n)
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     if __name__ == "__main__":
 | 
					
						
							|  |  |  |         baz(1000000)
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2023-04-14 04:07:49 +01:00
										 |  |  | We can run ``perf`` to sample CPU stack traces at 9999 hertz::
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							|  |  |  |     $ perf record -F 9999 -g -o perf.data python my_script.py
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2023-04-14 04:07:49 +01:00
										 |  |  | Then we can use ``perf report`` to analyze the data:
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							|  |  |  | .. code-block:: shell-session
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     $ perf report --stdio -n -g
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     # Children      Self       Samples  Command     Shared Object       Symbol
 | 
					
						
							|  |  |  |     # ........  ........  ............  ..........  ..................  ..........................................
 | 
					
						
							|  |  |  |     #
 | 
					
						
							|  |  |  |         91.08%     0.00%             0  python.exe  python.exe          [.] _start
 | 
					
						
							|  |  |  |                 |
 | 
					
						
							|  |  |  |                 ---_start
 | 
					
						
							|  |  |  |                 |
 | 
					
						
							|  |  |  |                     --90.71%--__libc_start_main
 | 
					
						
							|  |  |  |                             Py_BytesMain
 | 
					
						
							|  |  |  |                             |
 | 
					
						
							|  |  |  |                             |--56.88%--pymain_run_python.constprop.0
 | 
					
						
							|  |  |  |                             |          |
 | 
					
						
							|  |  |  |                             |          |--56.13%--_PyRun_AnyFileObject
 | 
					
						
							|  |  |  |                             |          |          _PyRun_SimpleFileObject
 | 
					
						
							|  |  |  |                             |          |          |
 | 
					
						
							|  |  |  |                             |          |          |--55.02%--run_mod
 | 
					
						
							|  |  |  |                             |          |          |          |
 | 
					
						
							|  |  |  |                             |          |          |           --54.65%--PyEval_EvalCode
 | 
					
						
							|  |  |  |                             |          |          |                     _PyEval_EvalFrameDefault
 | 
					
						
							|  |  |  |                             |          |          |                     PyObject_Vectorcall
 | 
					
						
							|  |  |  |                             |          |          |                     _PyEval_Vector
 | 
					
						
							|  |  |  |                             |          |          |                     _PyEval_EvalFrameDefault
 | 
					
						
							|  |  |  |                             |          |          |                     PyObject_Vectorcall
 | 
					
						
							|  |  |  |                             |          |          |                     _PyEval_Vector
 | 
					
						
							|  |  |  |                             |          |          |                     _PyEval_EvalFrameDefault
 | 
					
						
							|  |  |  |                             |          |          |                     PyObject_Vectorcall
 | 
					
						
							|  |  |  |                             |          |          |                     _PyEval_Vector
 | 
					
						
							|  |  |  |                             |          |          |                     |
 | 
					
						
							|  |  |  |                             |          |          |                     |--51.67%--_PyEval_EvalFrameDefault
 | 
					
						
							|  |  |  |                             |          |          |                     |          |
 | 
					
						
							|  |  |  |                             |          |          |                     |          |--11.52%--_PyLong_Add
 | 
					
						
							|  |  |  |                             |          |          |                     |          |          |
 | 
					
						
							|  |  |  |                             |          |          |                     |          |          |--2.97%--_PyObject_Malloc
 | 
					
						
							|  |  |  |     ...
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2023-11-16 00:40:19 +09:00
										 |  |  | As you can see, the Python functions are not shown in the output, only ``_PyEval_EvalFrameDefault``
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | (the function that evaluates the Python bytecode) shows up. Unfortunately that's not very useful because all Python
 | 
					
						
							|  |  |  | functions use the same C function to evaluate bytecode so we cannot know which Python function corresponds to which
 | 
					
						
							|  |  |  | bytecode-evaluating function.
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  | Instead, if we run the same experiment with ``perf`` support enabled we get:
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							|  |  |  | .. code-block:: shell-session
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     $ perf report --stdio -n -g
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |     # Children      Self       Samples  Command     Shared Object       Symbol
 | 
					
						
							|  |  |  |     # ........  ........  ............  ..........  ..................  .....................................................................
 | 
					
						
							|  |  |  |     #
 | 
					
						
							|  |  |  |         90.58%     0.36%             1  python.exe  python.exe          [.] _start
 | 
					
						
							|  |  |  |                 |
 | 
					
						
							|  |  |  |                 ---_start
 | 
					
						
							|  |  |  |                 |
 | 
					
						
							|  |  |  |                     --89.86%--__libc_start_main
 | 
					
						
							|  |  |  |                             Py_BytesMain
 | 
					
						
							|  |  |  |                             |
 | 
					
						
							|  |  |  |                             |--55.43%--pymain_run_python.constprop.0
 | 
					
						
							|  |  |  |                             |          |
 | 
					
						
							|  |  |  |                             |          |--54.71%--_PyRun_AnyFileObject
 | 
					
						
							|  |  |  |                             |          |          _PyRun_SimpleFileObject
 | 
					
						
							|  |  |  |                             |          |          |
 | 
					
						
							|  |  |  |                             |          |          |--53.62%--run_mod
 | 
					
						
							|  |  |  |                             |          |          |          |
 | 
					
						
							|  |  |  |                             |          |          |           --53.26%--PyEval_EvalCode
 | 
					
						
							|  |  |  |                             |          |          |                     py::<module>:/src/script.py
 | 
					
						
							|  |  |  |                             |          |          |                     _PyEval_EvalFrameDefault
 | 
					
						
							|  |  |  |                             |          |          |                     PyObject_Vectorcall
 | 
					
						
							|  |  |  |                             |          |          |                     _PyEval_Vector
 | 
					
						
							|  |  |  |                             |          |          |                     py::baz:/src/script.py
 | 
					
						
							|  |  |  |                             |          |          |                     _PyEval_EvalFrameDefault
 | 
					
						
							|  |  |  |                             |          |          |                     PyObject_Vectorcall
 | 
					
						
							|  |  |  |                             |          |          |                     _PyEval_Vector
 | 
					
						
							|  |  |  |                             |          |          |                     py::bar:/src/script.py
 | 
					
						
							|  |  |  |                             |          |          |                     _PyEval_EvalFrameDefault
 | 
					
						
							|  |  |  |                             |          |          |                     PyObject_Vectorcall
 | 
					
						
							|  |  |  |                             |          |          |                     _PyEval_Vector
 | 
					
						
							|  |  |  |                             |          |          |                     py::foo:/src/script.py
 | 
					
						
							|  |  |  |                             |          |          |                     |
 | 
					
						
							|  |  |  |                             |          |          |                     |--51.81%--_PyEval_EvalFrameDefault
 | 
					
						
							|  |  |  |                             |          |          |                     |          |
 | 
					
						
							|  |  |  |                             |          |          |                     |          |--13.77%--_PyLong_Add
 | 
					
						
							|  |  |  |                             |          |          |                     |          |          |
 | 
					
						
							|  |  |  |                             |          |          |                     |          |          |--3.26%--_PyObject_Malloc
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  | How to enable ``perf`` profiling support
 | 
					
						
							|  |  |  | ----------------------------------------
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2023-04-14 04:07:49 +01:00
										 |  |  | ``perf`` profiling support can be enabled either from the start using
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  | the environment variable :envvar:`PYTHONPERFSUPPORT` or the
 | 
					
						
							|  |  |  | :option:`-X perf <-X>` option,
 | 
					
						
							|  |  |  | or dynamically using :func:`sys.activate_stack_trampoline` and
 | 
					
						
							|  |  |  | :func:`sys.deactivate_stack_trampoline`.
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  | The :mod:`!sys` functions take precedence over the :option:`!-X` option,
 | 
					
						
							|  |  |  | the :option:`!-X` option takes precedence over the environment variable.
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  | Example, using the environment variable::
 | 
					
						
							| 
									
										
										
										
											2022-08-30 19:37:22 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2023-10-05 19:27:19 +02:00
										 |  |  |    $ PYTHONPERFSUPPORT=1 python script.py
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  |    $ perf report -g -i perf.data
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  | Example, using the :option:`!-X` option::
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  |    $ python -X perf script.py
 | 
					
						
							|  |  |  |    $ perf report -g -i perf.data
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  | Example, using the :mod:`sys` APIs in file :file:`example.py`:
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  | .. code-block:: python
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  |    import sys
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  |    sys.activate_stack_trampoline("perf")
 | 
					
						
							|  |  |  |    do_profiled_stuff()
 | 
					
						
							|  |  |  |    sys.deactivate_stack_trampoline()
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  |    non_profiled_stuff()
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  | ...then::
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  |    $ python ./example.py
 | 
					
						
							|  |  |  |    $ perf report -g -i perf.data
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | How to obtain the best results
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  | ------------------------------
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2023-04-14 04:07:49 +01:00
										 |  |  | For best results, Python should be compiled with
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | ``CFLAGS="-fno-omit-frame-pointer -mno-omit-leaf-frame-pointer"`` as this allows
 | 
					
						
							|  |  |  | profilers to unwind using only the frame pointer and not on DWARF debug
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  | information. This is because as the code that is interposed to allow ``perf``
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | support is dynamically generated it doesn't have any DWARF debugging information
 | 
					
						
							|  |  |  | available.
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2022-10-27 15:06:48 +02:00
										 |  |  | You can check if your system has been compiled with this flag by running::
 | 
					
						
							| 
									
										
										
										
											2022-08-30 18:11:18 +01:00
										 |  |  | 
 | 
					
						
							|  |  |  |     $ python -m sysconfig | grep 'no-omit-frame-pointer'
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | If you don't see any output it means that your interpreter has not been compiled with
 | 
					
						
							|  |  |  | frame pointers and therefore it may not be able to show Python functions in the output
 | 
					
						
							|  |  |  | of ``perf``.
 | 
					
						
							| 
									
										
										
										
											2024-05-05 03:07:29 +02:00
										 |  |  | 
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | How to work without frame pointers
 | 
					
						
							|  |  |  | ----------------------------------
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | If you are working with a Python interpreter that has been compiled without frame pointers
 | 
					
						
							|  |  |  | you can still use the ``perf`` profiler but the overhead will be a bit higher because Python
 | 
					
						
							|  |  |  | needs to generate unwinding information for every Python function call on the fly. Additionally,
 | 
					
						
							|  |  |  | ``perf`` will take more time to process the data because it will need to use the DWARF debugging
 | 
					
						
							|  |  |  | information to unwind the stack and this is a slow process.
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2024-05-07 13:47:21 +02:00
										 |  |  | To enable this mode, you can use the environment variable :envvar:`PYTHON_PERF_JIT_SUPPORT` or the
 | 
					
						
							|  |  |  | :option:`-X perf_jit <-X>` option, which will enable the JIT mode for the ``perf`` profiler.
 | 
					
						
							| 
									
										
										
										
											2024-05-05 03:07:29 +02:00
										 |  |  | 
 | 
					
						
							|  |  |  | When using the perf JIT mode, you need an extra step before you can run ``perf report``. You need to
 | 
					
						
							|  |  |  | call the ``perf inject`` command to inject the JIT information into the ``perf.data`` file.
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2024-05-07 13:47:21 +02:00
										 |  |  |     $ perf record -F 9999 -g --call-graph dwarf -o perf.data python -Xperf_jit my_script.py
 | 
					
						
							| 
									
										
										
										
											2024-05-05 03:07:29 +02:00
										 |  |  |     $ perf inject -i perf.data --jit
 | 
					
						
							|  |  |  |     $ perf report -g -i perf.data
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | or using the environment variable::
 | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2024-05-07 13:47:21 +02:00
										 |  |  |     $ PYTHON_PERF_JIT_SUPPORT=1 perf record -F 9999 -g --call-graph dwarf -o perf.data python my_script.py
 | 
					
						
							| 
									
										
										
										
											2024-05-05 03:07:29 +02:00
										 |  |  |     $ perf inject -i perf.data --jit
 | 
					
						
							|  |  |  |     $ perf report -g -i perf.data
 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Notice that when using ``--call-graph dwarf`` the ``perf`` tool will take snapshots of the stack of
 | 
					
						
							|  |  |  | the process being profiled and save the information in the ``perf.data`` file. By default the size of
 | 
					
						
							|  |  |  | the stack dump is 8192 bytes but the user can change the size by passing the size after comma like
 | 
					
						
							|  |  |  | ``--call-graph dwarf,4096``. The size of the stack dump is important because if the size is too small
 | 
					
						
							|  |  |  | ``perf`` will not be able to unwind the stack and the output will be incomplete.
 | 
					
						
							|  |  |  | 
 |