Adds `python -m profiling.sampling dump <pid>`, which prints a single
traceback-style snapshot of a running process's Python stack via the
existing `_remote_debugging` unwinder. Supports per-thread status,
source line highlighting, optional bytecode opcodes, and async-aware
task reconstruction (`--async-aware`; `--async-mode` defaults to `all`).
Differential flamegraphs already show self time, but regular
flamegraphs should show it as well. Display the time spent in the
function body itself, excluding callees, in addition to the total
inclusive time.
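To make the self versus inclusive distinction concrete, here is a minimal sketch of how both counts could be derived from sampled stacks. The function name `self_and_total_samples` and the root-first tuple layout are assumptions for illustration, not the profiler's actual data model.

```python
from collections import Counter


def self_and_total_samples(stacks):
    """Count self and inclusive samples per function.

    `stacks` is an iterable of call stacks ordered root-first, so the
    last entry is the function that was executing when sampled.
    """
    self_counts = Counter()
    total_counts = Counter()
    for stack in stacks:
        if not stack:
            continue
        # Only the leaf frame was running its own body in this sample.
        self_counts[stack[-1]] += 1
        # Count each function once per sample, even if it recurses.
        for func in set(stack):
            total_counts[func] += 1
    return self_counts, total_counts


samples = [
    ("main", "parse"),
    ("main", "parse", "tokenize"),
    ("main", "parse", "tokenize"),
    ("main",),
]
self_counts, total_counts = self_and_total_samples(samples)
```

In this example `main` appears in every sample but is the leaf in only one, so its inclusive count is much larger than its self count.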
Differential flame graphs compare two profiling runs and highlight where
performance has changed. This makes it easier to detect regressions
introduced by code changes and to verify that optimizations have the
intended effect.
The visualization renders the current profile with frame widths
representing current time consumption. Color is then applied to show the
difference relative to the baseline profile: red gradients indicate
regressions, while blue gradients indicate improvements.
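The coloring rule above might look roughly like the following sketch. The helper `diff_color`, the white-to-red and white-to-blue ramps, and the `max_delta` saturation point are all assumptions made for illustration; the actual renderer may scale and blend colors differently.

```python
def diff_color(current_share, baseline_share, max_delta=0.5):
    """Map a change in time share to an (r, g, b) tuple.

    Shares are fractions of total samples in [0, 1]. `max_delta` is an
    assumed saturation point, not a value taken from the profiler.
    """
    delta = current_share - baseline_share
    # Scale the change to [0, 1]; larger changes give stronger color.
    intensity = min(abs(delta) / max_delta, 1.0)
    fade = round(255 * (1.0 - intensity))
    if delta > 0:
        return (255, fade, fade)  # regression: white toward red
    if delta < 0:
        return (fade, fade, 255)  # improvement: white toward blue
    return (255, 255, 255)  # unchanged


regressed = diff_color(1.0, 0.0)
improved = diff_color(0.0, 1.0)
unchanged = diff_color(0.25, 0.25)
slightly_worse = diff_color(0.5, 0.25)
```

Frame width would still come from the current profile alone; only the fill color depends on the baseline.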
Some call paths may disappear entirely between profiles. These are
referred to as elided stacks and occur when optimizations remove code
paths or when certain branches stop executing. When elided stacks are
present, an "Elided" toggle is displayed, allowing the user to switch
between the main differential view and a view showing only the removed
paths.
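As a rough illustration of what counts as an elided stack, the sketch below treats each profile as a mapping from call path to sample count and keeps the baseline paths that are absent from the current profile. The name `find_elided_stacks` and the dict-of-tuples representation are hypothetical.

```python
def find_elided_stacks(baseline, current):
    """Return baseline call paths that never appear in the current profile."""
    return {
        stack: count
        for stack, count in baseline.items()
        if stack not in current
    }


baseline = {
    ("main", "load"): 40,
    ("main", "load", "validate"): 25,
    ("main", "render"): 35,
}
current = {
    ("main", "load"): 30,
    ("main", "render"): 35,
}
elided = find_elided_stacks(baseline, current)
```

Here the `validate` path is the only one that would appear in the "Elided" view, since it has no frame in the current profile to attach a color to.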
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
The tests were flaky on slow machines because subprocesses could finish
before enough samples were collected. This adds synchronization similar
to test_external_inspection: test scripts now signal when they start
working, and the profiler waits for this signal before sampling.
Test scripts now run in infinite loops until killed rather than for
fixed iterations, ensuring the profiler always has active work to
sample regardless of machine speed.
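The synchronization described above can be sketched as follows. This is an illustrative pattern only, assuming a local TCP socket as the signaling channel; the target script, the `run_until_ready` helper, and the `b"ready"` message are hypothetical and not taken from the test suite.

```python
import socket
import subprocess
import sys
import textwrap

# Hypothetical target script: signal readiness, then work until killed.
TARGET = textwrap.dedent("""
    import socket, sys
    sock = socket.create_connection(("127.0.0.1", int(sys.argv[1])))
    sock.sendall(b"ready")
    while True:  # never finishes on its own
        sum(range(1000))
""")


def run_until_ready(timeout=10.0):
    """Start the target and block until it reports that work has begun."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.bind(("127.0.0.1", 0))  # let the OS pick a free port
        server.listen(1)
        server.settimeout(timeout)
        port = server.getsockname()[1]
        proc = subprocess.Popen([sys.executable, "-c", TARGET, str(port)])
        try:
            conn, _ = server.accept()
            with conn:
                conn.settimeout(timeout)
                signal = conn.recv(1024)
        except BaseException:
            # Do not leak the child if the handshake fails or times out.
            proc.kill()
            proc.wait()
            raise
    return proc, signal


proc, signal = run_until_ready()
try:
    # A profiler would attach and sample here; the target is still busy.
    still_running = proc.poll() is None
finally:
    proc.kill()
    proc.wait()
```

Because the target loops until it is killed, the sampling window is bounded by the test rather than by how quickly the machine finishes a fixed amount of work.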