When running profiling, users rarely care about the global percentage of
the runtime. Often, they want to select a function and measure child
percentages relative to that.
This PR updates the flamegraph tooltips to show both "Percentage" and
"Relative Percentage" when the user clicks a specific function.
Use exact remote reads for interpreter state, thread state, and
interpreter frame structs instead of pulling full remote pages into the
profiler page cache. This matches the core change from
python/cpython#149585.
The profiler clears the page cache between samples, so live entries are
always packed at the front. Track the live count and only clear/search
that prefix instead of scanning all 1024 slots on the hot path.
Use the frame cache to predict the next thread state and top frame
address, then batch interpreter/thread/frame reads with process_vm_readv
when profiling a Linux target. Reuse prefetched frame buffers in the
frame walker when the prediction is valid.
Cache the last FrameInfo tuple per code object/instruction offset, reuse
cached thread id objects, and append cached parent frames directly on
full frame-cache hits. This cuts Python allocation churn in the
steady-state profiler path.
The line highlights on the heatmap are driven by the URL hash and the
`:target` selector. When clicking a caller/callee link for the line that
was already selected, the hash doesn't change, so the browser keeps the
existing target state and doesn't restart the animation. Due to this the
highlight only works the first time.
With this fix, line navigation goes through JavaScript. If the target
URL already points to the current location, the highlight is replayed by
clearing the animation, forcing style recalculation, and restoring it.
The `baseline_self` variable isn't initialized for structural elided
roots. This variable is accessed later unconditionally and leads to a
crash.
The child process ends up being invoked with `--diff_flamegraph` instead
of the correct argument.
Adds `python -m profiling.sampling dump <pid>`, which prints a single
traceback-style snapshot of a running process's Python stack via the
existing `_remote_debugging` unwinder. Supports per-thread status,
source line highlighting, optional bytecode opcodes, and async-aware
task reconstruction (`--async-aware`, default `--async-mode=all`).
Fix inverted flamegraph width
The inverted view used thread presence as a proxy for self time.
This missed self samples on C-level wrapper frames like _run_code,
where the node's thread always appears in its children too. Those
samples were silently dropped, causing the chart to render narrower
than full width. Now uses the explicit self field on each node
instead of the thread heuristic.
We already show self time in differential flamegraphs, but it should
be included in regular flamegraphs as well. Display the time spent
in the function body excluding callees, not just the total inclusive
time.
Differential flame graphs compare two profiling runs and highlight where
performance has changed. This makes it easier to detect regressions
introduced by code changes and to verify that optimizations have the
intended effect.
The visualization renders the current profile with frame widths
representing current time consumption. Color is then applied to show the
difference relative to the baseline profile: red gradients indicate
regressions, while blue gradients indicate improvements.
Some call paths may disappear entirely between profiles. These are
referred to as elided stacks and occur when optimizations remove code
paths or when certain branches stop executing. When elided stacks are
present, an "Elided" toggle is displayed, allowing the user to switch
between the main differential view and a view showing only the removed
paths.
Co-authored-by: Pablo Galindo Salgado <Pablogsal@gmail.com>
When running the `profiling.sampling` module in pstats mode, the output
can be emitted in two different ways: text to stdout or a binary file
when the `--output` argument is set.
The current documentation and help text is confusing as it does not
distinguish between these two output formats so it may be surprising to
the user to get different formats depending whether `--output` is set or not.
The bytecode panel appears when a user generates a heatmap with
--opcodes and clicks the button to unfold a line and display the
bytecode instructions. Currently, an empty space appears on the
left where the line number, self, and total columns are displayed.
This area should instead extend those columns, rather than leaving
a gap.