This reverts commit go.dev/cl/528657.
Reason for revert: broke a lot of builders.
Change-Id: I70c33062020e997c4df67b3eaa2e886cf0da961e
Reviewed-on: https://go-review.googlesource.com/c/go/+/543660
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Add runtime-internal locks to the mutex contention profile.
Store up to one call stack responsible for lock contention on the M,
until it's safe to contribute its value to the mprof table. Try to use
that limited local storage space for a relatively large source of
contention, and attribute any contention in stacks we're not able to
store to a sentinel _LostContendedLock function.
Avoid ballooning lock contention while manipulating the mprof table by
attributing to that sentinel function any lock contention experienced
while reporting lock contention.
Guard collecting real call stacks with GODEBUG=profileruntimelocks=1,
since the available data has mixed semantics; we can easily capture an
M's own wait time, but we'd prefer for the profile entry of each
critical section to describe how long it made the other Ms wait. It's
too late in the Go 1.22 cycle to make the required changes to
futex-based locks. When not enabled, attribute the time to the sentinel
function instead.
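As a rough sketch (not part of this CL), one way to surface these samples is via the existing mutex profile, with the GODEBUG setting above enabled in the environment, e.g. GODEBUG=profileruntimelocks=1:

    package main

    import (
        "os"
        "runtime"
        "runtime/pprof"
    )

    func main() {
        // Sample every contention event. Runtime-internal lock stacks only
        // carry real call stacks when GODEBUG=profileruntimelocks=1 is set;
        // otherwise their time is attributed to the sentinel function above.
        runtime.SetMutexProfileFraction(1)

        // ... run the workload of interest here ...

        f, err := os.Create("mutex.pb.gz")
        if err != nil {
            panic(err)
        }
        defer f.Close()
        if err := pprof.Lookup("mutex").WriteTo(f, 0); err != nil {
            panic(err)
        }
    }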
Fixes #57071
Change-Id: I3eee0ccbfc20f333b56f20d8725dfd7f3a526b41
Reviewed-on: https://go-review.googlesource.com/c/go/+/528657
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Rhys Hiltner <rhys@justin.tv>
Reviewed-by: Than McIntosh <thanm@google.com>
This CL adds four new time histogram metrics:
/sched/pauses/stopping/gc:seconds
/sched/pauses/stopping/other:seconds
/sched/pauses/total/gc:seconds
/sched/pauses/total/other:seconds
The "stopping" metrics measure the time taken to start a stop-the-world
pause. i.e., how long it takes stopTheWorldWithSema to stop all Ps.
This can be used to detect STW struggling to preempt Ps.
The "total" metrics measure the total duration of a stop-the-world
pause, from starting to stop-the-world until the world is started again.
This includes the time spent in the "start" phase.
The "gc" metrics are used for GC-related STW pauses. The "other" metrics
are used for all other STW pauses.
All of these metrics start timing in stopTheWorldWithSema only after
successfully acquiring sched.lock, thus excluding lock contention on
sched.lock. The reasoning behind this is that while waiting on
sched.lock the world is not stopped at all (all other Ps can run), so
the impact of this contention is primarily limited to the goroutine
attempting to stop-the-world. Additionally, we already have some
visibility into sched.lock contention via contention profiles (#57071).
/sched/pauses/total/gc:seconds is conceptually equivalent to
/gc/pauses:seconds, so the latter is marked as deprecated and returns
the same histogram as the former.
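For reference, a minimal sketch of reading the new histograms through the public runtime/metrics API (using the metric names listed above; not part of this CL):

    package main

    import (
        "fmt"
        "runtime/metrics"
    )

    func main() {
        samples := []metrics.Sample{
            {Name: "/sched/pauses/stopping/gc:seconds"},
            {Name: "/sched/pauses/stopping/other:seconds"},
            {Name: "/sched/pauses/total/gc:seconds"},
            {Name: "/sched/pauses/total/other:seconds"},
        }
        metrics.Read(samples)
        for _, s := range samples {
            if s.Value.Kind() != metrics.KindFloat64Histogram {
                continue // metric not supported by this toolchain
            }
            h := s.Value.Float64Histogram()
            var n uint64
            for _, c := range h.Counts {
                n += c
            }
            fmt.Printf("%s: %d pauses recorded\n", s.Name, n)
        }
    }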
In the implementation, there are a few minor differences:
* For both mark and sweep termination stops, /gc/pauses:seconds started
timing prior to calling startTheWorldWithSema, thus including lock
contention.
These details are minor enough that I do not believe the slight change
in reporting will matter. For mark termination stops, moving the timing
stop into startTheWorldWithSema does have the side effect of requiring
other GC metric calculations to move outside of the STW, as they depend
on the same end time.
Fixes #63340
Change-Id: Iacd0bab11bedab85d3dcfb982361413a7d9c0d05
Reviewed-on: https://go-review.googlesource.com/c/go/+/534161
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This change mostly implements the design described in #60773 and
includes a new scalable parser for the new trace format, available in
internal/trace/v2. I'll leave this commit message short because this is
clearly an enormous CL with a lot of detail.
This change does not hook up the new tracer into cmd/trace yet. A
follow-up CL will handle that.
For #60773.
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race
Change-Id: I5d2aca2cc07580ed3c76a9813ac48ec96b157de0
Reviewed-on: https://go-review.googlesource.com/c/go/+/494187
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Currently any thread that tries to get the attention of all Ps (e.g.
stopTheWorldWithSema and forEachP) ends up in a non-preemptible state
waiting to preempt another thread. Thing is, that other thread might
also be in a non-preemptible state, trying to preempt the first thread,
resulting in a deadlock.
This is a general problem, but in practice it only boils down to one
specific scenario: a thread in GC is blocked trying to preempt a
goroutine to scan its stack while that goroutine is blocked in a
non-preemptible state to get the attention of all Ps.
There's currently a hack in a few places in the runtime to move the
calling goroutine into _Gwaiting before it goes into a non-preemptible
state to preempt other threads. This lets the GC scan its stack because
the goroutine is trivially preemptible. The only restriction is that
forEachP and stopTheWorldWithSema absolutely cannot reference the
calling goroutine's stack. This is generally not necessary, so things
are good.
Anyway, to avoid exposing the details of this hack, this change creates
a safer wrapper around forEachP (and then renames it to forEachP and the
existing one to forEachPInternal) that performs the goroutine status
change, just like stopTheWorld does. We're going to need to use this
hack with forEachP in the new tracer, so this avoids propagating the
hack further and leaves it as an implementation detail.
Change-Id: I51f02e8d8e0a3172334d23787e31abefb8a129ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/533455
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
The function WriteTabs has been renamed WritePluginTable.
Change-Id: I5f04b99b91498c41121f898cb7774334a730d7b4
GitHub-Last-Rev: c98ab3f872
GitHub-Pull-Request: golang/go#63595
Reviewed-on: https://go-review.googlesource.com/c/go/+/535996
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
[This is an unmodified redo of CL 527056.]
Standard Ms set g0.stackguard1 to the same value as stackguard0 in
mstart0. For consistency, extra Ms should do the same for their g0. Do
this in needm -> callbackUpdateSystemStack.
Background: getg().stackguard1 is used as the stack guard for the stack
growth prologue in functions marked //go:systemstack [1]. User Gs set
stackguard1 to ^uintptr(0) so that the check always fails, calling
morestackc, which throws to report a //go:systemstack function call on a
user stack.
Setting stackguard1 on g0 is unnecessary for this functionality. 0 would be
sufficient, as g0 is always allowed to call //go:systemstack functions.
However, since we have the check anyway, setting stackguard1 to the
actual stack bound is useful to detect actual stack overflows on g0
(though morestackc doesn't detect this case and would report a
misleading message about user stacks).
[1] cmd/internal/obj calls //go:systemstack functions AttrCFunc. This is
a holdover from when the runtime contained actual C functions. But since
CL 2275, it has simply meant "pretend this is a C function, which would
thus need to use the system stack". Hence the name morestackc. At this
point, this terminology is pretty far removed from reality and should
probably be updated to something more intuitive.
Change-Id: If315677217354465fbbfbd0d406d79be20db0cc3
Reviewed-on: https://go-review.googlesource.com/c/go/+/527716
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This reverts CL 527056.
CL 525455 breaks darwin, alpine, and android. This CL must be reverted
in order to revert that CL.
For #62440.
Change-Id: I4e1b16e384b475a605e0214ca36c918d50faa22c
Reviewed-on: https://go-review.googlesource.com/c/go/+/527316
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Standard Ms set g0.stackguard1 to the same value as stackguard0 in
mstart0. For consistency, extra Ms should do the same for their g0. Do
this in needm -> callbackUpdateSystemStack.
Background: getg().stackguard1 is used as the stack guard for the stack
growth prologue in functions marked //go:systemstack [1]. User Gs set
stackguard1 to ^uintptr(0) so that the check always fails, calling
morestackc, which throws to report a //go:systemstack function call on a
user stack.
Setting stackguard1 on g0 is unnecessary for this functionality. 0 would be
sufficient, as g0 is always allowed to call //go:systemstack functions.
However, since we have the check anyway, setting stackguard1 to the
actual stack bound is useful to detect actual stack overflows on g0
(though morestackc doesn't detect this case and would report a
misleading message about user stacks).
[1] cmd/internal/obj calls //go:systemstack functions AttrCFunc. This is
a holdover from when the runtime contained actual C functions. But since
CL 2275, it has simply meant "pretend this is a C function, which would
thus need to use the system stack". Hence the name morestackc. At this
point, this terminology is pretty far removed from reality and should
probably be updated to something more intuitive.
Change-Id: I8d0e5628ce31ac6a189a7d7a4124be85aef89862
Reviewed-on: https://go-review.googlesource.com/c/go/+/527056
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
When passing pointers to Go objects from Go to C, the cgo command generates _Cgo_use(pN) for the unsafe.Pointer type arguments, so that the Go compiler will make those objects escape to the heap.
Since the C function may call back into Go, the Go stack might grow or shrink, which means any pointers into the Go stack held by the C function would become invalid.
After adding the #cgo noescape annotation for a C function, the cgo command won't generate _Cgo_use(pN), and the Go compiler won't force the objects to escape to the heap.
After adding the #cgo nocallback annotation for a C function, which declares that the C function won't call back into Go, a callback into Go from that function will crash the Go process.
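A minimal sketch of the annotation syntax described above; the C helper (c_sum) is hypothetical, not something from this CL:

    package main

    /*
    #cgo noescape c_sum
    #cgo nocallback c_sum
    #include <stddef.h>
    static int c_sum(const int *v, size_t n) {
        int s = 0;
        for (size_t i = 0; i < n; i++) s += v[i];
        return s;
    }
    */
    import "C"

    import "fmt"

    func main() {
        // With noescape, the compiler is no longer forced to move v to the
        // heap just because its address is passed to C; with nocallback,
        // c_sum promises not to call back into Go (violating that crashes).
        v := []C.int{1, 2, 3}
        fmt.Println(C.c_sum(&v[0], C.size_t(len(v)))) // 6
    }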
Fixes #56378
Change-Id: Ifdca070584e0d349c7b12276270e50089e481f7a
GitHub-Last-Rev: f1a17b08b0
GitHub-Pull-Request: golang/go#60399
Reviewed-on: https://go-review.googlesource.com/c/go/+/497837
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Bryan Mills <bcmills@google.com>
Auto-Submit: Bryan Mills <bcmills@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Currently, the pcvalue cache is stack allocated for each operation
that needs to look up a lot of pcvalues. It's not always clear where
to put it, a lot of the time we just pass a nil cache, it doesn't get
reused across operations, and we put a surprising amount of effort
into threading these caches around.
This CL moves it to the M, where it can be long-lived and used by all
pcvalue lookups, and we don't have to carefully thread it across
operations.
This is a re-roll of CL 515276 with a fix for reentrant use of the
pcvalue cache from the signal handler.
Change-Id: Id94c0c0fb3004d1fda1b196790eebd949c621f28
Reviewed-on: https://go-review.googlesource.com/c/go/+/520063
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Change-Id: If93b6cfa5a598a5f4101c879a0cd88a194e4a6aa
Reviewed-on: https://go-review.googlesource.com/c/go/+/518116
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Andy Pan <panjf2000@gmail.com>
The pprof mutex profile was meant to match the Google C++ (now Abseil)
mutex profiler, originally designed and implemented by Mike Burrows.
When we worked on the Go version, pjw and I missed that C++ counts the
time each thread is blocked, even if multiple threads are blocked on a
mutex. That is, if 100 threads are blocked on the same mutex for the
same 10ms, that still counts as 1000ms of contention in C++. In Go, to
date, /debug/pprof/mutex has counted that as only 10ms of contention.
If 100 goroutines are blocked on one mutex and only 1 goroutine is
blocked on another mutex, we probably do want to see the first mutex
as being more contended, so the Abseil approach is the more useful one.
This CL adopts "contention scales with number of goroutines blocked",
to better match Abseil [1]. However, it still makes sure to attribute the
time to the unlock that caused the backup, not subsequent innocent
unlocks that were affected by the congestion. In this way it still gives
more accurate profiles than Abseil does.
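A toy illustration of that accounting (not the runtime's actual code): the unlock that created the backlog is charged with the sum of every blocked goroutine's own wait, so 100 goroutines blocked for the same 10ms contribute 1000ms:

    package main

    import (
        "fmt"
        "time"
    )

    // contentionCharge sums the time each still-blocked goroutine has waited;
    // the total is attributed to the unlock that caused the backup, not to
    // later unlocks that merely drain the queue.
    func contentionCharge(waitStart []time.Time, now time.Time) time.Duration {
        var total time.Duration
        for _, t := range waitStart {
            total += now.Sub(t) // each waiter contributes its full blocked time
        }
        return total
    }

    func main() {
        now := time.Now()
        waiters := make([]time.Time, 100)
        for i := range waiters {
            waiters[i] = now.Add(-10 * time.Millisecond)
        }
        fmt.Println(contentionCharge(waiters, now)) // 1s
    }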
[1] https://github.com/abseil/abseil-cpp/blob/lts_2023_01_25/absl/synchronization/mutex.cc#L2390
Fixes #61015.
Change-Id: I7eb9e706867ffa8c0abb5b26a1b448f6eba49331
Reviewed-on: https://go-review.googlesource.com/c/go/+/506415
Run-TryBot: Russ Cox <rsc@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
This is subtle and the compiler and runtime must be in sync.
It is easier to develop the rest of the changes (especially when using
toolstash save/restore) if this change is separated out and done first.
Preparation for proposal #61405. The actual logic in the
compiler will be guarded by a GOEXPERIMENT, but it is
easier not to have GOEXPERIMENT-specific data structures
in the runtime, so just add the field unconditionally.
Change-Id: I7ec7049b99ae98bf0db365d42966baeec56e3774
Reviewed-on: https://go-review.googlesource.com/c/go/+/510539
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Currently, the pcvalue cache is stack allocated for each operation
that needs to look up a lot of pcvalues. It's not always clear where
to put it, a lot of the time we just pass a nil cache, it doesn't get
reused across operations, and we put a surprising amount of effort
into threading these caches around.
This CL moves it to the M, where it can be long-lived and used by all
pcvalue lookups, and we don't have to carefully thread it across
operations.
Change-Id: I675e583e0daac887c8ef77a402ba792648d96027
Reviewed-on: https://go-review.googlesource.com/c/go/+/515276
Run-TryBot: Austin Clements <austin@google.com>
Auto-Submit: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
This CL changes deferreturn so that it never needs to invoke the
unwinder. Instead, in the unusual case that we recover into a frame
with pending open-coded defers, we now save the extra state needed to
find them in g.param.
Change-Id: Ied35f6c1063fee5b6044cc37b2bccd3f90682fe6
Reviewed-on: https://go-review.googlesource.com/c/go/+/515856
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
This CL optimizes open-coded defers in two ways:
1. It modifies local variable sorting to place all open-coded defer
closure slots in order, so that rather than requiring the metadata to
contain each offset individually, we just need a single offset to the
first slot.
2. Because the slots are in ascending order and can be directly
indexed, we can get rid of the count of how many defers are in the
frame. Instead, we just find the top set bit in the active defers
bitmask, and load the corresponding closure.
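A toy sketch of the indexing idea, assuming deferBits is the per-frame bitmask of armed open-coded defers (the generated code and runtime differ in detail):

    package main

    import (
        "fmt"
        "math/bits"
    )

    // topDeferSlot returns the index of the most recently armed open-coded
    // defer, i.e. the highest set bit of the bitmask, or -1 if none. With the
    // closure slots laid out contiguously, a single base offset plus this
    // index locates the closure to run next.
    func topDeferSlot(deferBits uint8) int {
        if deferBits == 0 {
            return -1
        }
        return bits.Len8(deferBits) - 1
    }

    func main() {
        fmt.Println(topDeferSlot(0b00000101)) // 2: slots 0 and 2 armed; slot 2 runs first
        fmt.Println(topDeferSlot(0))          // -1: nothing pending
    }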
Change-Id: I6f912295a492211023a9efe12c94a14f449d86ad
Reviewed-on: https://go-review.googlesource.com/c/go/+/516199
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
These fields are no longer needed since go.dev/cl/513837.
Change-Id: I980fc9db998c293e930094bbb87e8c8f1654e39c
Reviewed-on: https://go-review.googlesource.com/c/go/+/516198
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
This CL refactors gopanic, Goexit, and deferreturn to share a common
state machine for processing pending defers. The new state machine
removes a lot of redundant code and does overall less work.
It should also make it easier to implement further optimizations
(e.g., TODOs added in this CL).
Change-Id: I71d3cc8878a6f951d8633505424a191536c8e6b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/513837
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This change adds traceBlockReason which leaks fewer implementation
details of the tracer to the runtime. Currently, gopark is called with
an explicit trace event, but this leaks details about trace internals
throughout the runtime.
This change will make it easier to change out the trace implementation.
Change-Id: Id633e1704d2c8838c6abd1214d9695537c4ac7db
Reviewed-on: https://go-review.googlesource.com/c/go/+/494185
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
This reapplies CL 485500, with a fix drafted in CL 492987 incorporated.
CL 485500 is reverted due to #60004 and #60007. #60004 is fixed in
CL 492743. #60007 is fixed in CL 492987 (incorporated in this CL).
[Original CL 485500 description]
This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and
CL 485316 incorporated.
CL 481061, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.
CL 479915 passed the G to _cgo_getstackbound for direct updates to
gp.stack.lo. A G can be reused on a new thread after the previous thread
exited. This could trigger the C TSAN race detector because it couldn't
see the synchronization in Go (lockextra) preventing the same G from
being used on multiple threads at the same time.
We work around this by passing the address of a stack variable to
_cgo_getstackbound rather than the G. The stack is generally unique per
thread, so TSAN won't see the same address from multiple threads. Even
if stacks are reused across threads by pthread, C TSAN should see the
synchronization in the stack allocator.
A regression test is added to misc/cgo/testsanitizer.
[Original CL 481061 description]
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.
CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.
[Original CL 392854 description]
In a C thread, it's necessary to acquire an extra M by using needm when invoking a Go function from C. But needm and dropm are costly due to the signal-related syscalls.
So we change to not dropm when returning back to C, which means binding the extra M to the C thread until it exits, avoiding needm and dropm on each C-to-Go call.
Instead, we only dropm when the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specific value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list when the C thread exits.
When returning back to C:
Skip dropm in cgocallback when the pthread variable has been created, so that the extra M will be reused the next time a Go function is invoked from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the exact speedup depends on the OS and CPU, but in general it can be considered roughly 10x faster for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
[CL 479915 description]
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread (on some platforms this may still be a guess, as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
[CL 492987 description]
On the first call into Go from a C thread, currently we set the g0
stack's high bound imprecisely based on the SP. With CL 485500, we
keep the M and don't recompute the stack bounds when it calls into
Go again. If the first call is made when the C thread uses some
deep stack, but a subsequent call is made with a shallower stack,
the SP may be above g0.stack.hi.
This is usually okay as we don't usually check stack.hi. One place
where we do check for stack.hi is in the signal handler, in
adjustSignalStack. In particular, C TSAN delivers signals on the
g0 stack (instead of the usual signal stack). If the SP is above
g0.stack.hi, we don't see that it is on the g0 stack, and the runtime throws.
This CL makes it get an accurate stack upper bound with the
pthread API (on the platforms where it is available).
Also add some debug print for the "handler not on signal stack"
throw.
Fixes #51676.
Fixes #59294.
Fixes #59678.
Fixes #60007.
Change-Id: Ie51c8e81ade34ec81d69fd7bce1fe0039a470776
Reviewed-on: https://go-review.googlesource.com/c/go/+/495855
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
More tightening up of the tracer's interface.
Change-Id: I992141c7f30e5c2d5d77d1fcd6817d35bc6e5f6d
Reviewed-on: https://go-review.googlesource.com/c/go/+/494191
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
More tightening up of the tracer's interface.
While we're here, clarify why waittraceskip isn't included by explaining
what the wait* fields in the M are really for.
Change-Id: I0e7b4cac79fb77a7a0b3ca6b6cc267668e3610bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/494190
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
More tightening up of the tracer's interface.
This increases the size of each G very slightly, which isn't great, but
we stay within the same size class, so actually memory use will be
unchanged.
Change-Id: I7d1f5798edcf437c212beb1e1a2619eab833aafb
Reviewed-on: https://go-review.googlesource.com/c/go/+/494188
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Fix comments, including a duplicated "is", wrong phrases and articles, misspellings, etc.
Change-Id: I8bfea53b9b275e649757cc4bee6a8a026ed9c7a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/493035
Reviewed-by: Benny Siegert <bsiegert@gmail.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: shuang cui <imcusg@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
This reverts CL 485500.
Reason for revert: This breaks internal tests at Google, see b/280861579 and b/280820455.
Change-Id: I426278d400f7611170918fc07c524cb059b9cc55
Reviewed-on: https://go-review.googlesource.com/c/go/+/492995
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Chressie Himpel <chressie@google.com>
This reapplies CL 481061, with the followup fixes in CL 482975, CL 485315, and
CL 485316 incorporated.
CL 481061, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 482975 is a followup fix to a C declaration in testprogcgo.
CL 485315 is a followup fix for x_cgo_getstackbound on Illumos.
CL 485316 is a followup cleanup for ppc64 assembly.
[Original CL 481061 description]
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.
CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.
[Original CL 392854 description]
In a C thread, it's necessary to acquire an extra M by using needm when invoking a Go function from C. But needm and dropm are costly due to the signal-related syscalls.
So we change to not dropm when returning back to C, which means binding the extra M to the C thread until it exits, avoiding needm and dropm on each C-to-Go call.
Instead, we only dropm when the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specific value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list when the C thread exits.
When returning back to C:
Skip dropm in cgocallback when the pthread variable has been created, so that the extra M will be reused the next time a Go function is invoked from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the exact speedup depends on the OS and CPU, but in general it can be considered roughly 10x faster for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
[CL 479915 description]
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread (on some platforms this may still be a guess, as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
[CL 485500 description]
CL 479915 passed the G to _cgo_getstackbound for direct updates to
gp.stack.lo. A G can be reused on a new thread after the previous thread
exited. This could trigger the C TSAN race detector because it couldn't
see the synchronization in Go (lockextra) preventing the same G from
being used on multiple threads at the same time.
We work around this by passing the address of a stack variable to
_cgo_getstackbound rather than the G. The stack is generally unique per
thread, so TSAN won't see the same address from multiple threads. Even
if stacks are reused across threads by pthread, C TSAN should see the
synchronization in the stack allocator.
A regression test is added to misc/cgo/testsanitizer.
Fixes #51676.
Fixes #59294.
Fixes #59678.
Change-Id: Ic62be31a06ee83568215e875a891df37084e08ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/485500
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
This reverts CL 481061.
Reason for revert: When built with C TSAN, x_cgo_getstackbound triggers
race detection on `g->stacklo` because the synchronization is in Go,
which isn't instrumented.
For #51676.
For #59294.
For #59678.
Change-Id: I38afcda9fcffd6537582a39a5214bc23dc147d47
Reviewed-on: https://go-review.googlesource.com/c/go/+/485275
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
This reapplies CL 392854, with the followup fixes in CL 479255,
CL 479915, and CL 481057 incorporated.
CL 392854, by doujiang24 <doujiang24@gmail.com>, speed up C to Go
calls by binding the M to the C thread. See below for its
description.
CL 479255 is a followup fix for a small bug in ARM assembly code.
CL 479915 is another followup fix to address C to Go calls after
the C code uses some stack, but that CL is also buggy.
CL 481057, by Michael Knyszek, is a followup fix for a memory leak
bug of CL 479915.
[Original CL 392854 description]
In a C thread, it's necessary to acquire an extra M by using needm when invoking a Go function from C. But needm and dropm are costly due to the signal-related syscalls.
So we change to not dropm when returning back to C, which means binding the extra M to the C thread until it exits, avoiding needm and dropm on each C-to-Go call.
Instead, we only dropm when the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specific value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list when the C thread exits.
When returning back to C:
Skip dropm in cgocallback when the pthread variable has been created, so that the extra M will be reused the next time a Go function is invoked from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the exact speedup depends on the OS and CPU, but in general it can be considered roughly 10x faster for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
[CL 479915 description]
Currently, when C calls into Go the first time, we grab an M
using needm, which sets m.g0's stack bounds using the SP. We don't
know how big the stack is, so we simply assume 32K. Previously,
when the Go function returns to C, we drop the M, and the next
time C calls into Go, we put a new stack bound on the g0 based on
the current SP. After CL 392854, we don't drop the M, and the next
time C calls into Go, we reuse the same g0, without recomputing
the stack bounds. If the C code uses quite a bit of stack space
before calling into Go, the SP may be well below the 32K stack
bound we assumed, so the runtime thinks the g0 stack overflows.
This CL makes needm get a more accurate stack bound from
pthread (on some platforms this may still be a guess, as we don't
know exactly where we are in the C stack), but it is probably
better than simply assuming 32K.
Fixes #51676.
Fixes #59294.
Change-Id: I9bf1400106d5c08ce621d2ed1df3a2d9e3f55494
Reviewed-on: https://go-review.googlesource.com/c/go/+/481061
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: DeJiang Zhu (doujiang) <doujiang24@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This reverts CL 392854.
Reason for revert: caused #59294, which was derived from google
internal tests. The attempted fix of #59294 caused more breakage.
Change-Id: I5a061561ac2740856b7ecc09725ac28bd30f8bba
Reviewed-on: https://go-review.googlesource.com/c/go/+/481060
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
In a C thread, it's necessary to acquire an extra M by using needm when invoking a Go function from C. But needm and dropm are costly due to the signal-related syscalls.
So we change to not dropm when returning back to C, which means binding the extra M to the C thread until it exits, avoiding needm and dropm on each C-to-Go call.
Instead, we only dropm when the C thread exits, so the extra M won't leak.
When invoking a Go function from C:
Allocate a pthread variable using pthread_key_create, only once per shared object, and register a thread-exit-time destructor.
And store the g0 of the current m into the thread-specific value of the pthread key, only once per C thread, so that the destructor will put the extra M back onto the extra M list when the C thread exits.
When returning back to C:
Skip dropm in cgocallback when the pthread variable has been created, so that the extra M will be reused the next time a Go function is invoked from C.
This is purely a performance optimization. The old version, in which needm & dropm happen on each cgo call, is still correct too, and we have to keep the old version on systems with cgo but without pthreads, like Windows.
This optimization is significant, and the exact speedup depends on the OS and CPU, but in general it can be considered roughly 10x faster for a simple Go function call from a C thread.
For the newly added BenchmarkCGoInCThread, some benchmark results:
1. it's 28x faster, from 3395 ns/op to 121 ns/op, in darwin OS & Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
2. it's 6.5x faster, from 1495 ns/op to 230 ns/op, in Linux OS & Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz
Fixes #51676
Change-Id: I380702fe2f9b6b401b2d6f04b0aba990f4b9ee6c
GitHub-Last-Rev: 93dc64ad98
GitHub-Pull-Request: golang/go#51679
Reviewed-on: https://go-review.googlesource.com/c/go/+/392854
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: thepudds <thepudds1460@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
This is relatively easy using the new traceback iterator.
Ancestor tracebacks are now limited to 50 frames. We could keep that
at 100, but the fact that it used 100 before seemed arbitrary and
unnecessary.
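Ancestor recording itself remains opt-in via GODEBUG=tracebackancestors=N; a tiny demo (illustrative only, not from this CL):

    // Run with GODEBUG=tracebackancestors=5. The crash traceback then includes
    // "[originating from goroutine N]" sections with creation stacks, each now
    // capped at 50 frames by this change.
    package main

    func main() {
        go func() {
            panic("boom") // the printed ancestry includes main's go statement
        }()
        select {} // block so the panicking goroutine takes the process down
    }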
Fixes #7181
Updates #54466
Change-Id: If693045881d84848f17e568df275a5105b6f1cb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/475960
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Printing is the only remaining functionality of gentraceback. Move
this into the traceback printing code and eliminate gentraceback. This
lets us simplify the logic, which fixes at least one minor bug:
previously, if inline unwinding pushed the total printed count over
_TracebackMaxFrames, we would print extra frames and then fail to
print "additional frames elided".
The cumulative performance effect of the series of changes starting
with "add a benchmark of Callers" (CL 472956) is:
goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) CPU E5-2690 v3 @ 2.60GHz
│ baseline │ unwinder │
│ sec/op │ sec/op vs base │
Callers/cached-48 1.464µ ± 1% 1.684µ ± 1% +15.03% (p=0.000 n=20)
Callers/inlined-48 1.391µ ± 1% 1.536µ ± 1% +10.42% (p=0.000 n=20)
Callers/no-cache-48 10.50µ ± 1% 11.11µ ± 0% +5.82% (p=0.000 n=20)
StackCopyPtr-48 88.74m ± 1% 81.22m ± 2% -8.48% (p=0.000 n=20)
StackCopy-48 80.90m ± 1% 70.56m ± 1% -12.78% (p=0.000 n=20)
StackCopyNoCache-48 2.458m ± 1% 2.209m ± 1% -10.15% (p=0.000 n=20)
StackCopyWithStkobj-48 26.81m ± 1% 25.66m ± 1% -4.28% (p=0.000 n=20)
geomean 518.8µ 512.9µ -1.14%
The performance impact of intermediate CLs in this sequence varies a
lot as we went through many refactorings. The slowdown in Callers
comes primarily from the introduction of unwinder because that doesn't
get inlined and results in somewhat worse code generation in code
that's extremely hot in those microbenchmarks. The performance gains
on stack copying come mostly from replacing callbacks with direct use
of the unwinder.
Updates #54466.
Fixes #32383.
Change-Id: I4970603b2861633eecec30545e852688bc7cc9a4
Reviewed-on: https://go-review.googlesource.com/c/go/+/468301
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Currently, all stack walking logic is in one venerable, large, and
very, very complicated function: runtime.gentraceback. This function
has three distinct operating modes: printing, populating a PC buffer,
or invoking a callback. And it has three different modes of unwinding:
physical Go frames, inlined Go frames, and cgo frames. It also has
several flags. All of this logic is very interwoven.
This CL reimplements the monolithic gentraceback function as an
"unwinder" type with an iterator API. It moves all of the logic for
stack walking into this new type, and gentraceback is now a
much-simplified wrapper around the new unwinder type that still
implements printing, populating a PC buffer, and invoking a callback.
Follow-up CLs will replace uses of gentraceback with direct uses of
unwinder.
Exposing traceback functionality as an iterator API will enable a lot
of follow-up work such as simplifying the open-coded defer
implementation (which should in turn help with #26813 and #37233),
printing the bottom of deep stacks (#7181), and eliminating the small
limit on CPU stacks in profiles (#56029).
Fixes #54466.
Change-Id: I36e046dc423c9429c4f286d47162af61aff49a0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/458218
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
We've replicated the code to expand inlined frames in many places in
the runtime at this point. This CL adds a simple iterator API that
abstracts this out.
We also use this to try out a new idea for structuring tests of
runtime internals: rather than exporting this whole internal data type
and API, we write the test in package runtime and import the few bits
of std we need. The idea is that, for tests of internals, it's easier
to inject public APIs from std than it is to export non-public APIs
from runtime. This is discussed more in #55108.
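The closest public analogue of this expansion is runtime.CallersFrames, which also reports inlined calls as separate frames; a short orientation example (not the internal iterator added here):

    package main

    import (
        "fmt"
        "runtime"
    )

    func main() {
        pc := make([]uintptr, 16)
        n := runtime.Callers(1, pc) // skip runtime.Callers itself
        frames := runtime.CallersFrames(pc[:n])
        for {
            f, more := frames.Next()
            // Inlined calls share a PC but come back as distinct frames here.
            fmt.Printf("%s\n\t%s:%d\n", f.Function, f.File, f.Line)
            if !more {
                break
            }
        }
    }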
For #54466.
Change-Id: Iebccc04ff59a1509694a8ac0e0d3984e49121339
Reviewed-on: https://go-review.googlesource.com/c/go/+/466096
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Austin Clements <austin@google.com>
This change adds a new GODEBUG flag called pagetrace that writes a
low-overhead trace of how pages of memory are managed by the Go runtime.
The page tracer is kept behind a GOEXPERIMENT flag due to a potential
security risk for setuid binaries.
Change-Id: I6f4a2447d02693c25214400846a5d2832ad6e5c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/444157
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Ms are allocated via standard heap allocation (`new(m)`), which means we
must keep them alive (i.e., reachable by the GC) until we are completely
done using them.
Ms are primarily reachable through runtime.allm. However, runtime.mexit
drops the M from allm fairly early, long before it is done using the M
structure. If that was the last reference to the M, it is now at risk of
being freed by the GC and used for some other allocation, leading to
memory corruption.
Ms with a Go-allocated stack coincidentally already keep a reference to
the M in sched.freem, so that the stack can be freed lazily. This
reference has the side effect of keeping these Ms reachable. However, Ms
with an OS stack skip this and are at risk of corruption.
Fix this lifetime by extending sched.freem use to all Ms, with the value
of mp.freeWait determining whether the stack needs to be freed or not.
Fixes #56243.
Change-Id: Ic0c01684775f5646970df507111c9abaac0ba52e
Reviewed-on: https://go-review.googlesource.com/c/go/+/443716
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
This adds the function "start line number" to runtime._func and
runtime.inlinedCall objects. The "start line number" is the line number
of the func keyword or TEXT directive for assembly.
Subtracting the start line number from a PC's line number provides the
relative line offset of that PC from the start of the function. This
helps with source stability by allowing code above the function to move
without invalidating samples within the function.
Encoding start line rather than relative lines directly is convenient
because the pprof format already contains a start line field.
This CL uses a straightforward encoding of explicitly including a start
line field in every _func and inlinedCall. It is possible that we could
compress this further in the future. e.g., functions with a prologue
usually have <line of PC 0> == <start line>. In runtime.test, 95% of
functions have <line of PC 0> == <start line>.
According to bent, this is geomean +0.83% binary size vs master and
-0.31% binary size vs 1.19.
Note that //line directives can change the file and line numbers
arbitrarily. The encoded start line is as adjusted by //line directives.
Since this can change in the middle of a function, `line - start line`
offset calculations may not be meaningful if //line directives are in
use.
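A tiny illustration of that caveat (example only): with a //line directive in effect, the recorded start line follows the adjusted position rather than the physical source line.

    package example

    //line generated.go:200
    func Adjusted() int { // positions, including the start line, report generated.go:200
        return 0 // later lines continue from the adjusted numbering
    }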
For #55022.
Change-Id: Iaabbc6dd4f85ffdda294266ef982ae838cc692f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/429638
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Extra Ms may lead to the "no consistent ordering of events possible" error when parsing a trace file with cgo enabled, since:
1. The gs in the extra Ms may be in `_Gdead` status when tracing is started by invoking `runtime.StartTrace`,
2. and these gs will trigger `traceEvGoSysExit` events in `runtime.exitsyscall` when invoking Go functions from C,
3. then the events of those gs have inconsistent ordering, because the preceding events are missing.
Adding two events, `traceEvGoCreate` and `traceEvGoInSyscall`, in `runtime.StartTrace` makes the trace parser happy.
Fixes #29707
Change-Id: I2fd9d1713cda22f0ddb36efe1ab351f88da10881
GitHub-Last-Rev: 7bbfddb81b
GitHub-Pull-Request: golang/go#54974
Reviewed-on: https://go-review.googlesource.com/c/go/+/429858
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: xie cui <523516579@qq.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>
This change adds a metric to the runtime/metrics package which tracks
total mutex wait time for sync.Mutex and sync.RWMutex. The purpose of
this metric is to be able to quickly get an idea of the total mutex wait
time.
The implementation of this metric piggybacks off of the existing G
runnable tracking infrastructure, as well as the wait reason set on a G
when it goes into _Gwaiting.
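A hedged sketch of consuming it via runtime/metrics; the metric name below is my assumption of what this change registers, not something stated here:

    package main

    import (
        "fmt"
        "runtime/metrics"
    )

    func main() {
        // Assumed name for the new metric added by this change.
        s := []metrics.Sample{{Name: "/sync/mutex/wait/total:seconds"}}
        metrics.Read(s)
        if s[0].Value.Kind() == metrics.KindFloat64 {
            fmt.Printf("total mutex wait: %.3fs\n", s[0].Value.Float64())
        }
    }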
Fixes #49881.
Change-Id: I4691abf64ac3574bec69b4d7d4428b1573130517
Reviewed-on: https://go-review.googlesource.com/c/go/+/427618
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Currently, wait reasons are set somewhat inconsistently. In a follow-up
CL, we're going to want to rely on the wait reason being there for
casgstatus, so the status quo isn't really going to work for that. Plus
this inconsistency means there are a whole bunch of cases where we could
be more specific about the G's status but aren't.
So, this change adds a new function, casGToWaiting which is like
casgstatus but also sets the wait reason. The goal is that by using this
API it'll be harder to forget to set a wait reason (or the lack thereof
will at least be explicit). This change then updates all casgstatus(gp,
..., _Gwaiting) calls to casGToWaiting(gp, ..., waitReasonX) instead.
For a number of these cases, we're missing a wait reason, and it
wouldn't hurt to add a wait reason for them, so this change also adds
those wait reasons.
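A rough, toy-typed sketch of the helper's shape, assuming it just records the reason before the usual status transition (the real runtime code carries more checks):

    package sketch

    // Toy stand-ins for the runtime's internal types, for illustration only.
    type waitReason uint8

    type g struct {
        status     uint32 // manipulated atomically in the real runtime
        waitreason waitReason
    }

    const _Gwaiting uint32 = 4 // placeholder value for this sketch

    // casgstatus stands in for the runtime's status CAS; here it just assigns.
    func casgstatus(gp *g, old, new uint32) { gp.status = new }

    // casGToWaiting mirrors the new helper: set the wait reason first, then
    // transition, so a G observed in _Gwaiting always has a reason populated.
    func casGToWaiting(gp *g, old uint32, reason waitReason) {
        gp.waitreason = reason
        casgstatus(gp, old, _Gwaiting)
    }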
For #49881.
Change-Id: Ia95e06ecb74ed17bb7bb94f1a362ebfe6bec1518
Reviewed-on: https://go-review.googlesource.com/c/go/+/427617
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This change adds 3 new waitReasons that correspond to sync.Mutex.Lock,
sync.RWMutex.RLock, and sync.RWMutex.Lock that are plumbed down into
semacquire1 by exporting new functions to the sync package from the
runtime.
Currently these three functions show up as "semacquire" in backtraces
which isn't very clear, though the stack trace itself should reveal
what's really going on. This represents a minor improvement to backtrace
readability, though blocking on an RWMutex.w.Lock will still show up as
blocking on a regular mutex (I suppose technically it is).
This is a step toward helping the runtime identify when a goroutine is
blocked on a mutex of some kind.
For #49881.
Change-Id: Ia409b4d27e117fe4bfdc25fa541e9c58d6d587b9
Reviewed-on: https://go-review.googlesource.com/c/go/+/427616
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
This change adds a breakdown of estimated CPU usage by time. These
estimates are not based on real on-CPU counters, so each metric has a
disclaimer explaining so. They can, however, be more reasonably
compared to a total CPU time metric that this change also adds.
Fixes #47216.
Change-Id: I125006526be9f8e0d609200e193da5a78d9935be
Reviewed-on: https://go-review.googlesource.com/c/go/+/404307
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Josh MacDonald <jmacd@lightstep.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
To match _func.nameOff.
Change-Id: I75e71cadaa0f7ca8844d1b49950673797b227074
Reviewed-on: https://go-review.googlesource.com/c/go/+/428658
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Michael Pratt <mpratt@google.com>