runtime: allow Stack to traceback goroutines in syscall _Grunning window

net/http/cgi.TestCopyError calls runtime.Stack to take a stack trace of
all goroutines, and searches for a specific line in that stack trace.
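The pattern described above (dump every goroutine's stack, then search the
text) can be sketched roughly as follows. This is not the test's actual code:
allStacks and stackContains are hypothetical helper names, and the buffer
sizes are arbitrary.

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
)

// allStacks (hypothetical helper) returns the formatted stack traces of
// all goroutines. runtime.Stack truncates its output to the buffer, so
// the buffer is doubled until the dump fits with room to spare.
func allStacks() string {
	buf := make([]byte, 64<<10)
	for {
		n := runtime.Stack(buf, true) // true: include all goroutines
		if n < len(buf) {
			return string(buf[:n])
		}
		buf = make([]byte, 2*len(buf))
	}
}

// stackContains (hypothetical helper) reports whether the combined
// stack dump mentions needle anywhere.
func stackContains(needle string) bool {
	return strings.Contains(allStacks(), needle)
}

func main() {
	// Every goroutine's trace begins with a "goroutine N [...]" header.
	fmt.Println(stackContains("goroutine "))
}
```

A search like this depends on the traceback actually printing frames for the
goroutine of interest, which is the property this change restores.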

It currently sometimes fails because it encounters the goroutine it's
looking for in the small window where a goroutine might be in _Grunning
while in a syscall, introduced in CL 646198. In that case, the traceback
will give up, failing to print the stack TestCopyError is expecting.

This represents a general regression, since previously runtime.Stack
could never fail to take a goroutine's stack; giving up was only
possible in fatal panic cases.

Fix this the same way we fixed goroutine profiles: allow the stack trace
to proceed if the g's syscallsp != 0. This is safe in any
stop-the-world-related context, because syscallsp won't be mutated while
the goroutine fails to acquire a P, and thus fails to fully exit the
syscall context. This also means the stack below syscallsp won't be
mutated, and thus taking a traceback is also safe.
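As a rough illustration of the resulting decision, the check can be modeled
outside the runtime like this. Everything here is a toy stand-in: toyG, its
fields, stackUnavailable, and the constant values are assumptions made for
the sketch, not the runtime's definitions.

```go
package main

import "fmt"

// Illustrative status bits; the values are assumptions for this sketch.
const (
	gRunning = 2
	gScan    = 0x1000
)

// toyG is a hypothetical stand-in for the fields the check reads.
type toyG struct {
	status    uint32  // goroutine status, possibly with the scan bit set
	syscallsp uintptr // nonzero while the goroutine is still in a syscall
	onOtherM  bool    // models gp.m != getg().m
}

// stackUnavailable models the updated condition: give up only when the
// goroutine runs on another thread, is in the running state, and is not
// in the syscall-exit window (syscallsp == 0).
func stackUnavailable(gp toyG) bool {
	return gp.onOtherM && gp.status&^gScan == gRunning && gp.syscallsp == 0
}

func main() {
	running := toyG{status: gRunning, onOtherM: true}
	exiting := toyG{status: gRunning, syscallsp: 0xc000100000, onOtherM: true}
	fmt.Println(stackUnavailable(running), stackUnavailable(exiting))
}
```

In this model a goroutine that is genuinely running elsewhere is still
skipped, while one caught in the syscall-exit window gets traced.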

Fixes #66639.

Change-Id: Ie6f4b0661d9f8df02c9b8434e99bc95f26fe5f0d
Reviewed-on: https://go-review.googlesource.com/c/go/+/716680
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Michael Anthony Knyszek 2025-10-30 20:26:56 +00:00 committed by Michael Knyszek
parent b5353fd90a
commit c93cc603cd

@@ -1314,7 +1314,16 @@ func tracebacksomeothers(me *g, showf func(*g) bool) {
 // from a signal handler initiated during a systemstack call.
 // The original G is still in the running state, and we want to
 // print its stack.
-if gp.m != getg().m && readgstatus(gp)&^_Gscan == _Grunning {
+//
+// There's a small window of time in exitsyscall where a goroutine could be
+// in _Grunning as it's exiting a syscall. This could be the case even if the
+// world is stopped or frozen.
+//
+// This is OK because the goroutine will not exit the syscall while the world
+// is stopped or frozen. This is also why it's safe to check syscallsp here,
+// and safe to take the goroutine's stack trace. The syscall path mutates
+// syscallsp only just before exiting the syscall.
+if gp.m != getg().m && readgstatus(gp)&^_Gscan == _Grunning && gp.syscallsp == 0 {
 print("\tgoroutine running on other thread; stack unavailable\n")
 printcreatedby(gp)
 } else {