From c93cc603cd5c731d00dc019c94490edca6160841 Mon Sep 17 00:00:00 2001 From: Michael Anthony Knyszek Date: Thu, 30 Oct 2025 20:26:56 +0000 Subject: [PATCH] runtime: allow Stack to traceback goroutines in syscall _Grunning window net/http/cgi.TestCopyError calls runtime.Stack to take a stack trace of all goroutines, and searches for a specific line in that stack trace. It currently sometimes fails because it encounters the goroutine its looking for in the small window where a goroutine might be in _Grunning while in a syscall, introduced in CL 646198. In that case, the traceback will give up, failing to print the stack TestCopyError is expecting. This represents a general regression, since previously runtime.Stack could never fail to take a goroutine's stack; giving up was only possible in fatal panic cases. Fix this the same way we fixed goroutine profiles: allow the stack trace to proceed if the g's syscallsp != 0. This is safe in any stop-the-world-related context, because syscallsp won't be mutated while the goroutine fails to acquire a P, and thus fails to fully exit the syscall context. This also means the stack below syscallsp won't be mutated, and thus taking a traceback is also safe. Fixes #66639. Change-Id: Ie6f4b0661d9f8df02c9b8434e99bc95f26fe5f0d Reviewed-on: https://go-review.googlesource.com/c/go/+/716680 Reviewed-by: Michael Pratt LUCI-TryBot-Result: Go LUCI --- src/runtime/traceback.go | 11 ++++++++++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/src/runtime/traceback.go b/src/runtime/traceback.go index de62762ba76..6649f724716 100644 --- a/src/runtime/traceback.go +++ b/src/runtime/traceback.go @@ -1314,7 +1314,16 @@ func tracebacksomeothers(me *g, showf func(*g) bool) { // from a signal handler initiated during a systemstack call. // The original G is still in the running state, and we want to // print its stack. - if gp.m != getg().m && readgstatus(gp)&^_Gscan == _Grunning { + // + // There's a small window of time in exitsyscall where a goroutine could be + // in _Grunning as it's exiting a syscall. This could be the case even if the + // world is stopped or frozen. + // + // This is OK because the goroutine will not exit the syscall while the world + // is stopped or frozen. This is also why it's safe to check syscallsp here, + // and safe to take the goroutine's stack trace. The syscall path mutates + // syscallsp only just before exiting the syscall. + if gp.m != getg().m && readgstatus(gp)&^_Gscan == _Grunning && gp.syscallsp == 0 { print("\tgoroutine running on other thread; stack unavailable\n") printcreatedby(gp) } else {