runtime: optimistically CAS atomicstatus directly in enter/exitsyscall

This change borrows a performance trick from the coro implementation:
attempt the CAS on atomicstatus directly, and only call into casgstatus,
a much more heavyweight function, if that fails. Goroutines in a
synctest bubble must still go through casgstatus, so the fast path is
skipped when gp.bubble is non-nil. Overall it's a good bit faster
(about 12% on CgoCall below), and it's easy low-hanging fruit.
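
For readers outside the runtime, the fast-path/slow-path shape described
above can be sketched in ordinary Go. This is only an illustration under
stated assumptions: the task type, the status constants, and
slowTransition are hypothetical stand-ins for g, _Grunning/_Gsyscall,
and casgstatus, and it uses sync/atomic rather than the runtime's
internal atomic package.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// Hypothetical status values standing in for _Grunning and _Gsyscall.
const (
	statusRunning uint32 = iota
	statusSyscall
)

// task is a hypothetical stand-in for the runtime's g.
type task struct {
	status    atomic.Uint32
	bubbled   bool // stands in for gp.bubble != nil
	slowCalls int  // counts slow-path entries, for illustration only
}

// slowTransition stands in for casgstatus: it retries until the CAS
// succeeds, and is where any extra bookkeeping would live.
func (tk *task) slowTransition(from, to uint32) {
	tk.slowCalls++
	for !tk.status.CompareAndSwap(from, to) {
		runtime.Gosched()
	}
}

// transition tries a single CAS first and falls back to the slow path
// if the task is bubbled or the CAS fails.
func (tk *task) transition(from, to uint32) {
	if tk.bubbled || !tk.status.CompareAndSwap(from, to) {
		tk.slowTransition(from, to)
	}
}

func main() {
	plain := &task{}
	plain.transition(statusRunning, statusSyscall)
	plain.transition(statusSyscall, statusRunning)
	fmt.Println("plain slow calls:", plain.slowCalls)

	bubbled := &task{bubbled: true}
	bubbled.transition(statusRunning, statusSyscall)
	bubbled.transition(statusSyscall, statusRunning)
	fmt.Println("bubbled slow calls:", bubbled.slowCalls)
}
```

Because the bubbled check comes first in the condition, short-circuit
evaluation means the direct CAS is never attempted for a bubbled task,
so the slow path sees the status unchanged.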

goos: linux
goarch: amd64
pkg: internal/runtime/cgobench
cpu: AMD EPYC 7B13
           │ after-2-2.out │            after-3.out             │
           │    sec/op     │   sec/op     vs base               │
CgoCall-64     34.62n ± 1%   30.55n ± 1%  -11.76% (p=0.002 n=6)

Change-Id: Ic38620233b55f58b8a07510666aa18648373e2e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/708596
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Michael Anthony Knyszek 2025-10-02 17:16:49 +00:00 committed by Gopher Robot
parent 5b8e850340
commit 8683bb846d

@@ -4622,7 +4622,12 @@ func reentersyscall(pc, sp, bp uintptr) {
 	}
 	// As soon as we switch to _Gsyscall, we are in danger of losing our P.
 	// We must not touch it after this point.
-	casgstatus(gp, _Grunning, _Gsyscall)
+	//
+	// Try to do a quick CAS to avoid calling into casgstatus in the common case.
+	// If we have a bubble, we need to fall into casgstatus.
+	if gp.bubble != nil || !gp.atomicstatus.CompareAndSwap(_Grunning, _Gsyscall) {
+		casgstatus(gp, _Grunning, _Gsyscall)
+	}
 	if staticLockRanking {
 		// casgstatus clobbers gp.sched via systemstack under staticLockRanking. Restore it.
 		save(pc, sp, bp)
@@ -4825,7 +4830,12 @@ func exitsyscall() {
 	// need to be held ahead of time. We're effectively atomic with respect to
 	// the tracer because we're non-preemptible and in the runtime. It can't stop
 	// us to read a bad status.
-	casgstatus(gp, _Gsyscall, _Grunning)
+	//
+	// Try to do a quick CAS to avoid calling into casgstatus in the common case.
+	// If we have a bubble, we need to fall into casgstatus.
+	if gp.bubble != nil || !gp.atomicstatus.CompareAndSwap(_Gsyscall, _Grunning) {
+		casgstatus(gp, _Gsyscall, _Grunning)
+	}
 	// Caution: we're in a window where we may be in _Grunning without a P.
 	// Either we will grab a P or call exitsyscall0, where we'll switch to