runtime: try to elide timer stealing if P has no timers

Following golang.org/cl/259578, findrunnable still must touch every
other P in checkTimers in order to look for timers to steal. This scales
poorly with GOMAXPROCS and potentially performs poorly by pulling remote
Ps into cache.

Add timerpMask, a bitmask that tracks whether each P may have any timers
on its timer heap.

Ideally we would update this field on any timer add / remove to always
keep it up to date. Unfortunately, updating a shared global structure is
antithetical to sharding timers by P, and doing so approximately doubles
the cost of addtimer / deltimer in microbenchmarks.

Instead we only (potentially) clear the mask when the P goes idle. This
covers the best case of avoiding looking at a P _at all_ when it is idle
and has no timers. See the comment on updateTimerPMask for more details
on the trade-off. Future CLs may be able to expand cases we can avoid
looking at the timers.

Note that the addition of idlepMask to p.init is a no-op. The zero value
of the mask is the correct init value so it is not necessary, but it is
included for clarity.

Benchmark results from WakeupParallel/syscall/pair/race/1ms (see
golang.org/cl/228577). Note that these are on top of golang.org/cl/259578:

name                        old msec           new msec   delta
Perf-task-clock-8           244 ± 4%           246 ± 4%     ~     (p=0.841 n=5+5)
Perf-task-clock-16          247 ±11%           252 ± 4%     ~     (p=1.000 n=5+5)
Perf-task-clock-32          270 ± 1%           268 ± 2%     ~     (p=0.548 n=5+5)
Perf-task-clock-64          302 ± 3%           296 ± 1%     ~     (p=0.222 n=5+5)
Perf-task-clock-128         358 ± 3%           352 ± 2%     ~     (p=0.310 n=5+5)
Perf-task-clock-256         483 ± 3%           458 ± 1%   -5.16%  (p=0.008 n=5+5)
Perf-task-clock-512         663 ± 1%           612 ± 4%   -7.61%  (p=0.008 n=5+5)
Perf-task-clock-1024      1.06k ± 1%         0.95k ± 2%  -10.24%  (p=0.008 n=5+5)

Updates #28808
Updates #18237

Change-Id: I4239cd89f21ad16dfbbef58d81981da48acd0605
Reviewed-on: https://go-review.googlesource.com/c/go/+/264477
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Trust: Michael Pratt <mpratt@google.com>
This commit is contained in:
Michael Pratt 2020-10-05 18:12:35 -04:00
parent 642329fdd5
commit fc116b69e2
2 changed files with 81 additions and 23 deletions

View file

@ -1052,15 +1052,26 @@ var (
sched schedt
newprocs int32
// allpLock protects P-less reads and size changes of allp and
// idlepMask, and all writes to allp.
// allpLock protects P-less reads and size changes of allp, idlepMask,
// and timerpMask, and all writes to allp.
allpLock mutex
// len(allp) == gomaxprocs; may change at safe points, otherwise
// immutable.
allp []*p
// Bitmask of Ps in _Pidle list, one bit per P. Reads and writes must
// be atomic. Length may change at safe points.
idlepMask pIdleMask
//
// Each P must update only its own bit. In order to maintain
// consistency, a P going idle must the idle mask simultaneously with
// updates to the idle P list under the sched.lock, otherwise a racing
// pidleget may clear the mask before pidleput sets the mask,
// corrupting the bitmap.
//
// N.B., procresize takes ownership of all Ps in stopTheWorldWithSema.
idlepMask pMask
// Bitmask of Ps that may have a timer, one bit per P. Reads and writes
// must be atomic. Length may change at safe points.
timerpMask pMask
// Information about what cpu features are available.
// Packages outside the runtime should not use these