Commit graph

324 commits

Author SHA1 Message Date
Cuong Manh Le
5b0ce94c07 runtime: convert g.parkingOnChan to atomic type
Updates #53821

Change-Id: I54de39b984984fb3c160aba5afacb90131fd47c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/424394
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2022-08-17 16:26:22 +00:00
Cuong Manh Le
8adc63b3eb Revert "runtime/trace: add missing events for the locked g in extra M."
This reverts commit ea9c3fd42d.

Reason for revert: break linux/ricsv64, openbsd/arm, illumos/amd64 builders

Change-Id: I98479a8f63e76eed89a0e8846acf2c73e8441377
Reviewed-on: https://go-review.googlesource.com/c/go/+/423437
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2022-08-15 15:22:28 +00:00
doujiang24
ea9c3fd42d runtime/trace: add missing events for the locked g in extra M.
Extra Ms may lead to the "no consistent ordering of events possible" error when parsing trace file with cgo enabled, since:
1. The gs in the extra Ms may be in `_Gdead` status while starting trace by invoking `runtime.StartTrace`,
2. and these gs will trigger `traceEvGoSysExit` events in `runtime.exitsyscall` when invoking go functions from c,
3. then, the events of those gs are under non-consistent ordering, due to missing the previous events.

Add two events, `traceEvGoCreate` and `traceEvGoInSyscall`, in `runtime.StartTrace`, will make the trace parser happy.

Fixes #29707

Change-Id: I7cc4b80822d2c46591304a59c9da2c9fc470f1d0
GitHub-Last-Rev: 445de8eaf3
GitHub-Pull-Request: golang/go#53284
Reviewed-on: https://go-review.googlesource.com/c/go/+/411034
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2022-08-12 18:26:31 +00:00
Cuong Manh Le
a2af095699 runtime: run "gofmt -s -w"
Change-Id: I7eb3de35d1f1f0237962735450b37d738966f30c
Reviewed-on: https://go-review.googlesource.com/c/go/+/423254
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2022-08-12 15:45:17 +00:00
Michael Pratt
8cb350d69a runtime: synchronize P wakeup and dropping Ps
CL 310850 dropped work re-checks on non-spinning Ms to fix #43997.

This introduced a new race condition: a non-spinning M may drop its P
and then park at the same time a spinning M attempts to wake a P to
handle some new work. The spinning M fails to find an idle P (because
the non-spinning M hasn't quite made its P idle yet), and does nothing
assuming that the system is fully loaded. This results in loss of work
conservation. In the worst case we could have a complete deadlock if
injectglist fails to wake anything just as all Ps are going idle.

sched.needspinning adds new synchronization to cover this case. If work
submission fails to find a P, it sets needspinning to indicate that a
spinning M is required. When non-spinning Ms prepare to drop their P,
they check needspinning and abort going idle to become a spinning M
instead. This addresses the race without extra spurious wakeups. In the
normal (non-racing case), an M will become spinning via the normal path
and clear the flag.

injectglist must change in addition to wakep because it is a similar
form of work submission, notably used following netpoll at a point when
we might not have a P that would guarantee the work runs.

Fixes #45867

Change-Id: Ieb623a6d4162fb8c2be7b4ff8acdebcc3a0d69a8
Reviewed-on: https://go-review.googlesource.com/c/go/+/389014
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
2022-08-12 03:22:59 +00:00
Michael Pratt
b04e4637db runtime: convert timeHistogram to atomic types
I've dropped the note that sched.timeToRun is protected by sched.lock,
as it does not seem to be true.

For #53821.

Change-Id: I03f8dc6ca0bcd4ccf3ec113010a0aa39c6f7d6ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/419449
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
2022-08-12 01:53:11 +00:00
Michael Pratt
09cc9bac72 runtime: convert schedt.sysmonwait to atomic type
This converts a few unsynchronized accesses.

For #53821.

Change-Id: Ie2728779111e3e042696f15648981c5d5a86ca6d
Reviewed-on: https://go-review.googlesource.com/c/go/+/419448
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2022-08-12 01:52:46 +00:00
Michael Pratt
88ef50e662 runtime: convert schedt.gcwaiting to atomic type
Note that this replaces numerous unsynchronized loads throughout the
scheduler.

For #53821.

Change-Id: Ica80b04c9e8c184bfef186e549526fc3f117c387
Reviewed-on: https://go-review.googlesource.com/c/go/+/419447
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-12 01:52:29 +00:00
Michael Pratt
1eeef5d5b4 runtime: convert schedt.nmspinning to atomic type
Note that this converts nmspinning from uint32 to int32 for consistency
with the other count fields in schedt.

For #53821.

Change-Id: Ia6ca7a2b476128eda3b68e9f0c7775ae66c0c744
Reviewed-on: https://go-review.googlesource.com/c/go/+/419446
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2022-08-12 01:52:18 +00:00
Michael Pratt
58a1dabfd1 runtime: convert schedt.npidle to atomic type
Note that this converts npidle from uint32 to int32 for consistency with
the other count fields in schedt and the type of p.id.

Note that this changes previously unsynchronized operations to
synchronized operations in:
  * handoffp
  * injectglist
  * schedtrace
  * schedEnableUser
  * sync_runtime_canSpin

For #53821.

Change-Id: I36d1b3b4a28131c9d47884fade6bc44439dd6937
Reviewed-on: https://go-review.googlesource.com/c/go/+/419445
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
2022-08-12 01:52:06 +00:00
Michael Pratt
449691b3ef runtime: convert schedt.ngsys to atomic type
Note that this converts ngsys from uint32 to int32 to match the other
(non-atomic) counters.

For #53821.

Change-Id: I3acbfbbd1dabc59b0ea5ddc86a97e0d0afa9f80c
Reviewed-on: https://go-review.googlesource.com/c/go/+/419444
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2022-08-12 01:51:51 +00:00
Michael Pratt
0fc774a68f runtime: convert schedt.pollUntil to atomic type
Note that this converts pollUntil from uint64 to int64, the type used by
nanotime().

For #53821.

Change-Id: Iec9ec7e09d3350552561d0708ba6ea9e8a8ae7ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/419443
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-12 01:49:13 +00:00
Michael Pratt
5d7d50111f runtime: convert schedt.lastpoll to atomic type
Note that this changes the type from uint64 to int64, the type used by
nanotime(). It also adds an atomic load in pollWork(), which used to use
a non-atomic load.

For #53821.

Change-Id: I6173c90f20bfdc0e0a4bc3a7b1c798d1c429fff5
Reviewed-on: https://go-review.googlesource.com/c/go/+/419442
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-12 01:49:02 +00:00
Michael Pratt
dd8cb66d0b runtime: convert g.goid to uint64
schedt.goidgen and p.goidcache are already uint64, this makes all cases
consistent.

The only oddball here is schedtrace which prints -1 as an equivalent for
N/A or nil. A future CL will make this more explicit.

Change-Id: I489626f3232799f6ca333d0d103b71d9d3aa7494
Reviewed-on: https://go-review.googlesource.com/c/go/+/419440
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-12 01:39:28 +00:00
Michael Pratt
b464708b46 runtime: convert schedt.goidgen to atomic type
For #53821.

Change-Id: I84c96ade5982b8e68d1d1787bf1bfa16a17a4fb4
Reviewed-on: https://go-review.googlesource.com/c/go/+/419439
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-12 01:39:14 +00:00
Michael Anthony Knyszek
d7941030c9 runtime: only use CPU time from the current window in the GC CPU limiter
Currently the GC CPU limiter consumes CPU time from a few pools, but
because the events that flush to those pools may overlap, rather than be
strictly contained within, the update window for the GC CPU limiter, the
limiter's accounting is ultimately sloppy.

This sloppiness complicates accounting for idle time more completely,
and makes reasoning about the transient behavior of the GC CPU limiter
much more difficult.

To remedy this, this CL adds a field to the P struct that tracks the
start time of any in-flight event the limiter might care about, along
with information about the nature of that event. This timestamp is
managed atomically so that the GC CPU limiter can come in and perform a
read of the partial CPU time consumed by a given event. The limiter also
updates the timestamp so that only what's left over is flushed by the
event itself when it completes.

The end result of this change is that, since the GC CPU limiter is aware
of all past completed events, and all in-flight events, it can much more
accurately collect the CPU time of events since the last update. There's
still the possibility for skew, but any leftover time will be captured
in the following update, and the magnitude of this leftover time is
effectively bounded by the update period of the GC CPU limiter, which is
much easier to consider.

One caveat of managing this timestamp-type combo atomically is that they
need to be packed in 64 bits. So, this CL gives up the top 3 bits of the
timestamp and places the type information there. What this means is we
effectively have only a 61-bit resolution timestamp. This is fine when
the top 3 bits are the same between calls to nanotime, but becomes a
problem on boundaries when those 3 bits change. These cases may cause
hiccups in the GC CPU limiter by not accounting for some source of CPU
time correctly, but with 61 bits of resolution this should be extremely
rare. The rate of update is on the order of milliseconds, so at worst
the runtime will be off of any given measurement by only a few
CPU-milliseconds (and this is directly bounded by the rate of update).
We're probably more inaccurate from the fact that we don't measure real
CPU time but only approximate it.

For #52890.

Change-Id: I347f30ac9e2ba6061806c21dfe0193ef2ab3bbe9
Reviewed-on: https://go-review.googlesource.com/c/go/+/410120
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-06-03 20:16:15 +00:00
Keith Randall
016d755213 runtime: measure stack usage; start stacks larger if needed
Measure the average stack size used by goroutines at every GC. When
starting a new goroutine, allocate an initial goroutine stack of that
average size. Intuition is that we'll waste at most 2x in stack space
because only half the goroutines can be below average. In turn, we
avoid some of the early stack growth / copying needed in the average
case.

More details in the design doc at: https://docs.google.com/document/d/1YDlGIdVTPnmUiTAavlZxBI1d9pwGQgZT7IKFKlIXohQ/edit?usp=sharing

name        old time/op  new time/op  delta
Issue18138  95.3µs ± 0%  67.3µs ±13%  -29.35%  (p=0.000 n=9+10)

Fixes #18138

Change-Id: Iba34d22ed04279da7e718bbd569bbf2734922eaa
Reviewed-on: https://go-review.googlesource.com/c/go/+/345889
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2022-05-12 22:32:42 +00:00
Rhys Hiltner
52bd1c4d6c runtime: decrease STW pause for goroutine profile
The goroutine profile needs to stop the world to get a consistent
snapshot of all goroutines in the app. Leaving the world stopped while
iterating over allgs leads to a pause proportional to the number of
goroutines in the app (or its high-water mark).

Instead, do only a fixed amount of bookkeeping while the world is
stopped. Install a barrier so the scheduler confirms that a goroutine
appears in the profile, with its stack recorded exactly as it was during
the stop-the-world pause, before it allows that goroutine to execute.
Iterate over allgs while the app resumes normal operations, adding each
to the profile unless they've been scheduled in the meantime (and so
have profiled themselves). Stop the world a second time to remove the
barrier and do a fixed amount of cleanup work.

This increases both the fixed overhead and per-goroutine CPU-time cost
of GoroutineProfile. It also increases the wall-clock latency of the
call to GoroutineProfile, since the scheduler may interrupt it to
execute other goroutines.

    name                                  old time/op    new time/op    delta
    GoroutineProfile/small/loaded-8         1.05ms ± 5%    4.99ms ±31%   +376.85%  (p=0.000 n=10+9)
    GoroutineProfile/sparse/loaded-8        1.04ms ± 4%    3.61ms ±27%   +246.61%  (p=0.000 n=10+10)
    GoroutineProfile/large/loaded-8         7.69ms ±17%   20.35ms ± 4%   +164.50%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle              958µs ± 0%    1820µs ±23%    +89.91%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle-8          1.00ms ± 3%    1.52ms ±17%    +51.18%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle-8           1.01ms ± 4%    1.47ms ± 7%    +45.28%  (p=0.000 n=9+9)
    GoroutineProfile/sparse/idle             980µs ± 1%    1403µs ± 2%    +43.22%  (p=0.000 n=9+10)
    GoroutineProfile/large/idle-8           7.19ms ± 8%    8.43ms ±21%    +17.22%  (p=0.011 n=10+10)
    PingPongHog                              511ns ± 8%     585ns ± 9%    +14.39%  (p=0.000 n=10+10)
    GoroutineProfile/large/idle             6.71ms ± 0%    7.58ms ± 3%    +13.08%  (p=0.000 n=8+10)
    PingPongHog-8                            469ns ± 8%     509ns ±12%     +8.62%  (p=0.010 n=9+10)
    WakeupParallelSyscall/5µs                216µs ± 4%     229µs ± 3%     +6.06%  (p=0.000 n=10+9)
    WakeupParallelSyscall/5µs-8              147µs ± 1%     149µs ± 2%     +1.12%  (p=0.009 n=10+10)
    WakeupParallelSyscall/2µs-8              140µs ± 0%     142µs ± 1%     +1.11%  (p=0.001 n=10+9)
    WakeupParallelSyscall/50µs-8             236µs ± 0%     238µs ± 1%     +1.08%  (p=0.000 n=9+10)
    WakeupParallelSyscall/1µs-8              138µs ± 0%     140µs ± 1%     +1.05%  (p=0.013 n=10+9)
    Matmult                                 8.52ns ± 1%    8.61ns ± 0%     +0.98%  (p=0.002 n=10+8)
    WakeupParallelSyscall/10µs-8             157µs ± 1%     158µs ± 1%     +0.58%  (p=0.003 n=10+8)
    CreateGoroutinesSingle-8                 328ns ± 0%     330ns ± 1%     +0.57%  (p=0.000 n=9+9)
    WakeupParallelSpinning/100µs-8           343µs ± 0%     344µs ± 1%     +0.30%  (p=0.015 n=8+8)
    WakeupParallelSyscall/20µs-8             178µs ± 0%     178µs ± 0%     +0.18%  (p=0.043 n=10+9)
    StackGrowthDeep-8                       22.8µs ± 0%    22.9µs ± 0%     +0.12%  (p=0.006 n=10+10)
    StackGrowth                             1.06µs ± 0%    1.06µs ± 0%     +0.09%  (p=0.000 n=8+9)
    WakeupParallelSpinning/0s               10.7µs ± 0%    10.7µs ± 0%     +0.08%  (p=0.000 n=9+9)
    WakeupParallelSpinning/5µs              30.7µs ± 0%    30.7µs ± 0%     +0.04%  (p=0.000 n=10+10)
    WakeupParallelSpinning/100µs             411µs ± 0%     411µs ± 0%     +0.03%  (p=0.000 n=10+9)
    WakeupParallelSpinning/2µs              18.7µs ± 0%    18.7µs ± 0%     +0.02%  (p=0.026 n=10+10)
    WakeupParallelSpinning/20µs-8           93.0µs ± 0%    93.0µs ± 0%     +0.01%  (p=0.021 n=9+10)
    StackGrowth-8                            216ns ± 0%     216ns ± 0%       ~     (p=0.209 n=10+10)
    CreateGoroutinesParallel-8              49.5ns ± 2%    49.3ns ± 1%       ~     (p=0.591 n=10+10)
    CreateGoroutinesSingle                   699ns ±20%     748ns ±19%       ~     (p=0.353 n=10+10)
    WakeupParallelSpinning/0s-8             15.9µs ± 2%    16.0µs ± 3%       ~     (p=0.315 n=10+10)
    WakeupParallelSpinning/1µs              14.6µs ± 0%    14.6µs ± 0%       ~     (p=0.513 n=10+10)
    WakeupParallelSpinning/2µs-8            24.2µs ± 3%    24.1µs ± 2%       ~     (p=0.971 n=10+10)
    WakeupParallelSpinning/10µs             50.7µs ± 0%    50.7µs ± 0%       ~     (p=0.101 n=10+10)
    WakeupParallelSpinning/20µs             90.7µs ± 0%    90.7µs ± 0%       ~     (p=0.898 n=10+10)
    WakeupParallelSpinning/50µs              211µs ± 0%     211µs ± 0%       ~     (p=0.382 n=10+10)
    WakeupParallelSyscall/0s-8               137µs ± 1%     138µs ± 0%       ~     (p=0.075 n=10+10)
    WakeupParallelSyscall/1µs                216µs ± 1%     219µs ± 3%       ~     (p=0.065 n=10+9)
    WakeupParallelSyscall/2µs                214µs ± 7%     219µs ± 1%       ~     (p=0.101 n=10+8)
    WakeupParallelSyscall/50µs               317µs ± 5%     326µs ± 4%       ~     (p=0.123 n=10+10)
    WakeupParallelSyscall/100µs              450µs ± 9%     459µs ± 8%       ~     (p=0.247 n=10+10)
    WakeupParallelSyscall/100µs-8            337µs ± 0%     338µs ± 1%       ~     (p=0.089 n=10+10)
    WakeupParallelSpinning/5µs-8            32.2µs ± 0%    32.2µs ± 0%     -0.05%  (p=0.026 n=9+10)
    WakeupParallelSpinning/50µs-8            216µs ± 0%     216µs ± 0%     -0.12%  (p=0.004 n=10+10)
    WakeupParallelSpinning/1µs-8            20.6µs ± 0%    20.5µs ± 0%     -0.22%  (p=0.014 n=10+10)
    WakeupParallelSpinning/10µs-8           54.5µs ± 0%    54.2µs ± 1%     -0.57%  (p=0.000 n=10+10)
    CreateGoroutines-8                       213ns ± 1%     211ns ± 1%     -0.86%  (p=0.002 n=10+10)
    CreateGoroutinesCapture                 1.03µs ± 0%    1.02µs ± 0%     -0.91%  (p=0.000 n=10+10)
    CreateGoroutinesCapture-8               1.32µs ± 1%    1.31µs ± 1%     -1.06%  (p=0.001 n=10+9)
    CreateGoroutines                         188ns ± 0%     186ns ± 0%     -1.06%  (p=0.000 n=9+10)
    CreateGoroutinesParallel                 188ns ± 0%     186ns ± 0%     -1.27%  (p=0.000 n=8+10)
    WakeupParallelSyscall/0s                 210µs ± 3%     207µs ± 3%     -1.60%  (p=0.043 n=10+10)
    StackGrowthDeep                          121µs ± 1%     119µs ± 1%     -1.70%  (p=0.000 n=9+10)
    Matmult-8                               1.82ns ± 3%    1.78ns ± 3%     -2.16%  (p=0.020 n=10+10)
    WakeupParallelSyscall/20µs               281µs ± 3%     269µs ± 4%     -4.44%  (p=0.000 n=10+10)
    WakeupParallelSyscall/10µs               239µs ± 3%     228µs ± 9%     -4.70%  (p=0.001 n=10+10)
    GoroutineProfile/sparse-nil/idle-8       485µs ± 2%      12µs ± 4%    -97.56%  (p=0.000 n=10+10)
    GoroutineProfile/small-nil/idle-8        484µs ± 2%      12µs ± 1%    -97.60%  (p=0.000 n=10+7)
    GoroutineProfile/small-nil/loaded-8      487µs ± 2%      11µs ± 3%    -97.68%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/loaded-8     507µs ± 4%      11µs ± 6%    -97.78%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/idle-8        709µs ± 2%      11µs ± 4%    -98.38%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/loaded-8      717µs ± 2%      11µs ± 3%    -98.43%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/idle         465µs ± 3%       1µs ± 1%    -99.84%  (p=0.000 n=10+10)
    GoroutineProfile/small-nil/idle          493µs ± 3%       1µs ± 0%    -99.85%  (p=0.000 n=10+9)
    GoroutineProfile/large-nil/idle          716µs ± 1%       1µs ± 2%    -99.89%  (p=0.000 n=7+10)

    name                                  old alloc/op   new alloc/op   delta
    CreateGoroutinesCapture                   144B ± 0%      144B ± 0%       ~     (all equal)
    CreateGoroutinesCapture-8                 144B ± 0%      144B ± 0%       ~     (all equal)

    name                                  old allocs/op  new allocs/op  delta
    CreateGoroutinesCapture                   5.00 ± 0%      5.00 ± 0%       ~     (all equal)
    CreateGoroutinesCapture-8                 5.00 ± 0%      5.00 ± 0%       ~     (all equal)

    name                                  old p50-ns     new p50-ns     delta
    GoroutineProfile/small/loaded-8          1.01M ± 3%     3.87M ±45%   +282.15%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/loaded-8         1.02M ± 3%     2.43M ±41%   +138.42%  (p=0.000 n=10+10)
    GoroutineProfile/large/loaded-8          7.43M ±16%    17.28M ± 2%   +132.43%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle               956k ± 0%     1559k ±16%    +63.03%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle-8            1.01M ± 3%     1.45M ± 7%    +44.31%  (p=0.000 n=10+9)
    GoroutineProfile/sparse/idle              977k ± 1%     1399k ± 2%    +43.20%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle-8           1.00M ± 3%     1.41M ± 3%    +40.47%  (p=0.000 n=10+10)
    GoroutineProfile/large/idle-8            6.97M ± 1%     8.41M ±25%    +20.54%  (p=0.003 n=8+10)
    GoroutineProfile/large/idle              6.71M ± 1%     7.46M ± 4%    +11.15%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/idle-8        483k ± 3%       13k ± 3%    -97.41%  (p=0.000 n=10+9)
    GoroutineProfile/small-nil/idle-8         483k ± 2%       12k ± 1%    -97.43%  (p=0.000 n=10+8)
    GoroutineProfile/small-nil/loaded-8       484k ± 3%       10k ± 2%    -97.93%  (p=0.000 n=10+8)
    GoroutineProfile/sparse-nil/loaded-8      492k ± 2%       10k ± 4%    -97.97%  (p=0.000 n=10+8)
    GoroutineProfile/large-nil/idle-8         708k ± 2%       12k ±15%    -98.36%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/loaded-8       714k ± 2%       10k ± 2%    -98.60%  (p=0.000 n=10+8)
    GoroutineProfile/sparse-nil/idle          459k ± 1%        1k ± 1%    -99.85%  (p=0.000 n=10+10)
    GoroutineProfile/small-nil/idle           477k ± 1%        1k ± 0%    -99.85%  (p=0.000 n=10+9)
    GoroutineProfile/large-nil/idle           712k ± 1%        1k ± 1%    -99.90%  (p=0.000 n=7+10)

    name                                  old p90-ns     new p90-ns     delta
    GoroutineProfile/small/loaded-8          1.13M ±10%     7.49M ±35%   +562.07%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/loaded-8         1.10M ±12%     4.58M ±31%   +318.02%  (p=0.000 n=10+9)
    GoroutineProfile/large/loaded-8          8.78M ±24%    27.83M ± 2%   +217.00%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle               967k ± 0%     2909k ±50%   +200.91%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle-8           1.02M ± 3%     1.96M ±76%    +92.99%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle-8            1.07M ±17%     1.55M ±12%    +45.23%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle              992k ± 1%     1417k ± 3%    +42.79%  (p=0.000 n=9+10)
    GoroutineProfile/large/idle              6.73M ± 0%     7.99M ± 8%    +18.80%  (p=0.000 n=8+10)
    GoroutineProfile/large/idle-8            8.20M ±25%     9.18M ±25%       ~     (p=0.315 n=10+10)
    GoroutineProfile/sparse-nil/idle-8        495k ± 3%       13k ± 1%    -97.36%  (p=0.000 n=10+9)
    GoroutineProfile/small-nil/idle-8         494k ± 2%       13k ± 3%    -97.36%  (p=0.000 n=10+10)
    GoroutineProfile/small-nil/loaded-8       496k ± 2%       13k ± 1%    -97.41%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/loaded-8      544k ±11%       13k ± 1%    -97.62%  (p=0.000 n=10+9)
    GoroutineProfile/large-nil/idle-8         724k ± 1%       13k ± 3%    -98.20%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/loaded-8       729k ± 3%       13k ± 2%    -98.23%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/idle          476k ± 4%        1k ± 1%    -99.85%  (p=0.000 n=9+10)
    GoroutineProfile/small-nil/idle           537k ±10%        1k ± 0%    -99.87%  (p=0.000 n=10+9)
    GoroutineProfile/large-nil/idle           729k ± 0%        1k ± 1%    -99.90%  (p=0.000 n=7+10)

    name                                  old p99-ns     new p99-ns     delta
    GoroutineProfile/sparse/loaded-8         1.27M ±33%    20.49M ±17%  +1514.61%  (p=0.000 n=10+10)
    GoroutineProfile/small/loaded-8          1.37M ±29%    20.48M ±23%  +1399.35%  (p=0.000 n=10+10)
    GoroutineProfile/large/loaded-8          9.76M ±23%    39.98M ±22%   +309.52%  (p=0.000 n=10+8)
    GoroutineProfile/small/idle               976k ± 1%     3367k ±55%   +244.94%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle-8           1.03M ± 3%     2.50M ±65%   +142.30%  (p=0.000 n=10+10)
    GoroutineProfile/small/idle-8            1.17M ±34%     1.70M ±14%    +45.15%  (p=0.000 n=10+10)
    GoroutineProfile/sparse/idle             1.02M ± 3%     1.45M ± 4%    +42.64%  (p=0.000 n=9+10)
    GoroutineProfile/large/idle              6.92M ± 2%     9.00M ± 7%    +29.98%  (p=0.000 n=8+9)
    GoroutineProfile/large/idle-8            8.74M ±23%     9.90M ±24%       ~     (p=0.190 n=10+10)
    GoroutineProfile/sparse-nil/idle-8        508k ± 4%       16k ± 2%    -96.90%  (p=0.000 n=10+9)
    GoroutineProfile/small-nil/idle-8         508k ± 4%       16k ± 3%    -96.91%  (p=0.000 n=10+9)
    GoroutineProfile/small-nil/loaded-8       542k ± 5%       15k ±15%    -97.15%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/loaded-8      649k ±16%       15k ±18%    -97.67%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/idle-8         738k ± 2%       16k ± 2%    -97.86%  (p=0.000 n=10+10)
    GoroutineProfile/large-nil/loaded-8       765k ± 4%       15k ±17%    -98.03%  (p=0.000 n=10+10)
    GoroutineProfile/sparse-nil/idle          539k ±26%        1k ±17%    -99.84%  (p=0.000 n=10+10)
    GoroutineProfile/small-nil/idle           659k ±25%        1k ± 0%    -99.84%  (p=0.000 n=10+8)
    GoroutineProfile/large-nil/idle           760k ± 2%        1k ±22%    -99.88%  (p=0.000 n=9+10)

Fixes #33250
For #50794

Change-Id: I862a2bc4e991cec485f21a6fce4fca84f2c6435b
Reviewed-on: https://go-review.googlesource.com/c/go/+/387415
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Rhys Hiltner <rhys@justin.tv>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03 20:49:34 +00:00
Michael Pratt
4289bd365c runtime: simply user throws, expand runtime throws
This gives explicit names to the possible states of throwing (-1, 0, 1).

m.throwing is now one of:

throwTypeOff: not throwing, previously == 0
throwTypeUser: user throw, previously == -1
throwTypeRuntime: runtime throw, previously == 1

For runtime throws, we now always include frame metadata and system
goroutines regardless of GOTRACEBACK to aid in debugging the runtime.

For user throws, we no longer include frame metadata or runtime frames,
unless GOTRACEBACK=system or higher.

For #51485.

Change-Id: If252e2377a0b6385ce7756b937929be4273a56c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/390421
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2022-04-28 17:14:41 +00:00
Keith Randall
ea7e3e3c0f runtime: align m.procid to 8 bytes on 32-bit systems
https://go-review.googlesource.com/c/go/+/383434 started using
atomic Load64 on this field, which breaks 32 bit platforms which
require 64-bit alignment of uint64s that are passed to atomic operations.

Not sure why this doesn't break everywhere, but I saw it break on
my laptop during all.bash.

Change-Id: Ida27b23068b3cc7208fce3c97b69a464ccf68209
Reviewed-on: https://go-review.googlesource.com/c/go/+/399754
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2022-04-11 22:49:56 +00:00
Russ Cox
9839668b56 all: separate doc comment from //go: directives
A future change to gofmt will rewrite

	// Doc comment.
	//go:foo

to

	// Doc comment.
	//
	//go:foo

Apply that change preemptively to all comments (not necessarily just doc comments).

For #51082.

Change-Id: Iffe0285418d1e79d34526af3520b415a12203ca9
Reviewed-on: https://go-review.googlesource.com/c/go/+/384260
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-05 17:54:15 +00:00
Russ Cox
7d87ccc860 all: fix various doc comment formatting nits
A run of lines that are indented with any number of spaces or tabs
format as a <pre> block. This commit fixes various doc comments
that format badly according to that (standard) rule.

For example, consider:

	// - List item.
	//   Second line.
	// - Another item.

Because the - lines are unindented, this is actually two paragraphs
separated by a one-line <pre> block. This CL rewrites it to:

	//  - List item.
	//    Second line.
	//  - Another item.

Today, that will format as a single <pre> block.
In a future release, we hope to format it as a bulleted list.

Various other minor fixes as well, all in preparation for reformatting.

For #51082.

Change-Id: I95cf06040d4186830e571cd50148be3bf8daf189
Reviewed-on: https://go-review.googlesource.com/c/go/+/384257
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-01 18:18:01 +00:00
Romanos Skiadas
b2643c6739 runtime: update framepointer_enabled doc
Change-Id: I69e64ebf8c11145ce32aa4c11178e3a47d22fb84
Reviewed-on: https://go-review.googlesource.com/c/go/+/394915
Reviewed-by: Michael Pratt <mpratt@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
2022-03-24 15:51:28 +00:00
Michael Pratt
0a5fae2a0e runtime, syscall: reimplement AllThreadsSyscall using only signals.
In issue 50113, we see that a thread blocked in a system call can result
in a hang of AllThreadsSyscall. To resolve this, we must send a signal
to these threads to knock them out of the system call long enough to run
the per-thread syscall.

Stepping back, if we need to send signals anyway, it should be possible
to implement this entire mechanism on top of signals. This CL does so,
vastly simplifying the mechanism, both as a direct result of
newly-unnecessary code as well as some ancillary simplifications to make
things simpler to follow.

Major changes:

* The rest of the mechanism is moved to os_linux.go, with fields in mOS
  instead of m itself.
* 'Fixup' fields and functions are renamed to 'perThreadSyscall' so they
  are more precise about their purpose.
* Rather than getting passed a closure, doAllThreadsSyscall takes the
  syscall number and arguments. This avoids a lot of hairy behavior:
    * The closure may potentially only be live in fields in the M,
      hidden from the GC. Not necessary with no closure.
    * The need to loan out the race context. A direct RawSyscall6 call
      does not require any race context.
    * The closure previously conditionally panicked in strange
      locations, like a signal handler. Now we simply throw.
* All manual fixup synchronization with mPark, sysmon, templateThread,
  sigqueue, etc is gone. The core approach is much simpler:
  doAllThreadsSyscall sends a signal to every thread in allm, which
  executes the system call from the signal handler. We use (SIGRTMIN +
  1), aka SIGSETXID, the same signal used by glibc for this purpose. As
  such, we are careful to only handle this signal on non-cgo binaries.

Synchronization with thread creation is a key part of this CL. The
comment near the top of doAllThreadsSyscall describes the required
synchronization semantics and how they are achieved.

Note that current use of allocmLock protects the state mutations of allm
that are also protected by sched.lock. allocmLock is used instead of
sched.lock simply to avoid holding sched.lock for so long.

Fixes #50113

Change-Id: Ic7ea856dc66cf711731540a54996e08fc986ce84
Reviewed-on: https://go-review.googlesource.com/c/go/+/383434
Reviewed-by: Austin Clements <austin@google.com>
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-02-15 15:40:35 +00:00
Ian Lance Taylor
53d6a725f8 runtime: update _defer comment to not mention freedefer
CL 339669 changed freedefer to not mention every field of _defer,
so no need to call it out in the _defer comment.

Change-Id: Id8b67ba2298685f609bf901b5948fd30666bd6e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/382251
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2022-02-02 21:12:39 +00:00
Russ Cox
17b2fb1b65 runtime: fix net poll races
The netpoll code was written long ago, when the
only multiprocessors that Go ran on were x86.
It assumed that an atomic store would trigger a
full memory barrier and then used that barrier
to order otherwise racy access to a handful of fields,
including pollDesc.closing.

On ARM64, this code has finally failed, because
the atomic store is on a value completely unrelated
to any of the racily-accessed fields, and the ARMv8
hardware, unlike x86, is clever enough not to do a
full memory barrier for a simple atomic store.
We are seeing a constant background rate of trybot
failures where the net/http tests deadlock - a netpollblock
has clearly happened after the pollDesc has begun to close.

The code that does the racy reads is netpollcheckerr,
which needs to be able to run without acquiring a lock.
This CL fixes the race, without introducing unnecessary
inefficiency or deadlock, by arranging for every updater
of the relevant fields to publish a summary as a single
atomic uint32, and then having netpollcheckerr use a
single atomic load to fetch the relevant bits and then
proceed as before.

Fixes #45211 (until proven otherwise!).

Change-Id: Ib6788c8da4d00b7bda84d55ca3fdffb5a64c1a0a
Reviewed-on: https://go-review.googlesource.com/c/go/+/378234
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Trust: Bryan Mills <bcmills@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2022-01-14 17:49:58 +00:00
Russ Cox
2580d0e08d all: gofmt -w -r 'interface{} -> any' src
And then revert the bootstrap cmd directories and certain testdata.
And adjust tests as needed.

Not reverting the changes in std that are bootstrapped,
because some of those changes would appear in API docs,
and we want to use any consistently.
Instead, rewrite 'any' to 'interface{}' in cmd/dist for those directories
when preparing the bootstrap copy.

A few files changed as a result of running gofmt -w
not because of interface{} -> any but because they
hadn't been updated for the new //go:build lines.

Fixes #49884.

Change-Id: Ie8045cba995f65bd79c694ec77a1b3d1fe01bb09
Reviewed-on: https://go-review.googlesource.com/c/go/+/368254
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2021-12-13 18:45:54 +00:00
Michael Anthony Knyszek
9ac1ee2d46 runtime: track the amount of scannable allocated stack for the GC pacer
This change adds two fields to gcControllerState: stackScan, used for
pacing decisions, and scannableStackSize, which directly tracks the
amount of space allocated for inuse stacks that will be scanned.

scannableStackSize is not updated directly, but is instead flushed from
each P when at an least 8 KiB delta has accumulated. This helps reduce
issues with atomics contention for newly created goroutines. Stack
growth paths are largely unaffected.

StackGrowth-48			51.4ns ± 0%	51.4ns ± 0%	~	(p=0.927 n=10+10)
StackGrowthDeep-48		6.14µs ± 3%	6.25µs ± 4%	~	(p=0.090 n=10+9)
CreateGoroutines-48		273ns ± 1%	273ns ± 1%	~	(p=0.676 n=9+10)
CreateGoroutinesParallel-48	65.5ns ± 5%	66.6ns ± 7%	~	(p=0.340 n=9+9)
CreateGoroutinesCapture-48	2.06µs ± 1%	2.07µs ± 4%	~	(p=0.217 n=10+10)
CreateGoroutinesSingle-48	550ns ± 3%	563ns ± 4%	+2.41%	(p=0.034 n=8+10)

For #44167.

Change-Id: Id1800d41d3a6c211b43aeb5681c57c0dc8880daf
Reviewed-on: https://go-review.googlesource.com/c/go/+/309589
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2021-10-29 18:35:20 +00:00
Josh Bleecher Snyder
d3ad216f8e cmd/link, runtime: use offset for _func.entry
The first field of the func data stored by the linker is the
entry PC for the function. Prior to this change, this was stored
as a relocation to the function. Change this to be an offset
relative to runtime.text.

This reduces the number of relocations on darwin/arm64 by about 10%.
It also slightly shrinks binaries:

file      before    after     Δ       %
addr2line 3803058   3791298   -11760  -0.309%
api       5140114   5104242   -35872  -0.698%
asm       4886850   4840626   -46224  -0.946%
buildid   2512466   2503042   -9424   -0.375%
cgo       4374770   4342274   -32496  -0.743%
compile   22920530  22769202  -151328 -0.660%
cover     4624626   4588242   -36384  -0.787%
dist      3217570   3205522   -12048  -0.374%
doc       3715026   3684498   -30528  -0.822%
fix       3148226   3119266   -28960  -0.920%
link      6350226   6313362   -36864  -0.581%
nm        3768850   3757106   -11744  -0.312%
objdump   4140594   4127618   -12976  -0.313%
pack      2227474   2218818   -8656   -0.389%
pprof     13598706  13506786  -91920  -0.676%
test2json 2497234   2487426   -9808   -0.393%
trace     10198066  10118498  -79568  -0.780%
vet       6930658   6889074   -41584  -0.600%
total     108055044 107366900 -688144 -0.637%

It should also incrementally speed up binary launching.

This is the first step towards removing enough relocations
that pages that were previously dirtied by the loader may remain clean,
which will offer memory savings useful in constrained environments.

Change-Id: Icfba55e696ba2f9c99c4f179125ba5a3ba4369c9
Reviewed-on: https://go-review.googlesource.com/c/go/+/351463
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-29 22:14:22 +00:00
Josh Bleecher Snyder
61a0a70113 runtime: convert _func.entry to a method
A subsequent change will alter the semantics of _func.entry.
To make that change obvious and clear, change _func.entry to a method,
and rename the field to _func.entryPC.

Change-Id: I05d66b54d06c5956d4537b0729ddf4290c3e2635
Reviewed-on: https://go-review.googlesource.com/c/go/+/351460
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-27 20:58:49 +00:00
Josh Bleecher Snyder
6c163e5ac9 runtime: change funcinl sentinel value from 0 to ^0
_func and funcinl are type-punned.
We distinguish them at runtime by inspecting the first word.

Prior to this change, we used 0 as the sentinel value
that means that a Func is a funcinl.
That worked because _func's first word is the functions' entry PC,
and 0 is not a valid PC. I plan to make *_func's entry PC relative
to the containing module. As a result, 0 will be a valid value,
for the first function in the module.

Switch to ^0 as the new sentinel value, which is neither a valid
entry PC nor a valid PC offset.

Change-Id: I4c718523a083ed6edd57767c3548640681993522
Reviewed-on: https://go-review.googlesource.com/c/go/+/351459
Trust: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-27 20:58:24 +00:00
Meng Zhuo
483533df9e runtime: using wyrand for fastrand
For modern 64-bit CPU architecture multiplier is faster than xorshift

darwin/amd64
name              old time/op  new time/op  delta
Fastrand          2.13ns ± 1%  1.78ns ± 1%  -16.47%  (p=0.000 n=9+10)
FastrandHashiter  32.5ns ± 4%  32.1ns ± 3%     ~     (p=0.277 n=8+9)
Fastrandn/2       2.16ns ± 1%  1.99ns ± 1%   -7.53%  (p=0.000 n=10+10)
Fastrandn/3       2.13ns ± 3%  2.00ns ± 1%   -5.88%  (p=0.000 n=10+10)
Fastrandn/4       2.08ns ± 2%  1.98ns ± 2%   -4.71%  (p=0.000 n=10+10)
Fastrandn/5       2.08ns ± 2%  1.98ns ± 1%   -4.90%  (p=0.000 n=10+9)

linux/mips64le
name              old time/op  new time/op  delta
Fastrand          12.1ns ± 0%  10.8ns ± 1%  -10.58%  (p=0.000 n=8+10)
FastrandHashiter   105ns ± 1%   105ns ± 1%     ~     (p=0.138 n=10+10)
Fastrandn/2       16.9ns ± 0%  16.4ns ± 4%   -2.84%  (p=0.020 n=10+10)
Fastrandn/3       16.9ns ± 0%  16.4ns ± 3%   -3.03%  (p=0.000 n=10+10)
Fastrandn/4       16.9ns ± 0%  16.5ns ± 2%   -2.01%  (p=0.002 n=8+10)
Fastrandn/5       16.9ns ± 0%  16.4ns ± 3%   -2.70%  (p=0.000 n=8+10)

linux/riscv64
name              old time/op  new time/op  delta
Fastrand          22.7ns ± 0%  12.7ns ±19%  -44.09%  (p=0.000 n=9+10)
FastrandHashiter   255ns ± 4%   250ns ± 7%     ~     (p=0.363 n=10+10)
Fastrandn/2       31.8ns ± 2%  28.5ns ±13%  -10.45%  (p=0.000 n=10+10)
Fastrandn/3       33.0ns ± 2%  27.4ns ± 8%  -17.16%  (p=0.000 n=9+10)
Fastrandn/4       29.6ns ± 3%  28.2ns ± 5%   -4.81%  (p=0.000 n=8+9)
Fastrandn/5       33.4ns ± 3%  26.5ns ± 9%  -20.49%  (p=0.000 n=8+10)

Change-Id: I88ac69625ef923f8be66647e3361e3be986de002
Reviewed-on: https://go-review.googlesource.com/c/go/+/337350
Trust: Meng Zhuo <mzh@golangcn.org>
Run-TryBot: Meng Zhuo <mzh@golangcn.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2021-09-24 01:25:24 +00:00
Martin Möhrmann
22540abf76 runtime: use RDTSCP for instruction stream serialized read of TSC
To measure all instructions having been completed before reading
the time stamp counter with RDTSC an instruction sequence that
has instruction stream serializing properties which guarantee
waiting until all previous instructions have been executed is
needed. This does not necessary mean to wait for all stores to
be globally visible.

This CL aims to remove vendor specific logic for determining the
instruction sequence with CPU feature flag checks that are
CPU vendor independent.

For intel LFENCE has the wanted properties at least
since it was introduced together with SSE2 support.

On AMD instruction stream serializing LFENCE is supported by setting
an MSR C001_1029[1]=1 on AMD family 10h/12h/14h/15h/16h/17h processors.
AMD family 0Fh/11h processors support LFENCE as serializing always.
AMD plans support for this MSR and access to this bit for all future processors.
Source: https://developer.amd.com/wp-content/resources/Managing-Speculation-on-AMD-Processors.pdf

Reading the MSR to determine LFENCE properties is not always possible
or reliable (hypervisors). The Linux kernel is relying on serializing
LFENCE on AMD CPUs since a commit in July 2019: https://lkml.org/lkml/2019/7/22/295
and the MSR C001_1029 to enable serialization has been set by default
with the Spectre v1 mitigations.

Using an MFENCE on AMD is waiting on previous instructions having been executed
but in addition also flushes store buffers.

To align the serialization properties without runtime detection
of CPU manufacturers we can use the newer RDTSCP instruction which
waits until all previous instructions have been executed.

RDTSCP is available on Intel since around 2008 and on AMD CPUs since
around 2006. Support for RDTSCP can be checked independently
of manufacturer by checking CPUID bits.

Using RDTSCP is the default in Linux to read TSC in program order
when the instruction is available.
e22ce8eb63/arch/x86/include/asm/msr.h (L231)

Change-Id: Ifa841843b9abb2816f8f0754a163ebf01385306d
Reviewed-on: https://go-review.googlesource.com/c/go/+/344429
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Martin Möhrmann <martin@golang.org>
Run-TryBot: Martin Möhrmann <martin@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
2021-08-23 20:32:04 +00:00
Matthew Dempsky
a64ab8d3ec [dev.typeparams] all: merge master (46fd547) into dev.typeparams
Conflicts:

- src/go/types/check_test.go

  CL 324730 on dev.typeparams changed the directory paths in TestCheck,
  TestExamples, and TestFixedbugs and renamed checkFiles to testFiles;
  whereas CL 337529 on master added a new test case just above them and
  that used checkFiles.

Merge List:

+ 2021-08-12 46fd547d89 internal/goversion: update Version to 1.18
+ 2021-08-12 5805efc78e doc/go1.17: remove draft notice
+ 2021-08-12 39634e7dae CONTRIBUTORS: update for the Go 1.17 release
+ 2021-08-12 095bb790e1 os/exec: re-enable LookPathTest/16
+ 2021-08-11 dea23e9ca8 src/make.*: make --no-clean flag a no-op that prints a warning
+ 2021-08-11 d4c0ed26ac doc/go1.17: linker passes -I to extld as -Wl,--dynamic-linker
+ 2021-08-10 1f9c9d8530 doc: use "high address/low address" instead of "top/bottom"
+ 2021-08-09 f1dce319ff cmd/go: with -mod=vendor, don't panic if there are duplicate requirements
+ 2021-08-09 7aeaad5c86 runtime/cgo: when using msan explicitly unpoison cgoCallers
+ 2021-08-08 507cc341ec doc: add example for conversion from slice expressions to array ptr
+ 2021-08-07 891547e2d4 doc/go1.17: fix a typo introduced in CL 335135
+ 2021-08-06 8eaf4d16bc make.bash: do not overwrite GO_LDSO if already set
+ 2021-08-06 63b968f4f8 doc/go1.17: clarify Modules changes
+ 2021-08-06 70546f6404 runtime: allow arm64 SEH to be called if illegal instruction
+ 2021-08-05 fd45e267c2 runtime: warn that KeepAlive is not an unsafe.Pointer workaround
+ 2021-08-04 6e738868a7 net/http: speed up and deflake TestCancelRequestWhenSharingConnection
+ 2021-08-02 8a7ee4c51e io/fs: don't use absolute path in DirEntry.Name doc
+ 2021-07-31 b8ca6e59ed all: gofmt
+ 2021-07-30 b7a85e0003 net/http/httputil: close incoming ReverseProxy request body
+ 2021-07-29 70fd4e47d7 runtime: avoid possible preemption when returning from Go to C
+ 2021-07-28 9eee0ed439 cmd/go: fix go.mod file name printed in error messages for replacements
+ 2021-07-28 b39e0f461c runtime: don't crash on nil pointers in checkptrAlignment
+ 2021-07-27 7cd10c1149 cmd/go: use .mod instead of .zip to determine if version has go.mod file
+ 2021-07-27 c8cf0f74e4 cmd/go: add missing flag in UsageLine
+ 2021-07-27 7ba8e796c9 testing: clarify T.Name returns a distinct name of the running test
+ 2021-07-27 33ff155970 go/types: preserve untyped constants on the RHS of a shift expression
+ 2021-07-26 840e583ff3 runtime: correct variable name in comment
+ 2021-07-26 bfbb288574 runtime: remove adjustTimers counter
+ 2021-07-26 9c81fd53b3 cmd/vet: add missing copyright header

Change-Id: Ia80604d24c6f4205265683024e3100769cf32065
2021-08-12 12:43:12 -07:00
Austin Clements
e590cb64f9 [dev.typeparams] runtime: handle d.link carefully when freeing a defer
CL 339396 allowed stack copying on entry to and during freedefer, but
this introduced a subtle bug: if d is heap-allocated, and d.link
points to a stack-allocated defer, stack copying during freedefer can
briefly introduce a stale pointer, which the garbage collector can
discover and panic about. This happens because d has already been
unlinked from the defer chain when freedefer is called, so stack
copying won't update stack pointers in it.

Fix this by making freedefer nosplit again and immediately clearing
d.link.

This should fix the longtest builders, which currently fail on
GOMAXPROCS=2 runtime -cpu=1,2,4 -quick in the TestDeferHeapAndStack
test.

This seems like the simplest fix, but it just deals with the subtlety
rather than eliminating it. Really, every call site of freedefer (of
which there are surprisingly many) has hidden subtlety between
unlinking the defer and calling freedefer. We could consolidate the
subtlety into each call site by requiring that they unlink the defer
and set d.link to nil before calling freedefer. freedefer could check
this condition like it checks that various other fields have already
been zeroed. A more radical option is to replace freedefer with
"popDefer", which would both pop the defer off the link and take care
of freeing it. There would still be a brief moment of subtlety, but it
would be in one place, in popDefer. Annoyingly, *almost* every call to
freedefer just pops the defer from the head of the G's list, but
there's one place when handling open-coded defers where we have to
remove a defer from the middle of the list. I'm inclined to first fix
that subtlety by only expanding open-coded defer records when they're
at the head of the defer list, and then revisit the popDefer idea.

Change-Id: I3130d2542c01a421a5d60e8c31f5379263219627
Reviewed-on: https://go-review.googlesource.com/c/go/+/339730
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-08-04 15:51:58 +00:00
Austin Clements
ea94e5d3c5 [dev.typeparams] runtime: use func() for deferred functions
Prior to regabi, a deferred function could have any signature, so the
runtime always manipulated them as funcvals. Now, a deferred function
is always func(). Hence, this CL makes the runtime's manipulation of
deferred functions more type-safe by using func() directly instead of
*funcval.

Change-Id: Ib55f38ed49107f74149725c65044e4690761971d
Reviewed-on: https://go-review.googlesource.com/c/go/+/337650
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-07-30 18:49:41 +00:00
Ian Lance Taylor
bfbb288574 runtime: remove adjustTimers counter
In CL 336432 we changed adjusttimers so that it no longer cleared
timerModifiedEarliest if there were no timersModifiedEarlier timers.
This caused some Google internal tests to time out, presumably due
to the increased contention on timersLock.  We can avoid that by
simply not skipping the loop in adjusttimers, which lets us safely
clear timerModifiedEarliest.  And if we don't skip the loop, then there
isn't much reason to keep the count of timerModifiedEarlier timers at all.
So remove it.

The effect will be that for programs that create some timerModifiedEarlier
timers and then remove them all, the program will do an occasional
additional loop over all the timers.  And, programs that have some
timerModifiedEarlier timers will always loop over all the timers,
without the quicker exit when they have all been seen.  But the loops
should not occur all that often, due to timerModifiedEarliest.

For #47329

Change-Id: I7b244c1244d97b169a3c7fbc8f8a8b115731ddee
Reviewed-on: https://go-review.googlesource.com/c/go/+/337309
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2021-07-26 22:15:24 +00:00
Matthew Dempsky
a27e325c59 [dev.typeparams] all: merge master (798ec73) into dev.typeparams
Merge List:

+ 2021-07-22 798ec73519 runtime: don't clear timerModifiedEarliest if adjustTimers is 0
+ 2021-07-22 fdb45acd1f runtime: move mem profile sampling into m-acquired section
+ 2021-07-21 3e48c0381f reflect: add missing copyright header
+ 2021-07-21 48c88f1b1b reflect: add Value.CanConvert
+ 2021-07-20 9e26569293 cmd/go: don't add C compiler ID to hash for standard library
+ 2021-07-20 d568e6e075 runtime/debug: skip TestPanicOnFault on netbsd/arm

Change-Id: I87e1cd4614bb3b00807f18dfdd02664dcaecaebd
2021-07-22 12:50:31 -07:00
Ian Lance Taylor
798ec73519 runtime: don't clear timerModifiedEarliest if adjustTimers is 0
This avoids a race when a new timerModifiedEarlier timer is created by
a different goroutine.

Fixes #47329

Change-Id: I6f6c87b4a9b5491b201c725c10bc98e23e0ed9d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/336432
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-07-22 19:26:40 +00:00
Matthew Dempsky
2b1d70a137 [dev.typeparams] all: merge master (296ddf2) into dev.typeparams
Conflicts:

- src/runtime/runtime2.go

  On master, CL 317191 fixed the mentions of gc/reflect.go in comments
  to reflectdata/reflect.go; but on dev.typeparams, CL 325921 fixed
  that the same comment to reflect that deferstruct actually ended up
  in ssagen/ssa.go.

Merge List:

+ 2021-07-08 296ddf2a93 net: filter bad names from Lookup functions instead of hard failing
+ 2021-07-08 ce76298ee7 Update oudated comment
+ 2021-07-08 2ca44fe221 doc/go1.17: linkify time.UnixMilli and time.UnixMicro
+ 2021-07-07 5c59e11f5e cmd/compile: remove special-casing of blank in types.sconv{,2}
+ 2021-07-07 b003a8b1ae cmd/compile: optimize types.sconv
+ 2021-07-07 11f5df2d67 cmd/compile: extract pkgqual from symfmt
+ 2021-07-07 991fd381d5 cmd/go: don't lock .mod and .sum files for read in overlay
+ 2021-07-07 186a3bb4b0 cmd/go/internal/modfetch/codehost: skip hg tests if no hg binary is present
+ 2021-07-07 00c00558e1 cmd/go/internal/modload: remove unused functions
+ 2021-07-07 f264879f74 cmd/go/internal/modload: fix an apparent typo in the AutoRoot comment
+ 2021-07-07 c96833e5ba doc: remove stale comment about arm64 port

Change-Id: I849046b6d8f7421f60323549f3f763ef418bf9e7
2021-07-08 13:11:32 -07:00
makdon
ce76298ee7 Update oudated comment
Update comment cause gc/select.go has been moved to walk/select.go and gc/reflect.go has been moved to reflectdata/reflect.go

Change-Id: I6894527e1e9dbca50ace92a51bf942f9495ce88c
GitHub-Last-Rev: 6d6a447144
GitHub-Pull-Request: golang/go#45976
Reviewed-on: https://go-review.googlesource.com/c/go/+/317191
Reviewed-by: Keith Randall <khr@golang.org>
Trust: Michael Pratt <mpratt@google.com>
2021-07-08 16:59:21 +00:00
Michael Anthony Knyszek
9c58e399a4 [dev.typeparams] runtime: fix import sort order [generated]
[git-generate]
cd src/runtime
goimports -w *.go

Change-Id: I1387af0f2fd1a213dc2f4c122e83a8db0fcb15f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/329189
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-06-17 20:42:23 +00:00
Michael Anthony Knyszek
6d85891b29 [dev.typeparams] runtime: replace uses of runtime/internal/sys.PtrSize with internal/goarch.PtrSize [generated]
[git-generate]
cd src/runtime/internal/math
gofmt -w -r "sys.PtrSize -> goarch.PtrSize" .
goimports -w *.go
cd ../..
gofmt -w -r "sys.PtrSize -> goarch.PtrSize" .
goimports -w *.go

Change-Id: I43491cdd54d2e06d4d04152b3d213851b7d6d423
Reviewed-on: https://go-review.googlesource.com/c/go/+/328337
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2021-06-17 18:54:48 +00:00
Cherry Mui
a4121d7dd6 [dev.typeparams] Revert "[dev.typeparams] runtime: make deferproc take a func() argument"
Temprary revert CL 325918.

Delve relies on the _defer.fn.fn field to get defer frames.
CL 325918 changes the type of _defer.fn to func(), which no
longer has an fn field.

Change-Id: If6c71b15a27bac579593f5273c9a49715e6e35b2
Reviewed-on: https://go-review.googlesource.com/c/go/+/327775
Trust: Cherry Mui <cherryyz@google.com>
Trust: Dan Scales <danscales@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Dan Scales <danscales@google.com>
2021-06-16 15:27:43 +00:00
Cherry Mui
e0e9fb8aff [dev.typeparams] runtime: simplify defer record allocation
Now that deferred functions are always argumentless and defer
records are no longer with arguments, defer record can be fixed
size (just the _defer struct). This allows us to simplify the
allocation of defer records, specifically, remove the defer
classes and the pools of different sized defers.

Change-Id: Icc4b16afc23b38262ca9dd1f7369ad40874cf701
Reviewed-on: https://go-review.googlesource.com/c/go/+/326062
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-06-11 18:33:07 +00:00
Cherry Mui
74b0b2772a [dev.typeparams] cmd/compile, runtime: remove _defer.siz field
As deferred function now always has zero arguments, _defer.siz is
always 0 and can be removed.

Change-Id: Ibb89f65b2f9d2ba4aeabe50438cc3d4b6a88320b
Reviewed-on: https://go-review.googlesource.com/c/go/+/325921
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-06-08 22:09:16 +00:00
Cherry Mui
83da32749c [dev.typeparams] runtime: make deferproc take a func() argument
Previously it takes a *funcval, as it can be any function types.
Now it must be func(). Make it so.

Change-Id: I04273047b024386f55dbbd5fbda4767cbee7ac93
Reviewed-on: https://go-review.googlesource.com/c/go/+/325918
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-06-08 20:55:14 +00:00
Andy Pan
3b304ce7fe runtime: implement runqdrain() for GC mark worker goroutines
Revive CL 310149

Change-Id: Ib4714ea5b2ade32c0f66edff841a79d8212bd79a
Reviewed-on: https://go-review.googlesource.com/c/go/+/313009
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Trust: Michael Pratt <mpratt@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
2021-05-05 15:53:20 +00:00
Michael Pratt
bedfeed54a runtime,runtime/metrics: add metric to track scheduling latencies
This change adds a metric to track scheduling latencies, defined as the
cumulative amount of time a goroutine spends being runnable before
running again. The metric is an approximations and samples instead of
trying to record every goroutine scheduling latency.

This change was primarily authored by mknyszek@google.com.

Change-Id: Ie0be7e6e7be421572eb2317d3dd8dd6f3d6aa152
Reviewed-on: https://go-review.googlesource.com/c/go/+/308933
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-04-23 13:48:10 +00:00
Andrew G. Morgan
7e97e4e8cc syscall: syscall.AllThreadsSyscall signal handling fixes
The runtime support for syscall.AllThreadsSyscall() functions had
some corner case deadlock issues when signal handling was in use.
This was observed in at least 3 build test failures on ppc64 and
amd64 architecture CGO_ENABLED=0 builds over the last few months.

The fixes involve more controlled handling of signals while the
AllThreads mechanism is being executed. Further details are
discussed in bug #44193.

The all-threads syscall support is new in go1.16, so earlier
releases are not affected by this bug.

Fixes #44193

Change-Id: I01ba8508a6e1bb2d872751f50da86dd07911a41d
Reviewed-on: https://go-review.googlesource.com/c/go/+/305149
Reviewed-by: Michael Pratt <mpratt@google.com>
Trust: Michael Pratt <mpratt@google.com>
Trust: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-04-21 21:25:26 +00:00