This adds signal-based preemption to preemptone.
Since STW and forEachP ultimately use preemptone, this also makes
these work with async preemption.
This also makes freezetheworld more robust so tracebacks from fatal
panics should be far less likely to report "goroutine running on other
thread; stack unavailable".
For #10958, #24543. (This doesn't fix it yet because asynchronous
preemption only works on POSIX platforms on 386 and amd64 right now.)
Change-Id: If776181dd5a9b3026a7b89a1b5266521b95a5f61
Reviewed-on: https://go-review.googlesource.com/c/go/+/201762
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This adds support for pausing a running G by sending a signal to its
M.
The main complication is that we want to target a G, but can only send
a signal to an M. Hence, the protocol we use is to simply mark the G
for preemption (which we already do) and send the M a "wake up and
look around" signal. The signal checks if it's running a G with a
preemption request and stops it if so in the same way that stack check
preemptions stop Gs. Since the preemption may fail (the G could be
moved or the signal could arrive at an unsafe point), we keep a count
of the number of received preemption signals. This lets stopG detect
if its request failed and should be retried without an explicit
channel back to suspendG.
For #10958, #24543.
Change-Id: I3e1538d5ea5200aeb434374abb5d5fdc56107e53
Reviewed-on: https://go-review.googlesource.com/c/go/+/201760
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
_defer.fn can be nil, so we need to add a check when dumping
_defer.fn.fn.
Fixes#35172
Change-Id: Ic1138be5ec9dce915a87467cfa51ff83acc6e3a9
Reviewed-on: https://go-review.googlesource.com/c/go/+/203697
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
We're about to introduce asynchronous safe points, where we won't have
precise pointer maps for all stack frames. That's okay for scanning
the stack (conservatively), but not for shrinking the stack.
Hence, this CL prepares for this by only shrinking the stack as part
of the stack scan if the goroutine is stopped at a synchronous safe
point. Otherwise, it queues up the stack shrink for the next
synchronous safe point.
We already have one condition under which we can't shrink the stack
for very similar reasons: syscalls. Currently, we just give up on
shrinking the stack if it's in a syscall. But with this mechanism, we
defer that stack shrink until the next synchronous safe point.
For #10958, #24543.
Change-Id: Ifa1dec6f33fdf30f9067be2ce3f7ab8a7f62ce38
Reviewed-on: https://go-review.googlesource.com/c/go/+/201438
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
When we copy a stack of a goroutine blocked in a channel operation, we
have to be very careful because other goroutines may be writing to
that goroutine's stack. To handle this, stack copying acquires the
locks for the channels a goroutine is waiting on.
One complication is that stack growth may happen while a goroutine
holds these locks, in which case stack copying must *not* acquire
these locks because that would self-deadlock.
Currently, stack growth never acquires these locks because stack
growth only happens when a goroutine is running, which means it's
either not blocking on a channel or it's holding the channel locks
already. Stack shrinking always acquires these locks because shrinking
happens asynchronously, so the goroutine is never running, so there
are either no locks or they've been released by the goroutine.
However, we're about to change when stack shrinking can happen, which
is going to break the current rules. Rather than find a new way to
derive whether to acquire these locks or not, this CL simply adds a
flag to the g struct that indicates that stack copying should acquire
channel locks. This flag is set while the goroutine is blocked on a
channel op.
For #10958, #24543.
Change-Id: Ia2ac8831b1bfda98d39bb30285e144c4f7eaf9ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/172982
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Currently, gcscanvalid is used to resolve a race between attempts to
scan a stack. Now that there's a clear owner of the stack scan
operation, there's no longer any danger of racing or attempting to
scan a stack more than once, so this CL eliminates gcscanvalid.
I double-checked my reasoning by first adding a throw if gcscanvalid
was set in scanstack and verifying that all.bash still passed.
For #10958, #24543.
Fixes#24363.
Change-Id: I76794a5fcda325ed7cfc2b545e2a839b8b3bc713
Reviewed-on: https://go-review.googlesource.com/c/go/+/201139
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently, the process of suspending a goroutine is tied to stack
scanning. In preparation for non-cooperative preemption, this CL
abstracts this into general purpose suspendG/resumeG functions.
suspendG and resumeG closely follow the existing scang and restartg
functions with one exception: the addition of a _Gpreempted status.
Currently, preemption tasks (stack scanning) are carried out by the
target goroutine if it's in _Grunning. In this new approach, the task
is always carried out by the goroutine that called suspendG. Thus, we
need a reliable way to drive the target goroutine out of _Grunning
until the requesting goroutine is ready to resume it. The new
_Gpreempted state provides the handshake: when a runnable goroutine
responds to a preemption request, it now parks itself and enters
_Gpreempted. The requesting goroutine races to put it in _Gwaiting,
which gives it ownership, but also the responsibility to start it
again.
This CL adds several TODOs about improving the synchronization on the
G status. The existing code already has these problems; we're just
taking note of them.
The next CL will remove the now-dead scang and preemptscan.
For #10958, #24543.
Change-Id: I16dbf87bea9d50399cc86719c156f48e67198f16
Reviewed-on: https://go-review.googlesource.com/c/go/+/201137
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
We already claim on the documentation for _Grunning that this is case,
but execute transitions to _Grunning before assigning g.m. Fix this
and make the documentation even more explicit.
For #10958, #24543, but also a good cleanup.
Change-Id: I1eb0108e7762f55cfb0282aca624af1c0a15fe56
Reviewed-on: https://go-review.googlesource.com/c/go/+/201440
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).
When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.
In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).
I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().
The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).
Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ]
With normal (stack-allocated) defers only: 35.4 ns/op
With open-coded defers: 5.6 ns/op
Cost of function call alone (remove defer keyword): 4.4 ns/op
Text size increase (including funcdata) for go binary without/with open-coded defers: 0.09%
The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.
The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:
Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
Without open-coded defers: 62.0 ns/op
With open-coded defers: 255 ns/op
A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:
CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
Without open-coded defers: 443 ns/op
With open-coded defers: 347 ns/op
Updates #14939 (defer performance)
Updates #34481 (design doc)
Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements <austin@google.com>
Since the new timers run on g0, which does not have a race context,
we add a race context field to the P, and use that for timer functions.
This works since all timer functions are in the standard library.
Updates #27707
Change-Id: I8a5b727b4ddc8ca6fc60eb6d6f5e9819245e395b
Reviewed-on: https://go-review.googlesource.com/c/go/+/171882
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This adds a new field to P, adjustTimers, that tells the P that one of
its existing timers was modified to be earlier, and that it therefore
needs to resort them.
Updates #27707
Change-Id: I4c5f5b51ed116f1d898d3f87cdddfa1b552337f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/171832
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Add support to the main scheduler loop for handling timers on P's.
This is not used yet, as timers are not yet put on P's.
Updates #6239
Updates #27707
Change-Id: I6a359df408629f333a9232142ce19e8be8496dae
Reviewed-on: https://go-review.googlesource.com/c/go/+/171826
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).
When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.
In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).
I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().
The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).
Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ]
With normal (stack-allocated) defers only: 35.4 ns/op
With open-coded defers: 5.6 ns/op
Cost of function call alone (remove defer keyword): 4.4 ns/op
Text size increase (including funcdata) for go cmd without/with open-coded defers: 0.09%
The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.
The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:
Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
Without open-coded defers: 62.0 ns/op
With open-coded defers: 255 ns/op
A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:
CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
Without open-coded defers: 443 ns/op
With open-coded defers: 347 ns/op
Updates #14939 (defer performance)
Updates #34481 (design doc)
Change-Id: I51a389860b9676cfa1b84722f5fb84d3c4ee9e28
Reviewed-on: https://go-review.googlesource.com/c/go/+/190098
Reviewed-by: Austin Clements <austin@google.com>
Part 1: CL 199499 (GOOS nacl)
Part 2: CL 200077 (amd64p32 files, toolchain)
Part 3: stuff that arguably should've been part of Part 2, but I forgot
one of my grep patterns when splitting the original CL up into
two parts.
This one might also have interesting stuff to resurrect for any future
x32 ABI support.
Updates #30439
Change-Id: I2b4143374a253a003666f3c69e776b7e456bdb9c
Reviewed-on: https://go-review.googlesource.com/c/go/+/200318
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
`cmd/compile/internal/gc/reflect.go:/^func.dumptypestructs` was modified many times, now is `cmd/compile/internal/gc/reflect.go:/^func.dumptabs`
Change-Id: Ie949a5bee7878c998591468a04f67a8a70c61da7
GitHub-Last-Rev: 9ecc26985e
GitHub-Pull-Request: golang/go#34489
Reviewed-on: https://go-review.googlesource.com/c/go/+/197037
Reviewed-by: Keith Randall <khr@golang.org>
This reverts CL 180761
Reason for revert: Reinstate the stack-allocated defer CL.
There was nothing wrong with the CL proper, but stack allocation of defers exposed two other issues.
Issue #32477: Fix has been submitted as CL 181258.
Issue #32498: Possible fix is CL 181377 (not submitted yet).
Change-Id: I32b3365d5026600069291b068bbba6cb15295eb3
Reviewed-on: https://go-review.googlesource.com/c/go/+/181378
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This reverts commit fff4f599fe.
Reason for revert: Seems to still have issues around GC.
Fixes#32452
Change-Id: Ibe7af629f9ad6a3d5312acd7b066123f484da7f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/180761
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
When a defer is executed at most once in a function body,
we can allocate the defer record for it on the stack instead
of on the heap.
This should make defers like this (which are very common) faster.
This optimization applies to 363 out of the 370 static defer sites
in the cmd/go binary.
name old time/op new time/op delta
Defer-4 52.2ns ± 5% 36.2ns ± 3% -30.70% (p=0.000 n=10+10)
Fixes#6980
Update #14939
Change-Id: I697109dd7aeef9e97a9eeba2ef65ff53d3ee1004
Reviewed-on: https://go-review.googlesource.com/c/go/+/171758
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This was left over from the C->Go transition.
Change-Id: I52494af3d49a388dc45b57210ba68292ae01cf84
Reviewed-on: https://go-review.googlesource.com/c/go/+/176897
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
This change adds a background scavenging goroutine whose pacing is
determined when the heap goal changes. The scavenger is paced to use
at most 1% of the mutator's time for most systems. Furthermore, the
scavenger's pacing is computed based on the estimated number of
scavengable huge pages to take advantage of optimizations provided by
the OS.
The purpose of this scavenger is to deal with a shrinking heap: if the
heap goal is falling over time, the scavenger should kick in and start
returning free pages from the heap to the OS.
Also, now that we have a pacing system, the credit system used by
scavengeLocked has become redundant. Replace it with a mechanism which
only scavenges on the allocation path if it makes sense to do so with
respect to the new pacing system.
Fixes#30333.
Change-Id: I6203f8dc84affb26c3ab04528889dd9663530edc
Reviewed-on: https://go-review.googlesource.com/c/go/+/142960
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
We're about to change some of these rules, so it's about time we wrote
them down!
For #10958, #24543.
Change-Id: I3efce0c44b53bfb6f31ce2d299809b2b4eb329f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/172857
Reviewed-by: Russ Cox <rsc@golang.org>
This adds an internal runtime debug log. It uses per-M time-stamped
ring buffers of binary log records. On panic, these buffers are
collected, interleaved, and printed.
The entry-point to the debug log is a new "dlog" function. dlog is
designed so it can be used even from very constrained corners of the
runtime such as signal handlers or inside the write barrier.
The facility is only enabled if the debuglog build tag is set.
Otherwise, it compiles away to a no-op implementation.
The debug log format is also designed so it would be reasonable to
decode from a core dump, though this hasn't been implemented.
Change-Id: I6e2737c286358e97a0d8091826498070b95b66a3
Reviewed-on: https://go-review.googlesource.com/c/go/+/157997
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Both g and p had a racectx field, but they held different kinds of values.
The g field held ThreadState values while the p field held Processor values
(to use the names used in the C++ code in the compiler_rt support library).
Rename the p field to raceprocctx to reduce potential confusion.
Change-Id: Iefba0e259d240171e973054c452c3c15bf3f8f8f
Reviewed-on: https://go-review.googlesource.com/c/go/+/169960
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Returning the innermost frame instead of the outermost
makes code that walks the results of runtime.Caller{,s}
still work correctly in the presence of mid-stack inlining.
Fixes#29582
Change-Id: I2392e3dd5636eb8c6f58620a61cef2194fe660a7
Reviewed-on: https://go-review.googlesource.com/c/156364
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
The current support_XXX variables are specific for the
amd64 and 386 platforms.
Prefix processor capability variables by architecture to have a
consistent naming scheme and avoid reuse of the existing
variables for new platforms.
This also aligns naming of runtime variables closer with internal/cpu
processor capability variable names.
Change-Id: I3eabb29a03874678851376185d3a62e73c1aff1d
Reviewed-on: https://go-review.googlesource.com/c/91435
Run-TryBot: Martin Möhrmann <martisch@uos.de>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
When a goroutine enters a syscall, its M unwires from its P to allow
the P to be retaken by another M if the syscall is slow. The M retains a
reference to its old P, however, so that if its old P has not been
retaken when the syscall returns, it can quickly reacquire that P.
The implementation, however, was confusing, as it left the reference to
the potentially-retaken P in m.p, which implied that the P was still
wired.
Make the code clearer by enforcing the invariant that m.p is never
stale. entersyscall now moves m.p to m.oldp and sets m.p to 0;
exitsyscall does the reverse, provided m.oldp has not been retaken.
With this scheme in place, the issue described in #27660 (assertion
failures in the race detector) would have resulted in a clean segfault
instead of silently corrupting memory.
Change-Id: Ib3e03623ebed4f410e852a716919fe4538858f0a
Reviewed-on: https://go-review.googlesource.com/c/148899
Run-TryBot: Dmitry Vyukov <dvyukov@google.com>
Reviewed-by: Dmitry Vyukov <dvyukov@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
When a function triggers a signal (like a segfault which translates to
a nil pointer exception) during execution, a sigpanic handler is just
below it on the stack. The function itself did not stop at a
safepoint, so we have to figure out what safepoint we should use to
scan its stack frame.
Previously we used the site of the most recent defer to get the live
variables at the signal site. That answer is not quite correct, as
explained in #27518. Instead, use the site of a deferreturn call.
It has all the right variables marked as live (no args, all the return
values, except those that escape to the heap, in which case the
corresponding PAUTOHEAP variables will be live instead).
This CL requires stack objects, so that all the local variables
and args referenced by the deferred closures keep the right variables alive.
Fixes#27518
Change-Id: Id45d8a8666759986c203181090b962e2981e48ca
Reviewed-on: https://go-review.googlesource.com/c/134637
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Now that we do no mark work during mark termination, we no longer need
the gchelper mechanism.
Updates #26903.
Updates #17503.
Change-Id: Ie94e5c0f918cfa047e88cae1028fece106955c1b
Reviewed-on: https://go-review.googlesource.com/c/134785
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This adds support for disabling the scheduling of user goroutines
while allowing system goroutines like the garbage collector to
continue running. User goroutines pass through the usual state
transitions, but if we attempt to actually schedule one, it will get
put on a deferred scheduling list.
Updates #26903. This is preparation for unifying STW GC and concurrent
GC.
Updates #25578. This same mechanism can form the basis for disabling
all but a single user goroutine for the purposes of debugger function
call injection.
Change-Id: Ib72a808e00c25613fe6982f5528160d3de3dbbc6
Reviewed-on: https://go-review.googlesource.com/c/134779
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This makes the runtime.support_sse2 variable unused
so it is removed in this CL too.
Change-Id: Ia8b9ffee7ac97128179f74ef244b10315e44c234
Reviewed-on: https://go-review.googlesource.com/131455
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
sys here is runtime/internal/sys.
Replace uses of sys.CacheLineSize for padding by
cpu.CacheLinePad or cpu.CacheLinePadSize.
Replace other uses of sys.CacheLineSize by cpu.CacheLineSize.
Remove now unused sys.CacheLineSize.
Updates #25203
Change-Id: I1daf410fe8f6c0493471c2ceccb9ca0a5a75ed8f
Reviewed-on: https://go-review.googlesource.com/126601
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Using internal/cpu variables has the benefit of avoiding false sharing
(as those are padded) and allows memory and cache usage for these variables
to be shared by multiple packages.
Change-Id: I2bf68d03091bf52b466cf689230d5d25d5950037
Reviewed-on: https://go-review.googlesource.com/126599
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
There are two manually managed G dequeues. Abstract these both into a
shared gQueue type. This also introduces a gList type, which we'll use
to replace several manually-managed G lists in follow-up CLs.
This makes the code more readable and maintainable. gcFlushBgCredit in
particular becomes much easier to follow. It also makes it easier to
introduce more G queues in the future. Finally, the gList type clearly
distinguishes between lists of Gs and individual Gs; currently both
are represented by a *g, which can easily lead to confusion and bugs.
Change-Id: Ic7798841b405d311fc8b6aa5a958ffa4c7993c6c
Reviewed-on: https://go-review.googlesource.com/129396
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
ARMv8.1 has added new instruction (LDADDAL) for atomic memory operations. This
CL improves existing atomic add intrinsics with the new instruction. Since the
new instruction is only guaranteed to be present after ARMv8.1, we guard its
usage with a conditional on CPU feature.
Performance result on ARMv8.1 machine:
name old time/op new time/op delta
Xadd-224 1.05µs ± 6% 0.02µs ± 4% -98.06% (p=0.000 n=10+8)
Xadd64-224 1.05µs ± 3% 0.02µs ±13% -98.10% (p=0.000 n=9+10)
[Geo mean] 1.05µs 0.02µs -98.08%
Performance result on ARMv8.0 machine:
name old time/op new time/op delta
Xadd-46 538ns ± 1% 541ns ± 1% +0.62% (p=0.000 n=9+9)
Xadd64-46 505ns ± 1% 508ns ± 0% +0.48% (p=0.003 n=9+8)
[Geo mean] 521ns 524ns +0.55%
Change-Id: If4b5d8d0e2d6f84fe1492a4f5de0789910ad0ee9
Reviewed-on: https://go-review.googlesource.com/81877
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This also allows the GODEBUGCPU options to change the
support_* runtime cpu feature variable values.
Change-Id: I884c5f03993afc7e3344ff2fd471a2c6cfde43d4
Reviewed-on: https://go-review.googlesource.com/114615
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Every time I poke at #14921, the g.waitreason string
pointer writes show up.
They're not particularly important performance-wise,
but it'd be nice to clear the noise away.
And it does open up a few extra bytes in the g struct
for some future use.
This is a re-roll of CL 99078, which was rolled
back because of failures on s390x.
Those failures were apparently due to an old version of gdb.
Change-Id: Icc2c12f449b2934063fd61e272e06237625ed589
Reviewed-on: https://go-review.googlesource.com/111256
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
CL 106735 changed to the new softfloat support on GOARM=5.
ARM assembly code that uses FP instructions not guarded on GOARM,
if any, will break. The easiest way to fix is probably to use Go
implementation on GOARM=5, like
MOVB runtime·goarm(SB), R11
CMP $5, R11
BEQ arm5
... FP instructions ...
RET
arm5:
CALL or JMP to Go implementation
Change-Id: I52fc76fac9c854ebe7c6c856c365fba35d3f560a
Reviewed-on: https://go-review.googlesource.com/107475
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
After CL 104636 cpu.X86.HasAVX is set early enough that it can be used
in runtime·memclrNoHeapPointers. Add an offset to use in assembly and
replace the only occurence of support_avx2.
Change-Id: Icada62efeb3e24d71251d55623a8a8602364c9a8
Reviewed-on: https://go-review.googlesource.com/106595
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
Currently, collecting a stack trace via runtime.Stack captures the stack for the
immediately running goroutines. This change extends those tracebacks to include
the tracebacks of their ancestors. This is done with a low memory cost and only
utilized when debug option tracebackancestors is set to a value greater than 0.
Resolves#22289
Change-Id: I7edacc62b2ee3bd278600c4a21052c351f313f3a
Reviewed-on: https://go-review.googlesource.com/70993
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
After CL 104636 cpu.X86.HasAVX is set early enough that it can be used
to determine useAVXmemmove. Use it and remove support_avx.
Change-Id: Ib7a627bede2bf96c92362507e742bd833cb42a74
Reviewed-on: https://go-review.googlesource.com/106235
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
After CL 104636 the feature flags in internal/cpu are initialized before
alginit and can now be used for aeshash feature detection. Also remove
now unused runtime variables:
x86:
support_ssse3
support_sse42
support_aes
arm64:
support_aes
Change-Id: I2f64198d91750eaf3c6cf2aac6e9e17615811ec8
Reviewed-on: https://go-review.googlesource.com/106015
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
The code reading these variables was removed in CL 41476. They are only
set but never read now, so remove them.
Change-Id: I6b0b8d813e9a3ec2a13586ff92746e00ad1b5bf0
Reviewed-on: https://go-review.googlesource.com/106095
Run-TryBot: Tobias Klauser <tobias.klauser@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>