Having an mcache field in both m and p is confusing, so remove it from m.
Always use mcache field from p. Use new variable mcache0 during bootstrap.
Change-Id: If2cba9f8bb131d911d512b61fd883a86cf62cc98
Reviewed-on: https://go-review.googlesource.com/c/go/+/205239
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Print the current SP and (old) stack bounds when the stack grows
too large. This helps to identify the problem: whether a large
stack is used, or something else goes wrong.
For #35470.
Change-Id: I34a4064d5c7280978391d835e171b90d06f87222
Reviewed-on: https://go-review.googlesource.com/c/go/+/207351
Reviewed-by: Emmanuel Odeke <emm.odeke@gmail.com>
This change renames the "round" function to the more appropriately named
"alignUp" which rounds an integer up to the next multiple of a power of
two.
This change also adds the alignDown function, which is almost like
alignUp but rounds down to the previous multiple of a power of two.
With these two functions, we also go and replace manual rounding code
with it where we can.
Change-Id: Ie1487366280484dcb2662972b01b4f7135f72fec
Reviewed-on: https://go-review.googlesource.com/c/go/+/190618
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
This adds support for pausing a running G by sending a signal to its
M.
The main complication is that we want to target a G, but can only send
a signal to an M. Hence, the protocol we use is to simply mark the G
for preemption (which we already do) and send the M a "wake up and
look around" signal. The signal checks if it's running a G with a
preemption request and stops it if so in the same way that stack check
preemptions stop Gs. Since the preemption may fail (the G could be
moved or the signal could arrive at an unsafe point), we keep a count
of the number of received preemption signals. This lets stopG detect
if its request failed and should be retried without an explicit
channel back to suspendG.
For #10958, #24543.
Change-Id: I3e1538d5ea5200aeb434374abb5d5fdc56107e53
Reviewed-on: https://go-review.googlesource.com/c/go/+/201760
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
When everything is working correctly, any pointer the garbage
collector encounters can only point into a fully initialized heap
span, since the span must have been initialized before that pointer
could escape the heap allocator and become visible to the GC.
However, in various cases, we try to be defensive against bad
pointers. In findObject, this is just a sanity check: we never expect
to find a bad pointer, but programming errors can lead to them. In
spanOfHeap, we don't necessarily trust the pointer and we're trying to
check if it really does point to the heap, though it should always
point to something. Conservative scanning takes this to a new level,
since it can only guess that a word may be a pointer and verify this.
In all of these cases, we have a problem that the span lookup and
check can race with span initialization, since the span becomes
visible to lookups before it's fully initialized.
Furthermore, we're about to start initializing the span without the
heap lock held, which is going to introduce races where accesses were
previously protected by the heap lock.
To address this, this CL makes accesses to mspan.state atomic, and
ensures that the span is fully initialized before setting the state to
mSpanInUse. All loads are now atomic, and in any case where we don't
trust the pointer, it first atomically loads the span state and checks
that it's mSpanInUse, after which it will have synchronized with span
initialization and can safely check the other span fields.
For #10958, #24543, but a good fix in general.
Change-Id: I518b7c63555b02064b98aa5f802c92b758fef853
Reviewed-on: https://go-review.googlesource.com/c/go/+/203286
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
We check whether an M is preemptible in a surprising number of places.
Put it in one function.
For #10958, #24543.
Change-Id: I305090fdb1ea7f7a55ffe25851c1e35012d0d06c
Reviewed-on: https://go-review.googlesource.com/c/go/+/201439
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
We're about to introduce asynchronous safe points, where we won't have
precise pointer maps for all stack frames. That's okay for scanning
the stack (conservatively), but not for shrinking the stack.
Hence, this CL prepares for this by only shrinking the stack as part
of the stack scan if the goroutine is stopped at a synchronous safe
point. Otherwise, it queues up the stack shrink for the next
synchronous safe point.
We already have one condition under which we can't shrink the stack
for very similar reasons: syscalls. Currently, we just give up on
shrinking the stack if it's in a syscall. But with this mechanism, we
defer that stack shrink until the next synchronous safe point.
For #10958, #24543.
Change-Id: Ifa1dec6f33fdf30f9067be2ce3f7ab8a7f62ce38
Reviewed-on: https://go-review.googlesource.com/c/go/+/201438
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
When we copy a stack of a goroutine blocked in a channel operation, we
have to be very careful because other goroutines may be writing to
that goroutine's stack. To handle this, stack copying acquires the
locks for the channels a goroutine is waiting on.
One complication is that stack growth may happen while a goroutine
holds these locks, in which case stack copying must *not* acquire
these locks because that would self-deadlock.
Currently, stack growth never acquires these locks because stack
growth only happens when a goroutine is running, which means it's
either not blocking on a channel or it's holding the channel locks
already. Stack shrinking always acquires these locks because shrinking
happens asynchronously, so the goroutine is never running, so there
are either no locks or they've been released by the goroutine.
However, we're about to change when stack shrinking can happen, which
is going to break the current rules. Rather than find a new way to
derive whether to acquire these locks or not, this CL simply adds a
flag to the g struct that indicates that stack copying should acquire
channel locks. This flag is set while the goroutine is blocked on a
channel op.
For #10958, #24543.
Change-Id: Ia2ac8831b1bfda98d39bb30285e144c4f7eaf9ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/172982
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Currently, the process of suspending a goroutine is tied to stack
scanning. In preparation for non-cooperative preemption, this CL
abstracts this into general purpose suspendG/resumeG functions.
suspendG and resumeG closely follow the existing scang and restartg
functions with one exception: the addition of a _Gpreempted status.
Currently, preemption tasks (stack scanning) are carried out by the
target goroutine if it's in _Grunning. In this new approach, the task
is always carried out by the goroutine that called suspendG. Thus, we
need a reliable way to drive the target goroutine out of _Grunning
until the requesting goroutine is ready to resume it. The new
_Gpreempted state provides the handshake: when a runnable goroutine
responds to a preemption request, it now parks itself and enters
_Gpreempted. The requesting goroutine races to put it in _Gwaiting,
which gives it ownership, but also the responsibility to start it
again.
This CL adds several TODOs about improving the synchronization on the
G status. The existing code already has these problems; we're just
taking note of them.
The next CL will remove the now-dead scang and preemptscan.
For #10958, #24543.
Change-Id: I16dbf87bea9d50399cc86719c156f48e67198f16
Reviewed-on: https://go-review.googlesource.com/c/go/+/201137
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Currently, shrinkstack will not shrink a stack on Windows if
gp.m.libcallsp != 0. In general, we can't shrink stacks in syscalls
because the syscall may hold pointers into the stack, and in principle
this is supposed to be preventing that for libcall-based syscalls
(which are direct syscalls from the runtime). But this test is
actually broken and has been for a long time. That turns out to be
okay because it also appears it's not necessary.
This test is racy. g.m points to whatever M the G was last running on,
even if the G is in a blocked state, and that M could be doing
anything, including making libcalls. Hence, observing that libcallsp
== 0 at one moment in shrinkstack is no guarantee that it won't become
non-zero while we're shrinking the stack, and vice-versa.
It's also weird that this check is only performed on Windows, given
that we now use libcalls on macOS, Solaris, and AIX.
This check was added when stack shrinking was first implemented in CL
69580044. The history of that CL (though not the final version)
suggests this was necessary for libcalls that happened on Go user
stacks, which we never do now because of the limited stack space.
It could also be defending against user stack pointers passed to
libcall system calls from blocked Gs. But the runtime isn't allowed to
keep pointers into the user stack for blocked Gs on any OS, so it's
not clear this would be of any value.
Hence, this checks seems to be simply unnecessary.
Rather than simply remove it, this CL makes it defensive. We can't do
anything about blocked Gs, since it doesn't even make sense to look at
their M, but if a G tries to shrink its own stack while in a libcall,
that indicates a bug in the libcall code. This CL makes shrinkstack
panic in this case.
For #10958, #24543, since those are going to rearrange how we decide
that it's safe to shrink a stack.
Change-Id: Ia865e1f6340cff26637f8d513970f9ebb4735c6d
Reviewed-on: https://go-review.googlesource.com/c/go/+/173724
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).
When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.
In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).
I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().
The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).
Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ]
With normal (stack-allocated) defers only: 35.4 ns/op
With open-coded defers: 5.6 ns/op
Cost of function call alone (remove defer keyword): 4.4 ns/op
Text size increase (including funcdata) for go binary without/with open-coded defers: 0.09%
The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.
The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:
Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
Without open-coded defers: 62.0 ns/op
With open-coded defers: 255 ns/op
A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:
CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
Without open-coded defers: 443 ns/op
With open-coded defers: 347 ns/op
Updates #14939 (defer performance)
Updates #34481 (design doc)
Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/202340
Reviewed-by: Austin Clements <austin@google.com>
Generate inline code at defer time to save the args of defer calls to unique
(autotmp) stack slots, and generate inline code at exit time to check which defer
calls were made and make the associated function/method/interface calls. We
remember that a particular defer statement was reached by storing in the deferBits
variable (always stored on the stack). At exit time, we check the bits of the
deferBits variable to determine which defer function calls to make (in reverse
order). These low-cost defers are only used for functions where no defers
appear in loops. In addition, we don't do these low-cost defers if there are too
many defer statements or too many exits in a function (to limit code increase).
When a function uses open-coded defers, we produce extra
FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and
for each defer, the stack slots where the closure and associated args have been
stored. The funcdata also includes the location of the deferBits variable.
Therefore, for panics, we can use this funcdata to determine exactly which defers
are active, and call the appropriate functions/methods/closures with the correct
arguments for each active defer.
In order to unwind the stack correctly after a recover(), we need to add an extra
code segment to functions with open-coded defers that simply calls deferreturn()
and returns. This segment is not reachable by the normal function, but is returned
to by the runtime during recovery. We set the liveness information of this
deferreturn() to be the same as the liveness at the first function call during the
last defer exit code (so all return values and all stack slots needed by the defer
calls will be live).
I needed to increase the stackguard constant from 880 to 896, because of a small
amount of new code in deferreturn().
The -N flag disables open-coded defers. '-d defer' prints out the kind of defer
being used at each defer statement (heap-allocated, stack-allocated, or
open-coded).
Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ]
With normal (stack-allocated) defers only: 35.4 ns/op
With open-coded defers: 5.6 ns/op
Cost of function call alone (remove defer keyword): 4.4 ns/op
Text size increase (including funcdata) for go cmd without/with open-coded defers: 0.09%
The average size increase (including funcdata) for only the functions that use
open-coded defers is 1.1%.
The cost of a panic followed by a recover got noticeably slower, since panic
processing now requires a scan of the stack for open-coded defer frames. This scan
is required, even if no frames are using open-coded defers:
Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ]
Without open-coded defers: 62.0 ns/op
With open-coded defers: 255 ns/op
A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers:
CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ]
Without open-coded defers: 443 ns/op
With open-coded defers: 347 ns/op
Updates #14939 (defer performance)
Updates #34481 (design doc)
Change-Id: I51a389860b9676cfa1b84722f5fb84d3c4ee9e28
Reviewed-on: https://go-review.googlesource.com/c/go/+/190098
Reviewed-by: Austin Clements <austin@google.com>
This reverts CL 180761
Reason for revert: Reinstate the stack-allocated defer CL.
There was nothing wrong with the CL proper, but stack allocation of defers exposed two other issues.
Issue #32477: Fix has been submitted as CL 181258.
Issue #32498: Possible fix is CL 181377 (not submitted yet).
Change-Id: I32b3365d5026600069291b068bbba6cb15295eb3
Reviewed-on: https://go-review.googlesource.com/c/go/+/181378
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
This marks all Go symbols called from assembly in other packages with
"go:linkname" directives to ensure they get ABI wrappers.
Now that we have this go:linkname convention, this also removes the
abi0Syms definition in the runtime, which was used to give morestackc
an ABI0 wrapper. Instead, we now just mark morestackc with a
go:linkname directive.
This was tested with buildall.bash in the default configuration, with
-race, and with -gcflags=all=-d=ssa/intrinsics/off. Since I couldn't
test cgo on non-Linux configurations, I manually grepped for runtime
symbols in runtime/cgo.
Updates #31230.
Change-Id: I6c8aa56be2ca6802dfa2bf159e49c411b9071bf1
Reviewed-on: https://go-review.googlesource.com/c/go/+/179862
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
This reverts commit fff4f599fe.
Reason for revert: Seems to still have issues around GC.
Fixes#32452
Change-Id: Ibe7af629f9ad6a3d5312acd7b066123f484da7f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/180761
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
When a defer is executed at most once in a function body,
we can allocate the defer record for it on the stack instead
of on the heap.
This should make defers like this (which are very common) faster.
This optimization applies to 363 out of the 370 static defer sites
in the cmd/go binary.
name old time/op new time/op delta
Defer-4 52.2ns ± 5% 36.2ns ± 3% -30.70% (p=0.000 n=10+10)
Fixes#6980
Update #14939
Change-Id: I697109dd7aeef9e97a9eeba2ef65ff53d3ee1004
Reviewed-on: https://go-review.googlesource.com/c/go/+/171758
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Currently, shrinkstack will free the stack if the goroutine is dead.
There are only two places that call shrinkstack: scanstack, which will
never call it if the goroutine is dead; and markrootFreeGStacks, which
only calls it on dead goroutines.
Clean this up by separating stack freeing out of shrinkstack.
Change-Id: I7d7891e620550c32a2220833923a025704986681
Reviewed-on: https://go-review.googlesource.com/c/go/+/170890
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
When we set an explicit argmap, we may want only a prefix of that
argmap. Argmap is set when the function is reflect.makeFuncStub or
reflect.methodValueCall. In this case, arglen specifies how much of
the args section is actually live. (It could be either all the args +
results, or just the args.)
Fixes#28750
Change-Id: Idf060607f15a298ac591016994e58e22f7f92d83
Reviewed-on: https://go-review.googlesource.com/c/149217
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Rework how the compiler+runtime handles stack-allocated variables
whose address is taken.
Direct references to such variables work as before. References through
pointers, however, use a new mechanism. The new mechanism is more
precise than the old "ambiguously live" mechanism. It computes liveness
at runtime based on the actual references among objects on the stack.
Each function records all of its address-taken objects in a FUNCDATA.
These are called "stack objects". The runtime then uses that
information while scanning a stack to find all of the stack objects on
a stack. It then does a mark phase on the stack objects, using all the
pointers found on the stack (and ancillary structures, like defer
records) as the root set. Only stack objects which are found to be
live during this mark phase will be scanned and thus retain any heap
objects they point to.
A subsequent CL will remove all the "ambiguously live" logic from
the compiler, so that the stack object tracing will be required.
For this CL, the stack tracing is all redundant with the current
ambiguously live logic.
Update #22350
Change-Id: Ide19f1f71a5b6ec8c4d54f8f66f0e9a98344772f
Reviewed-on: https://go-review.googlesource.com/c/134155
Reviewed-by: Austin Clements <austin@google.com>
Now that we do no mark work during mark termination, we no longer need
the gchelper mechanism.
Updates #26903.
Updates #17503.
Change-Id: Ie94e5c0f918cfa047e88cae1028fece106955c1b
Reviewed-on: https://go-review.googlesource.com/c/134785
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Now that there is no mark 2 phase, gcBlackenPromptly is no longer
used.
Updates #26903. This is a follow-up to eliminating mark 2.
Change-Id: Ib9c534f21b36b8416fcf3cab667f186167b827f8
Reviewed-on: https://go-review.googlesource.com/c/134319
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
We already aliased mSpanInUse to _MSpanInUse. The dual constants are
getting annoying, so fix all of these to use the mSpan* naming
convention.
This was done automatically with:
sed -i -re 's/_?MSpan(Dead|InUse|Manual|Free)/mSpan\1/g' *.go
plus deleting the existing definition of mSpanInUse.
Change-Id: I09979d9d491d06c10689cea625dc57faa9cc6767
Reviewed-on: https://go-review.googlesource.com/137875
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
OpenBSD 6.4 is going to start requiring that the SP points to memory
that was mapped with MAP_STACK on system call entry, traps, and when
switching to the alternate signal stack [1]. Currently, Go doesn't map
any memory MAP_STACK, so the kernel quickly kills Go processes.
Fix this by remapping the memory that backs stack spans with
MAP_STACK, and re-remapping it without MAP_STACK when it's returned to
the heap.
[1] http://openbsd-archive.7691.n7.nabble.com/stack-register-checking-td338238.htmlFixes#26142.
Change-Id: I656eb84385a22833445d49328bb304f8cdd0e225
Reviewed-on: https://go-review.googlesource.com/121657
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Also:
- Add extra SystemStack space for darwin/arm64 just
like for darwin/arm.
- Removed redundant stack alignment; the arm64 hardware enforces
the 16 byte alignment.
- Save and restore the g registers at library initialization.
- Zero g registers since libpreinit can call libc functions
that in turn use asmcgocall. asmcgocall requires an initialized g.
- Change asmcgocall to work even if no g is set. The change mimics
amd64.
Change-Id: I1b8c63b07cfec23b909c0d215b50dc229f8adbc8
Reviewed-on: https://go-review.googlesource.com/117176
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
This adds a mechanism for debuggers to safely inject calls to Go
functions on amd64. Debuggers must participate in a protocol with the
runtime, and need to know how to lay out a call frame, but the runtime
support takes care of the details of handling live pointers in
registers, stack growth, and detecting the trickier conditions when it
is unsafe to inject a user function call.
Fixes#21678.
Updates derekparker/delve#119.
Change-Id: I56d8ca67700f1f77e19d89e7fc92ab337b228834
Reviewed-on: https://go-review.googlesource.com/109699
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Currently we have two nearly identical copies of the code that fetches
the locals and arguments liveness maps for a frame, plus a third
that's a poor knock-off. Unify these all into a single function.
Change-Id: Ibce7926a0b0e3d23182112da4e25df899579a585
Reviewed-on: https://go-review.googlesource.com/109698
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
There are several things combined in this change.
First, eliminate the gobitvector type in favor
of adding a ptrbit method to bitvector.
In non-performance-critical code, use that method.
In performance critical code, though, load the bitvector data
one byte at a time and iterate only over set bits.
To support that, add and use sys.Ctz8.
name old time/op new time/op delta
StackCopyPtr-8 81.8ms ± 5% 78.9ms ± 3% -3.58% (p=0.000 n=97+96)
StackCopy-8 65.9ms ± 3% 62.8ms ± 3% -4.67% (p=0.000 n=96+92)
StackCopyNoCache-8 105ms ± 3% 102ms ± 3% -3.38% (p=0.000 n=96+95)
Change-Id: I00b80f45612708bd440b1a411a57fa6dfa24aa74
Reviewed-on: https://go-review.googlesource.com/109716
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Currently, when the runtime looks up the stack map for a frame, it
uses frame.continpc - 1 unless continpc is the function entry PC, in
which case it uses frame.continpc. As a result, if continpc is the
function entry point (which happens for deferred frames), it will
actually look up the stack map *following* the first instruction.
I think, though I am not positive, that this is always okay today
because the first instruction of a function can never change the stack
map. It's usually not a CALL, so it doesn't have PCDATA. Or, if it is
a CALL, it has to have the entry stack map.
But we're about to start emitting stack maps at every instruction that
changes them, which means the first instruction can have PCDATA
(notably, in leaf functions that don't have a prologue).
To prepare for this, tweak how the runtime looks up stack map indexes
so that if continpc is the function entry point, it directly uses the
entry stack map.
For #24543.
Change-Id: I85aa818041cd26aff416f7b1fba186e9c8ca6568
Reviewed-on: https://go-review.googlesource.com/109349
Reviewed-by: Rick Hudson <rlh@golang.org>
adjustpointers loops over a bitmap.
If the length of that bitmap is zero,
we can skip making the call entirely.
This speeds up stack copying when there are
no pointers present in either args or locals.
name old time/op new time/op delta
StackCopyPtr-8 101ms ± 4% 90ms ± 4% -10.95% (p=0.000 n=87+93)
StackCopy-8 80.1ms ± 4% 72.6ms ± 4% -9.41% (p=0.000 n=98+100)
StackCopyNoCache-8 121ms ± 3% 113ms ± 3% -6.57% (p=0.000 n=98+97)
Change-Id: I7a272e19bc9a14fa3e3318771ebd082dc6247d25
Reviewed-on: https://go-review.googlesource.com/104737
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
When there are plugins, there may not be a unique copy of runtime
functions like goexit, mcall, etc. So identifying them by entry
address is problematic. Instead, keep track of each special function
using a field in the symbol table. That way, multiple copies of
the same runtime function will be treated identically.
Fixes#24351Fixes#23133
Change-Id: Iea3232df8a6af68509769d9ca618f530cc0f84fd
Reviewed-on: https://go-review.googlesource.com/100739
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Currently, throw may grow the stack, which means whenever we call it
from a context where it's not safe to grow the stack, we first have to
switch to the system stack. This is pretty easy to get wrong.
Fix this by making throw switch to the system stack so it doesn't grow
the stack and is hence safe to call without a system stack switch at
the call site.
The only thing this complicates is badsystemstack itself, which would
now go into an infinite loop before printing anything (previously it
would also go into an infinite loop, but would at least print the
error first). Fix this by making badsystemstack do a direct write and
then crash hard.
Change-Id: Ic5b4a610df265e47962dcfa341cabac03c31c049
Reviewed-on: https://go-review.googlesource.com/93659
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Now that we support the full non-contiguous virtual address space of
amd64 hardware, some of the comments and constants related to this are
out of date.
This renames memLimitBits to heapAddrBits because 1<<memLimitBits is
no longer the limit of the address space and rewrites the comment to
focus first on hardware limits (which span OSes) and then discuss
kernel limits.
Second, this eliminates the memLimit constant because there's no
longer a meaningful "highest possible heap pointer value" on amd64.
Updates #23862.
Change-Id: I44b32033d2deb6b69248fb8dda14fc0e65c47f11
Reviewed-on: https://go-review.googlesource.com/95498
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This replaces the contiguous heap arena mapping with a potentially
sparse mapping that can support heap mappings anywhere in the address
space.
This has several advantages over the current approach:
* There is no longer any limit on the size of the Go heap. (Currently
it's limited to 512GB.) Hence, this fixes#10460.
* It eliminates many failures modes of heap initialization and
growing. In particular it eliminates any possibility of panicking
with an address space conflict. This can happen for many reasons and
even causes a low but steady rate of TSAN test failures because of
conflicts with the TSAN runtime. See #16936 and #11993.
* It eliminates the notion of "non-reserved" heap, which was added
because creating huge address space reservations (particularly on
64-bit) led to huge process VSIZE. This was at best confusing and at
worst conflicted badly with ulimit -v. However, the non-reserved
heap logic is complicated, can race with other mappings in non-pure
Go binaries (e.g., #18976), and requires that the entire heap be
either reserved or non-reserved. We currently maintain the latter
property, but it's quite difficult to convince yourself of that, and
hence difficult to keep correct. This logic is still present, but
will be removed in the next CL.
* It fixes problems on 32-bit where skipping over parts of the address
space leads to mapping huge (and never-to-be-used) metadata
structures. See #19831.
This also completely rewrites and significantly simplifies
mheap.sysAlloc, which has been a source of many bugs. E.g., #21044,
#20259, #18651, and #13143 (and maybe #23222).
This change also makes it possible to allocate individual objects
larger than 512GB. As a result, a few tests that expected huge
allocations to fail needed to be changed to make even larger
allocations. However, at the moment attempting to allocate a humongous
object may cause the program to freeze for several minutes on Linux as
we fall back to probing every page with addrspace_free. That logic
(and this failure mode) will be removed in the next CL.
Fixes#10460.
Fixes#22204 (since it rewrites the code involved).
This slightly slows down compilebench and the x/benchmarks garbage
benchmark.
name old time/op new time/op delta
Template 184ms ± 1% 185ms ± 1% ~ (p=0.065 n=10+9)
Unicode 86.9ms ± 3% 86.3ms ± 1% ~ (p=0.631 n=10+10)
GoTypes 599ms ± 0% 602ms ± 0% +0.56% (p=0.000 n=10+9)
Compiler 2.87s ± 1% 2.89s ± 1% +0.51% (p=0.002 n=9+10)
SSA 7.29s ± 1% 7.25s ± 1% ~ (p=0.182 n=10+9)
Flate 118ms ± 2% 118ms ± 1% ~ (p=0.113 n=9+9)
GoParser 147ms ± 1% 148ms ± 1% +1.07% (p=0.003 n=9+10)
Reflect 401ms ± 1% 404ms ± 1% +0.71% (p=0.003 n=10+9)
Tar 175ms ± 1% 175ms ± 1% ~ (p=0.604 n=9+10)
XML 209ms ± 1% 210ms ± 1% ~ (p=0.052 n=10+10)
(https://perf.golang.org/search?q=upload:20171231.4)
name old time/op new time/op delta
Garbage/benchmem-MB=64-12 2.23ms ± 1% 2.25ms ± 1% +0.84% (p=0.000 n=19+19)
(https://perf.golang.org/search?q=upload:20171231.3)
Relative to the start of the sparse heap changes (starting at and
including "runtime: fix various contiguous bitmap assumptions"),
overall slowdown is roughly 1% on GC-intensive benchmarks:
name old time/op new time/op delta
Template 183ms ± 1% 185ms ± 1% +1.32% (p=0.000 n=9+9)
Unicode 84.9ms ± 2% 86.3ms ± 1% +1.65% (p=0.000 n=9+10)
GoTypes 595ms ± 1% 602ms ± 0% +1.19% (p=0.000 n=9+9)
Compiler 2.86s ± 0% 2.89s ± 1% +0.91% (p=0.000 n=9+10)
SSA 7.19s ± 0% 7.25s ± 1% +0.75% (p=0.000 n=8+9)
Flate 117ms ± 1% 118ms ± 1% +1.10% (p=0.000 n=10+9)
GoParser 146ms ± 2% 148ms ± 1% +1.48% (p=0.002 n=10+10)
Reflect 398ms ± 1% 404ms ± 1% +1.51% (p=0.000 n=10+9)
Tar 173ms ± 1% 175ms ± 1% +1.17% (p=0.000 n=10+10)
XML 208ms ± 1% 210ms ± 1% +0.62% (p=0.011 n=10+10)
[Geo mean] 369ms 373ms +1.17%
(https://perf.golang.org/search?q=upload:20180101.2)
name old time/op new time/op delta
Garbage/benchmem-MB=64-12 2.22ms ± 1% 2.25ms ± 1% +1.51% (p=0.000 n=20+19)
(https://perf.golang.org/search?q=upload:20180101.3)
Change-Id: I5daf4cfec24b252e5a57001f0a6c03f22479d0f0
Reviewed-on: https://go-review.googlesource.com/85887
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
I think we'd forgotten about the mheap.lookup APIs when we introduced
spanOf*, but, at any rate, the spanOf* functions are used far more
widely at this point, so this CL eliminates the mheap.lookup*
functions in favor of spanOf*.
Change-Id: I15facd0856e238bb75d990e838a092b5bef5bdfc
Reviewed-on: https://go-review.googlesource.com/85879
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This attempts to symbolize the PC of morestack's caller when there's a
stack split at a bad time. The stack trace starts at the *caller* of
the function that attempted to grow the stack, so this is useful if it
isn't obvious what's being called at that point, such as in #21431.
Change-Id: I5dee305d87c8069611de2d14e7a3083d76264f8f
Reviewed-on: https://go-review.googlesource.com/84115
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
newstack manually prints the stack trace if we try to grow the stack
when throwsplit is set. However, the default behavior is to omit
runtime frames. Since runtime frames can be critical to understanding
this crash, this change fixes this traceback to include them.
Updates #21431.
Change-Id: I5aa43f43aa2f10a8de7d67bcec743427be3a3b5d
Reviewed-on: https://go-review.googlesource.com/79518
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently, newstack and gogo have write barriers for maintaining the
context register saved in g.sched.ctxt. This is troublesome, because
newstack can be called from go:nowritebarrierrec places that can't
allow write barriers. It happens to be benign because g.sched.ctxt
will always be nil on entry to newstack *and* it so happens the
incoming ctxt will also always be nil in these contexts (I
think/hope), but this is playing with fire. It's also desirable to
mark newstack go:nowritebarrierrec to prevent any other, non-benign
write barriers from creeping in, but we can't do that right now
because of this one write barrier.
Fix all of this by observing that g.sched.ctxt is really just a saved
live pointer register. Hence, we can shade it when we scan g's stack
and otherwise move it back and forth between the actual context
register and g.sched.ctxt without write barriers. This means we can
save it in morestack along with all of the other g.sched, eliminate
the save from newstack along with its troublesome write barrier, and
eliminate the shenanigans in gogo to invoke the write barrier when
restoring it.
Once we've done all of this, we can mark newstack
go:nowritebarrierrec.
Fixes#22385.
For #22460.
Change-Id: I43c24958e3f6785b53c1350e1e83c2844e0d1522
Reviewed-on: https://go-review.googlesource.com/72553
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Found with mvdan.cc/unindent. Prioritized the ones with the biggest wins
for now.
Change-Id: I2b032e45cdd559fc9ed5b1ee4c4de42c4c92e07b
Reviewed-on: https://go-review.googlesource.com/56470
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Writing to selectdone on the stack of another goroutine meant a
pretty subtle dance between the select code and the stack copying
code. Instead move the selectdone variable into the g struct.
Change-Id: Id246aaf18077c625adef7ca2d62794afef1bdd1b
Reviewed-on: https://go-review.googlesource.com/53390
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
There are two copies each of the stackPreempt/_StackPreempt and
stackFork/_StackFork constants. Remove the ones left over from C that
are no longer used.
Change-Id: I849604c72c11e4a0cb08e45e9817eb3f5a6ce8ba
Reviewed-on: https://go-review.googlesource.com/43638
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Setting stackCache to 0 to disable stack caches for debugging hasn't
worked for a long time. It causes stackalloc to fall back to full span
allocation, round sub-page stacks down to 0 pages, and blow up.
Fix this debug mode so it disables the per-P caches, but continues to
use the global stack pools for small stacks, which correctly handle
sub-page stacks. While we're here, rename stackCache to stackNoCache
so it acts like the rest of the stack allocator debug modes where "0"
is the right default value.
Fixes#17291.
Change-Id: If401c41cee3448513cbd7bb2e9334a8efab257a7
Reviewed-on: https://go-review.googlesource.com/43637
Reviewed-by: Keith Randall <khr@golang.org>
The stackFromSystem debug mode has two problems:
1) It rounds the stack allocation to _PageSize. If the physical page
size is >8K, this can cause unmapping the memory later to either
under-unmap or over-unmap.
2) It doesn't return the rounded-up allocation size to its caller, so
when we later unmap the memory, we may pass the wrong length.
Fix these problems by rounding the size up to the physical page size
and putting that rounded-up size in the returned stack bounds.
Fixes#17289.
Change-Id: I6b854af3b06bb16e3750798397bb5e2a722ec1cb
Reviewed-on: https://go-review.googlesource.com/43636
Reviewed-by: Keith Randall <khr@golang.org>