This reverts commit 719dfcf8a8.
Reason for revert: Causing crashes.
Change-Id: I0b8526dd03d82fa074ce4f97f1789eeac702b3eb
Reviewed-on: https://go-review.googlesource.com/c/go/+/709755
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Instead of storing LR (the return address) at 0(SP) and the FP
(parent's frame pointer) at -8(SP), store them at framesize-8(SP)
and framesize-16(SP), respectively.
We push data onto and pop it off the stack such that we never access
anything below SP.
The prologue/epilogue lengths are unchanged (3 insns for a typical
prologue, 2 for a typical epilogue).
We use 8 bytes more per frame.
Typical prologue:
STP.W (FP, LR), -16(SP)
MOVD SP, FP
SUB $C, SP
Typical epilogue:
ADD $C, SP
LDP.P 16(SP), (FP, LR)
RET
The previous word where we stored LR, at 0(SP), is now unused.
We could repurpose that slot for storing a local variable.
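For reference, a sketch of the new frame layout right after the prologue
(illustrative; framesize = C+16, offsets relative to the post-prologue SP):

    framesize-8(SP)    saved LR (return address)
    framesize-16(SP)   saved FP (parent's frame pointer)
    ...                locals and outgoing arguments
    0(SP)              unused slot (held LR in the old layout)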
The new prologue and epilogue instructions are recognized by libunwind,
so pc-sampling tools like perf should now be accurate. (TODO: except
maybe after the first RET instruction? Have to look into that.)
Update #73753 (fixes the arm64 case)
Update #57302 (Quim thinks this will help on that issue)
Change-Id: I4800036a9a9a08aaaf35d9f99de79a36cf37ebb8
Reviewed-on: https://go-review.googlesource.com/c/go/+/674615
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Currently, isShrinkStackSafe returns false if a goroutine is put into
_Gwaiting while it actually goes and executes on the system stack.
For a long time, we needed to be robust to the goroutine's stack
shrinking while executing on the system stack.
Unfortunately, this has become harder and harder to do over time. First,
the execution tracer might be invoked in these contexts and it may wish
to take a stack trace. We cannot take the stack trace if the garbage
collector might concurrently shrink the stack of the user goroutine we
want to trace. So, isShrinkStackSafe grew the condition that we wouldn't
try to shrink the stack in these cases if execution tracing was enabled.
Today, runtime.mutex may wish to take a stack trace for the mutex
profile, and that can happen in a very similar context. Taking the
stack trace there is no longer safe.
This change takes the stance that we stop trying to make this work at
all, and instead guarantee that the stack won't move while we're in
these sensitive contexts.
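A minimal sketch of the shape of this check (abridged; the real
isShrinkStackSafe in runtime/stack.go has additional conditions,
including the tracing-related one described above):

    // isShrinkStackSafe reports whether it's safe to attempt to shrink
    // gp's stack by copying it. Abridged sketch.
    func isShrinkStackSafe(gp *g) bool {
        // Not safe during a syscall: the syscall may reference stack
        // memory the runtime doesn't have pointer maps for.
        if gp.syscallsp != 0 {
            return false
        }
        // Not safe at an async safe point: stack maps may be incomplete.
        if gp.asyncSafePoint {
            return false
        }
        // Not safe while the goroutine is parking on a channel: its
        // sudog points into the stack and is being manipulated.
        if gp.parkingOnChan.Load() {
            return false
        }
        return true
    }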
Change-Id: Ibfad2d7a335ee97cecaa48001df0db9812deeab1
Reviewed-on: https://go-review.googlesource.com/c/go/+/692716
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
If a goroutine is synchronously preempted, then a frame-pointer-based
stack trace taken at that preemption will skip the PC of the caller of
the function that called into morestack. This happens because
the frame pointer is pushed to the stack after the preamble, leaving the
stack in an odd state for frame pointer unwinding.
Deal with this by marking a goroutine as synchronously preempted and
using that signal to load the missing PC from the stack. On LR platforms
this is available in gp.sched.lr. On non-LR platforms like x86, it's at
gp.sched.sp, because there are no args, no locals, and no frame pointer
pushed to the SP yet.
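A sketch of the recovery step described above (illustrative; the flag
name syncSafePoint and the surrounding variables are assumptions, not
this CL's actual identifiers):

    // If gp was synchronously preempted, the caller's PC never made it
    // into the frame pointer chain, so recover it from scheduler state.
    if gp.syncSafePoint { // hypothetical "synchronously preempted" mark
        if usesLR {
            pc = gp.sched.lr
        } else {
            // On x86 there are no args, locals, or pushed frame pointer
            // yet, so the return PC sits exactly at sched.sp.
            pc = *(*uintptr)(unsafe.Pointer(gp.sched.sp))
        }
    }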
For #68090.
Change-Id: I73a1206d8b84eecb8a96dbe727195da30088f288
Reviewed-on: https://go-review.googlesource.com/c/go/+/684435
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Nick Ripley <nick.ripley@datadoghq.com>
Almost everywhere we stop the world we casGToWaitingForGC to prevent
mutual deadlock with the GC trying to scan our stack. This historically
was only necessary if we weren't stopping the world to change the GC
phase, because what we were worried about was mutual deadlock with mark
workers' use of suspendG. And, they were the only users of suspendG.
In Go 1.22 this changed. The execution tracer began using suspendG, too.
This leads to the possibility of mutual deadlock between the execution
tracer and a goroutine trying to start or end the GC mark phase. The fix
is simple: make the stop-the-world calls for the GC also call
casGToWaitingForGC. This way, suspendG is guaranteed to make progress in
this circumstance, and once it completes, the stop-the-world can
complete as well.
We can take this a step further, though, and move casGToWaitingForGC
into stopTheWorldWithSema, since there's no longer really a place we can
afford to skip this detail.
While we're here, rename casGToWaitingForGC to casGToWaitingForSuspendG,
since the GC is now not the only potential source of mutual deadlock.
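A sketch of the resulting pattern in the stop-the-world path (abridged
from the description above; not an exact copy of the runtime code):

    // Move ourselves out of _Grunning before stopping the world, so
    // that suspendG (used by mark workers and the tracer) can make
    // progress on us rather than deadlocking against the STW.
    casGToWaitingForSuspendG(gp, _Grunning, waitReasonStoppingTheWorld)
    stopTheWorldWithSema(reason)
    casgstatus(gp, _Gwaiting, _Grunning)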
Fixes #72740.
Change-Id: I5e3739a463ef3e8173ad33c531e696e46260692f
Reviewed-on: https://go-review.googlesource.com/c/go/+/681501
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Add build tag gated Valgrind annotations to the runtime which let it
understand how the runtime manages memory. This allows for Go binaries
to be run under Valgrind without emitting spurious errors.
Instead of adding the Valgrind headers to the tree, and using cgo to
call the various Valgrind client request macros, we just add an assembly
function which emits the necessary instructions to trigger client
requests.
In particular we add instrumentation of the memory allocator, using a
two-level mempool structure (as described in the Valgrind manual [0]).
We also add annotations which allow Valgrind to track which memory we
use for stacks, which seems necessary to let it properly function.
We describe the memory model to Valgrind as follows: we treat heap
arenas as a "pool" created with VALGRIND_CREATE_MEMPOOL_EXT (so that we
can use VALGRIND_MEMPOOL_METAPOOL and VALGRIND_MEMPOOL_AUTO_FREE).
Within the pool we treat spans as "superblocks", annotated with
VALGRIND_MEMPOOL_ALLOC. We then allocate individual objects within spans
with VALGRIND_MALLOCLIKE_BLOCK.
It should be noted that running binaries under Valgrind can be _quite
slow_, and certain operations, such as running the GC, can be _very
slow_. It is recommended to run programs with GOGC=off. Additionally,
async preemption should be turned off, since it'll cause strange
behavior (GODEBUG=asyncpreemptoff=1).
Running Valgrind with --leak-check=yes will report some errors because
some things are not marked as fully freed. These likely need more
annotations to rectify, but for now it is recommended to run with
--leak-check=off.
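A typical invocation following these recommendations might look like the
following (the name of the build tag gating the annotations is an
assumption here):

    $ go build -tags valgrind -o prog .
    $ GOGC=off GODEBUG=asyncpreemptoff=1 valgrind --leak-check=off ./prog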
Updates #73602
[0] https://valgrind.org/docs/manual/mc-manual.html#mc-manual.mempools
Change-Id: I71b26c47d7084de71ef1e03947ef6b1cc6d38301
Reviewed-on: https://go-review.googlesource.com/c/go/+/674077
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
We currently make some parts of the preamble unpreemptible because
preemption there confuses morestack. See comments in the code.
Instead, have morestack handle those weird cases so we can
remove unpreemptible marks from most places.
This CL makes user functions preemptible everywhere if they have no
write barriers (at least, on x86). In cmd/go the fraction of functions
that need unpreemptible markings drops from 82% to 36%. This makes the
cmd/go binary 0.3% smaller.
Update #35470
Change-Id: Ic83d5eabfd0f6d239a92e65684bcce7e67ff30bb
Reviewed-on: https://go-review.googlesource.com/c/go/+/648518
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
We will want to reference these definitions from new generator programs,
and this is a good opportunity to clean up all these old C-style names.
Change-Id: Ifb06f0afc381e2697e7877f038eca786610c96de
Reviewed-on: https://go-review.googlesource.com/c/go/+/655275
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This is a two-pronged approach. First, try to keep large objects
off the stack frame. Second, if they do manage to appear anyway,
use straight bitmasks instead of gc programs.
It is generally a good idea to keep large objects out of stack frames,
and keeping gc programs off the stack in particular simplifies the
runtime code a bit.
This CL sets the limit of most stack objects to 131072 bytes (on 64-bit archs).
There can still be large stack objects if they are allocated by a late
pass, like order, or if they are required to be on the stack, like
function arguments. But the size of the bitmasks for these objects
isn't a huge deal, as we already have (probably several) bitmasks for
the frame liveness map itself.
Change-Id: I6d2bed0e9aa9ac7499955562c6154f9264061359
Reviewed-on: https://go-review.googlesource.com/c/go/+/542815
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
The failures in #70288 are consistent with and strongly imply
stack corruption during fault handling, and debug prints show
that the Go code run during fault handling is running about
300 bytes above the bottom of the goroutine stack.
That should be okay, but that implies the DLL code that called
Go's handler was running near the bottom of the stack too,
and maybe it called other deeper things before or after the
Go handler and smashed the stack that way.
stackSystem is already 4096 bytes on amd64;
making it match that on 386 makes the flaky failures go away.
It's a little unsatisfying not to be able to say exactly what is
overflowing the stack, but the circumstantial evidence is
very strong that it's Windows.
Fixes #70288.
Change-Id: Ife89385873d5e5062a71629dbfee40825edefa49
Reviewed-on: https://go-review.googlesource.com/c/go/+/627375
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This change allows the tracer to be reentrant by restructuring the
internals such that writing an event is atomic with respect to stack
growth. Essentially, a core set of functions that are involved in
acquiring a trace buffer and writing to it are all marked nosplit.
Stack growth is currently the only hidden place where the tracer may be
accidentally reentrant, preventing the tracer from being used
everywhere. It already lacks write barriers, lacks allocations, and is
non-preemptible. This change thus makes the tracer fully reentrant,
since the only reentry case it needs to handle is stack growth.
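For reference, the mechanism involved is the nosplit pragma, which
removes the stack-growth check from a function's prologue (a generic
standalone illustration, not the tracer's actual code):

    package main

    //go:nosplit
    func setByte(buf []byte, i int, b byte) {
        // No stack-growth check is emitted in this function's prologue,
        // so calling it can never trigger morestack. In exchange, it
        // must fit within the fixed nosplit stack budget.
        buf[i] = b
    }

    func main() {
        buf := make([]byte, 8)
        setByte(buf, 0, 0x42)
    }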
Since the invariants needed to attain this are subtle, this change also
extends the debugTraceReentrancy debug mode to check these invariants as
well. Specifically, the invariants are checked by setting the throwsplit
flag.
A side benefit of this change is it simplifies the trace event writing
API a good bit: there's no need to actually thread the event writer
through things, and most callsites look a bit simpler.
Change-Id: I7c329fb7a6cb936bd363c44cf882ea0a925132f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/587599
Reviewed-by: Austin Clements <austin@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Cleanup and friction reduction
For #65355.
Change-Id: Ia14c9dc584a529a35b97801dd3e95b9acc99a511
Reviewed-on: https://go-review.googlesource.com/c/go/+/600436
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Some of the new experimental events added have a problem in that they
might be emitted during stack growth. This is, to my knowledge, the only
restriction on the tracer, because the tracer otherwise prevents
preemption, avoids allocation, and avoids write barriers. However, the
stack can grow from within the tracer. This leads to
tracing-during-tracing which can result in lost buffers and broken event
streams. (There's a debug mode to get a nice error message, but it's
disabled by default.)
This change resolves the problem by skipping writing out these new
events. This results in the new events sometimes being broken (alloc
without a free, free without an alloc) but for now that's OK. Before the
freeze begins we just want to fix broken tests; tools interpreting these
events will be totally in-house to begin with, and if they have to be a
little bit smarter about missing information, that's OK. In the future
we'll have a more robust fix for this, but it appears that it's going to
require making the tracer fully reentrant. (This is not too hard; either
we force flushing all buffers when going reentrant (which is actually
somewhat subtle with respect to event ordering) or we isolate down just
the actual event writing to be atomic with respect to stack growth. Both
are just bigger changes on shared codepaths that are scary to land this
late in the release cycle.)
Fixes #67379.
Change-Id: I46bb7e470e61c64ff54ac5aec5554b828c1ca4be
Reviewed-on: https://go-review.googlesource.com/c/go/+/587597
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This change adds expensive alloc/free events to traces, guarded by a
GODEBUG that can be set at run time by mutating the GODEBUG environment
variable. This supersedes the alloc/free trace deleted in a previous CL.
There are two parts to this CL.
The first part is adding a mechanism for exposing experimental events
through the tracer and trace parser. This boils down to a new
ExperimentalEvent event type in the parser API which simply reveals the
raw event data for the event. Each experimental event can also be
associated with "experimental data" which is associated with a
particular generation. This experimental data is just exposed as a bag
of bytes that supplements the experimental events.
In the runtime, this CL organizes experimental events by experiment.
An experiment is defined by a set of experimental events and a single
special batch type. Batches of this special type are exposed through the
parser's API as the aforementioned "experimental data".
The second part of this CL is defining the AllocFree experiment, which
defines 9 new experimental events covering heap object alloc/frees, span
alloc/frees, and goroutine stack alloc/frees. It also generates special
batches that contain a type table: a mapping of IDs to type information.
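For example, capturing a trace with these events enabled might look like
this (the GODEBUG name traceallocfree is what the experiment shipped as;
treat the exact spelling as an assumption in the context of this CL):

    $ GODEBUG=traceallocfree=1 go test -trace trace.out ./somepkg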
Change-Id: I965c00e3dcfdf5570f365ff89d0f70d8aeca219c
Reviewed-on: https://go-review.googlesource.com/c/go/+/583377
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Currently, the execution tracer may attempt to take a stack trace of a
goroutine whose stack it does not own, for example if the goroutine is
in _Grunnable or _Gwaiting. This is easily fixed in all cases by simply
moving the emission of GoStop and GoBlock events to before the
casgstatus happens. The goroutine status is what is used to signal stack
ownership, and the GC may shrink a goroutine's stack if it can acquire
the scan bit.
Although this is easily fixed, the interaction here is very subtle,
because stack ownership is only implicit in the goroutine's scan status.
To make this invariant more maintainable and less error-prone in the
future, this change adds a GODEBUG setting that checks, at the point of
taking a stack trace, whether the caller owns the goroutine. This check
is not quite perfect because there's no way for the stack tracing code
to know that the _Gscan bit was acquired by the caller, so for
simplicity it assumes that it was the caller that acquired the scan bit.
In all other cases however, we can check for ownership precisely. At the
very least, this check is sufficient to catch the issue this change is
fixing.
To make sure this debug check doesn't bitrot, it's always enabled during
trace testing. This new mode has actually caught a few other issues
already, so this change fixes them.
One issue that this debug mode caught was that it's not safe to take a
stack trace of a _Gwaiting goroutine that's being unparked.
Another much bigger issue this debug mode caught was the fact that the
execution tracer could try to take a stack trace of a G that was in
_Gwaiting solely to avoid a deadlock in the GC. The execution tracer
already has a partial list of these cases since they're modeled as the
goroutine just executing as normal in the tracer, but this change takes
the list and makes it more formal. In this specific case, we now prevent
the GC from shrinking the stacks of goroutines in this state if tracing
is enabled. The stack traces from these scenarios are too useful to
discard, but there is indeed a race here between the tracer and any
attempt to shrink the stack by the GC.
Change-Id: I019850dabc8cede202fd6dcc0a4b1f16764209fb
Cq-Include-Trybots: luci.golang.try:gotip-linux-amd64-longtest,gotip-linux-amd64-longtest-race
Reviewed-on: https://go-review.googlesource.com/c/go/+/573155
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Now that pcvalue keeps its cache on the M, we can drop all of the
stack-allocated pcvalueCaches and stop carefully passing them around
between lots of operations. This significantly simplifies a fair
amount of code and makes several structures smaller.
This series of changes has no statistically significant effect on any
runtime Stack benchmarks.
I also experimented with making the cache larger, now that the impact
is limited to the M struct, but wasn't able to measure any
improvements.
This is a re-roll of CL 515277
Change-Id: Ia27529302f81c1c92fb9c3a7474739eca80bfca1
Reviewed-on: https://go-review.googlesource.com/c/go/+/520064
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Now that pcvalue keeps its cache on the M, we can drop all of the
stack-allocated pcvalueCaches and stop carefully passing them around
between lots of operations. This significantly simplifies a fair
amount of code and makes several structures smaller.
This series of changes has no statistically significant effect on any
runtime Stack benchmarks.
I also experimented with making the cache larger, now that the impact
is limited to the M struct, but wasn't able to measure any
improvements.
Change-Id: I4719ebf347c7150a05e887e75a238e23647c20cd
Reviewed-on: https://go-review.googlesource.com/c/go/+/515277
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
This CL refactors gopanic, Goexit, and deferreturn to share a common
state machine for processing pending defers. The new state machine
removes a lot of redundant code and does overall less work.
It should also make it easier to implement further optimizations
(e.g., TODOs added in this CL).
Change-Id: I71d3cc8878a6f951d8633505424a191536c8e6b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/513837
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Remove logic for skipping some adjustframe logic for systemstack (aka
FuncID_systemstack_switch). This was introduced in 2014 by
9198ed4bd6 but doesn't seem to be needed
anymore.
Updates #59692
Change-Id: I2368d64f9bb28ced4e7f15c9b15dac7a29194389
Reviewed-on: https://go-review.googlesource.com/c/go/+/489116
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Add TestSystemstackFramePointerAdjust as a regression test for CL
489015.
By turning stackPoisonCopy into a var instead of const and introducing
the ShrinkStackAndVerifyFramePointers() helper function, we are able to
trigger the exact combination of events that can crash traceEvent() if
systemstack restores a frame pointer that is pointing into the old
stack.
Updates #59692
Change-Id: I60fc6940638077e3b60a81d923b5f5b4f6d8a44c
Reviewed-on: https://go-review.googlesource.com/c/go/+/489115
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Change adjustframe to adjust the frame pointer of systemstack (aka
FuncID_systemstack_switch) before returning early.
Without this fix it is possible for traceEvent() to crash when using
frame pointer unwinding. The issue occurs when a goroutine calls
systemstack in order to call shrinkstack. While returning, systemstack
will restore the unadjusted frame pointer from its frame as part of its
epilogue. If the callee of systemstack then triggers a traceEvent, it
will try to unwind into the old stack. This can lead to a crash if the
memory of the old stack has been reused or freed in the meantime.
The most common situation in which this will manifest is when
gcAssistAlloc() invokes gcAssistAlloc1() on systemstack() and performs a
shrinkstack() followed by a traceGCMarkAssistDone() or Gosched()
triggering traceEvent().
See CL 489115 for a deterministic test case that triggers the issue.
Meanwhile the problem can frequently be observed using the command
below:
$ GODEBUG=tracefpunwindoff=0 ../bin/go test -trace /dev/null -run TestDeferHeapAndStack ./runtime
SIGSEGV: segmentation violation
PC=0x45f977 m=14 sigcode=128
goroutine 0 [idle]:
runtime.fpTracebackPCs(...)
.../go/src/runtime/trace.go:945
runtime.traceStackID(0xcdab904677a?, {0x7f1584346018, 0x0?, 0x80}, 0x0?)
.../go/src/runtime/trace.go:917 +0x217 fp=0x7f1565ffab00 sp=0x7f1565ffaab8 pc=0x45f977
runtime.traceEventLocked(0x0?, 0x0?, 0x0?, 0xc00003dbd0, 0x12, 0x0, 0x1, {0x0, 0x0, 0x0})
.../go/src/runtime/trace.go:760 +0x285 fp=0x7f1565ffab78 sp=0x7f1565ffab00 pc=0x45ef45
runtime.traceEvent(0xf5?, 0x1, {0x0, 0x0, 0x0})
.../go/src/runtime/trace.go:692 +0xa9 fp=0x7f1565ffabe0 sp=0x7f1565ffab78 pc=0x45ec49
runtime.traceGoPreempt(...)
.../go/src/runtime/trace.go:1535
runtime.gopreempt_m(0xc000328340?)
.../go/src/runtime/proc.go:3551 +0x45 fp=0x7f1565ffac20 sp=0x7f1565ffabe0 pc=0x4449a5
runtime.newstack()
.../go/src/runtime/stack.go:1077 +0x3cb fp=0x7f1565ffadd0 sp=0x7f1565ffac20 pc=0x455feb
runtime.morestack()
.../go/src/runtime/asm_amd64.s:593 +0x8f fp=0x7f1565ffadd8 sp=0x7f1565ffadd0 pc=0x47644f
goroutine 19 [running]:
runtime.traceEvent(0x2c?, 0xffffffffffffffff, {0x0, 0x0, 0x0})
.../go/src/runtime/trace.go:669 +0xe8 fp=0xc0006e6c28 sp=0xc0006e6c20 pc=0x45ec88
runtime.traceGCMarkAssistDone(...)
.../go/src/runtime/trace.go:1497
runtime.gcAssistAlloc(0xc0003281a0)
.../go/src/runtime/mgcmark.go:517 +0x27d fp=0xc0006e6c88 sp=0xc0006e6c28 pc=0x421a1d
runtime.deductAssistCredit(0x0?)
.../go/src/runtime/malloc.go:1287 +0x54 fp=0xc0006e6cb0 sp=0xc0006e6c88 pc=0x40fed4
runtime.mallocgc(0x400, 0x7a9420, 0x1)
.../go/src/runtime/malloc.go:1002 +0xc9 fp=0xc0006e6d18 sp=0xc0006e6cb0 pc=0x40f709
runtime.newobject(0xb3?)
.../go/src/runtime/malloc.go:1324 +0x25 fp=0xc0006e6d40 sp=0xc0006e6d18 pc=0x40ffc5
runtime_test.deferHeapAndStack(0xb4)
.../go/src/runtime/stack_test.go:924 +0x165 fp=0xc0006e6e20 sp=0xc0006e6d40 pc=0x75c2a5
Fixes #59692
Co-Authored-By: Cherry Mui <cherryyz@google.com>
Co-Authored-By: Michael Knyszek <mknyszek@google.com>
Co-Authored-By: Nick Ripley <nick.ripley@datadoghq.com>
Change-Id: I1c0c28327fc2fac0b8cfdbaa72e25584331be31e
Reviewed-on: https://go-review.googlesource.com/c/go/+/489015
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
The current definitions of StackLimit and StackGuard only indirectly
specify the NOSPLIT stack limit and duplicate a literal constant
(928). Currently, they define the stack guard delta, and from there
compute the NOSPLIT limit.
Rationalize these by defining a new constant, abi.StackNosplitBase,
which consolidates and directly specifies the NOSPLIT stack limit (in
the default case). From this we then compute the stack guard delta,
inverting the relationship between these two constants. While we're
here, we rename StackLimit to StackNosplit to make it clearer what's
being limited.
This change does not affect the values of these constants in the
default configuration. It does slightly change how
StackGuardMultiplier values other than 1 affect the constants, but
this multiplier is a pretty rough heuristic anyway.
                    before  after
stackNosplit           800    800
_StackGuard            928    928
stackNosplit -race    1728   1600
_StackGuard -race     1856   1728
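Reading the formulas off the table: before, _StackGuard = 928*multiplier
and stackNosplit = _StackGuard - 128; after, stackNosplit =
800*multiplier and _StackGuard = stackNosplit + 128. With the race
detector's multiplier of 2, that gives stackNosplit 1728 and _StackGuard
1856 before, versus 1600 and 1728 after, matching the table. (These
formulas are inferred from the values above.)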
For #59670.
Change-Id: Ia94094c5e47897e7c088d24b4a5e33f5c2768db5
Reviewed-on: https://go-review.googlesource.com/c/go/+/486976
Auto-Submit: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This reverts commit CL 486379.
Submitted out of order and breaks bootstrap.
Change-Id: Ie20a61cc56efc79a365841293ca4e7352b02d86b
Reviewed-on: https://go-review.googlesource.com/c/go/+/486917
TryBot-Bypass: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
This reverts commit CL 486380.
Submitted out of order and breaks bootstrap.
Change-Id: I67bd225094b5c9713b97f70feba04d2c99b7da76
Reviewed-on: https://go-review.googlesource.com/c/go/+/486916
Reviewed-by: David Chase <drchase@google.com>
TryBot-Bypass: Austin Clements <austin@google.com>
This reverts commit CL 486381.
Submitted out of order and breaks bootstrap.
Change-Id: Ia472111cb966e884a48f8ee3893b3bf4b4f4f875
Reviewed-on: https://go-review.googlesource.com/c/go/+/486915
Reviewed-by: David Chase <drchase@google.com>
TryBot-Bypass: Austin Clements <austin@google.com>
The current definitions of StackLimit and StackGuard only indirectly
specify the NOSPLIT stack limit and duplicate a literal constant
(928). Currently, they define the stack guard delta, and from there
compute the NOSPLIT limit.
Rationalize these by defining a new constant, abi.StackNosplitBase,
which consolidates and directly specifies the NOSPLIT stack limit (in
the default case). From this we then compute the stack guard delta,
inverting the relationship between these two constants. While we're
here, we rename StackLimit to StackNosplit to make it clearer what's
being limited.
This change does not affect the values of these constants in the
default configuration. It does slightly change how
StackGuardMultiplier values other than 1 affect the constants, but
this multiplier is a pretty rough heuristic anyway.
                    before  after
stackNosplit           800    800
_StackGuard            928    928
stackNosplit -race    1728   1600
_StackGuard -race     1856   1728
For #59670.
Change-Id: Ibe20825ebe0076bbd7b0b7501177b16c9dbcb79e
Reviewed-on: https://go-review.googlesource.com/c/go/+/486380
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
The frame pointer is enabled on ARM64. When copying stacks, the
saved frame pointers need to be adjusted.
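A sketch of the adjustment (conceptual; the real logic is part of the
runtime's stack-copying adjustframe machinery):

    // If a saved frame pointer points into the old stack, rebase it
    // into the new stack by the relocation delta.
    if oldLo <= fp && fp < oldHi {
        fp += newLo - oldLo
    }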
Updates #39524, #40044.
Fixes #58432.
Change-Id: I73651fdfd1a6cccae26a5ce02e7e86f6c2fb9bf7
Reviewed-on: https://go-review.googlesource.com/c/go/+/241158
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
More stuff to do = more stack needed. Bump up the guard space when
building with the race detector.
Fixes #54291
Change-Id: I701bc8800507921bed568047d35b8f49c26e7df7
Reviewed-on: https://go-review.googlesource.com/c/go/+/451217
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Replace all uses of Ctz64/32/8 with TrailingZeros64/32/8, since they
are the same operation and keeping both duplicates code. Also rename
the CtzXX functions in 386 assembly code.
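For reference, a standalone illustration of the operation using the
equivalent math/bits functions (the runtime uses its own internal
copies rather than math/bits):

    package main

    import (
        "fmt"
        "math/bits"
    )

    func main() {
        // TrailingZeros64 is exactly a count-trailing-zeros (Ctz):
        // the number of zero bits below the lowest set bit.
        fmt.Println(bits.TrailingZeros64(0b1000)) // 3
        fmt.Println(bits.TrailingZeros64(1))      // 0
        fmt.Println(bits.TrailingZeros64(0))      // 64
    }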
Change-Id: I19290204858083750f4be589bb0923393950ae6d
Reviewed-on: https://go-review.googlesource.com/c/go/+/438935
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
The stkframe struct and its methods are strewn across different source
files. Since they actually have a pretty coherent theme at this point,
migrate it all into a new file, stkframe.go. There are no code changes
in this CL.
For #54466, albeit rather indirectly.
Change-Id: Ibe53fc4b1106d131005e1c9d491be838a8f14211
Reviewed-on: https://go-review.googlesource.com/c/go/+/424516
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Auto-Submit: Austin Clements <austin@google.com>
This places getStackMap alongside argBytes and argMapInternal as
another method of stkframe.
For #54466, albeit rather indirectly.
Change-Id: I411dda3605dd7f996983706afcbefddf29a68a85
Reviewed-on: https://go-review.googlesource.com/c/go/+/424515
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Auto-Submit: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Currently, stkframe.arglen and stkframe.argmap are populated by
gentraceback under a particular set of circumstances. But because they
can be constructed from other fields in stkframe, they don't need to
be computed eagerly at all. They're also rather misleading, as they're
only part of computing the actual argument map and most callers should
be using getStackMap, which does the rest of the work.
This CL drops these fields from stkframe. It shifts the functions that
used to compute them, getArgInfoFast and getArgInfo, into
corresponding methods stkframe.argBytes and stkframe.argMapInternal.
argBytes is expected to be used by callers that need to know only the
argument frame size, while argMapInternal is used only by argBytes and
getStackMap.
We also move some of the logic from getStackMap into argMapInternal
because the previous split of responsibilities didn't make much sense.
This lets us return just a bitvector from argMapInternal, rather than
both a bitvector, which carries a size, and an "actually use this
size".
The getArgInfoFast function was inlined before (and inl_test checked
this). We drop that requirement from stkframe.argBytes because the
uses of this have shifted and now it's only called from heap dumping
(which never happens) and conservative stack frame scanning (which
very, very rarely happens).
There will be a few follow-up clean-up CLs.
For #54466. This is a nice clean-up on its own, but it also serves to
remove pointers from the traceback state that would eventually become
troublesome write barriers once we stack-rip gentraceback.
Change-Id: I107f98ed8e7b00185c081de425bbf24af02a4163
Reviewed-on: https://go-review.googlesource.com/c/go/+/424514
Run-TryBot: Austin Clements <austin@google.com>
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
In order to prevent false sharing of cache lines, structs are
padded with some number of bytes. These bytes are unused, serving
only to make the size of the struct a multiple of the size of the
cache line.
The current calculation of how much to pad is an overestimation
when the struct size is already a multiple of the cache line size
without padding. For these cases, no padding is necessary, and
the size of the inner pad field should be 0. The bug is that the
pad field is sized to a whole 'nother cache line, wasting space.
Here is the current formula that can never return 0:
cpu.CacheLinePadSize - unsafe.Sizeof(myStruct{})%cpu.CacheLinePadSize
This change simply mods that calculation by cpu.CacheLinePadSize,
so that 0 will be returned instead of cpu.CacheLinePadSize.
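A minimal standalone illustration of the fix (using a stand-in 64-byte
pad size and a 64-byte struct, since internal/cpu isn't importable from
user code):

    package main

    import (
        "fmt"
        "unsafe"
    )

    const padSize = 64 // stand-in for cpu.CacheLinePadSize

    type myStruct struct{ a, b, c, d, e, f, g, h uint64 } // 64 bytes

    func main() {
        old := padSize - unsafe.Sizeof(myStruct{})%padSize
        fixed := (padSize - unsafe.Sizeof(myStruct{})%padSize) % padSize
        fmt.Println(old, fixed) // 64 0: the old formula pads a whole extra line
    }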
Change-Id: I26a2b287171bf47a3b9121873b2722f728381b5e
Reviewed-on: https://go-review.googlesource.com/c/go/+/414214
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Measure the average stack size used by goroutines at every GC. When
starting a new goroutine, allocate an initial goroutine stack of that
average size. The intuition is that we'll waste at most 2x in stack space
because only half the goroutines can be below average. In turn, we
avoid some of the early stack growth / copying needed in the average
case.
More details in the design doc at: https://docs.google.com/document/d/1YDlGIdVTPnmUiTAavlZxBI1d9pwGQgZT7IKFKlIXohQ/edit?usp=sharing
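A rough standalone sketch of the sizing policy (illustrative; the names
and the rounding details are assumptions, not the runtime's exact code):

    package main

    import "fmt"

    // initialStackSize returns a starting stack size near the observed
    // average, rounded up to a power of two so it maps to a size class.
    func initialStackSize(totalStackUsed, numGoroutines uint64) uint64 {
        const minStack = 2048 // the previous fixed initial size
        if numGoroutines == 0 {
            return minStack
        }
        avg := totalStackUsed / numGoroutines
        size := uint64(minStack)
        for size < avg {
            size *= 2
        }
        return size
    }

    func main() {
        fmt.Println(initialStackSize(0, 0))       // 2048
        fmt.Println(initialStackSize(1<<20, 100)) // 16384
    }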
name        old time/op  new time/op  delta
Issue18138  95.3µs ± 0%  67.3µs ±13%  -29.35%  (p=0.000 n=9+10)
Fixes #18138
Change-Id: Iba34d22ed04279da7e718bbd569bbf2734922eaa
Reviewed-on: https://go-review.googlesource.com/c/go/+/345889
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
CL 362934 added open-coding for unsafe.Slice, so using it no longer
negatively impacts performance.
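For reference, a minimal example of the unsafe.Slice form being adopted
(illustrative; not the actual call site changed in this CL):

    package main

    import (
        "fmt"
        "unsafe"
    )

    func main() {
        arr := [4]int{1, 2, 3, 4}
        // Build a slice from a pointer and a length; since CL 362934
        // the compiler open-codes this, making it as cheap as the old
        // handwritten conversion.
        s := unsafe.Slice(&arr[0], len(arr))
        fmt.Println(s) // [1 2 3 4]
    }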
Updates #48798
Change-Id: Ifbabe8bc1cc4349c5bcd11586a11fc99bcb388b1
Reviewed-on: https://go-review.googlesource.com/c/go/+/404974
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
[This CL is part of a sequence implementing the proposal #51082.
The design doc is at https://go.dev/s/godocfmt-design.]
Run the updated gofmt, which reformats doc comments,
on the main repository. Vendored files are excluded.
For #51082.
Change-Id: I7332f099b60f716295fb34719c98c04eb1a85407
Reviewed-on: https://go-review.googlesource.com/c/go/+/384268
Reviewed-by: Jonathan Amsterdam <jba@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>