Errors like "morestack on g0" are among the hardest to debug,
because they often come without a useful stack trace. The runtime
doesn't directly print a stack trace because the stack is in too bad
a state to call print. Sometimes the SIGABRT may trigger a
traceback, but sometimes not, especially in a cgo binary. Even if it
triggers a traceback, the traceback often does not include the bad
stack.
This CL makes it explicitly print a stack trace and throw. The
idea is to set aside some space as an "emergency" crash stack. When the
stack is in a really bad state, we switch to the crash stack and
do a traceback.
Currently only implemented on AMD64 and ARM64.
TODO: also handle errors like "morestack on gsignal" and bad
systemstack. Also handle other architectures.
Change-Id: Ibfc397202f2bb0737c5cbe99f2763de83301c1c1
Reviewed-on: https://go-review.googlesource.com/c/go/+/419435
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Change-Id: I3f0b7209621b39cee69566a5cc95e4343b4f1f20
GitHub-Last-Rev: af9dbbe69a
GitHub-Pull-Request: golang/go#63321
Reviewed-on: https://go-review.googlesource.com/c/go/+/531916
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
We were using the size stored in the map, which is the smaller
of the real type size and 128.
As of CL 61538 we don't use these functions, but we expect to
use them again in the future after #61626 is resolved.
Change-Id: I7bfb4af5f0e3a56361d4019a8ed7c1ec59ff31fd
Reviewed-on: https://go-review.googlesource.com/c/go/+/535215
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Keith Randall <khr@google.com>
The correct load factor is 6.5, not 6.
This got broken by accident in CL 462115.
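For concreteness, a minimal sketch of how a 6.5 average maps onto the
integer-only growth check (the constant names mirror the runtime's
loadFactorNum/loadFactorDen, but the snippet is illustrative, not the
runtime's code):

    package main

    import "fmt"

    const (
        loadFactorNum = 13 // 6.5 expressed as 13/2 to stay in integer math
        loadFactorDen = 2
        bucketCnt     = 8 // entries per bucket
    )

    // tooFull reports whether count entries spread over 2^B buckets exceed
    // an average of 6.5 entries per bucket.
    func tooFull(count int, B uint8) bool {
        return count > bucketCnt && uintptr(count) > loadFactorNum*((uintptr(1)<<B)/loadFactorDen)
    }

    func main() {
        // With B=3 (8 buckets) the threshold is 6.5*8 = 52 entries.
        fmt.Println(tooFull(52, 3), tooFull(53, 3)) // false true
    }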
Fixes #63438
Change-Id: Ib07bb6ab6103aec87cb775bc06bd04362a64e489
Reviewed-on: https://go-review.googlesource.com/c/go/+/533279
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
mspan.freeindex and nelems can fit into uint16 for all possible
values. Use uint16 instead of uintptr.
Change-Id: Ifce20751e81d5022be1f6b5cbb5fbe4fd1728b1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/451359
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
After the previous CL, this is now all dead code. This change is
separated out to make the previous one easy to backport.
For #63334.
Related to #61718 and #59960.
Change-Id: I109673ed97c62c472bbe2717dfeeb5aa4fc883ea
Reviewed-on: https://go-review.googlesource.com/c/go/+/532117
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
This is a follow-up to CL 528696.
Change-Id: I5b71eabedb12567c4b1b36f7182a3d2b0ed662a5
GitHub-Last-Rev: acaf3ac11c
GitHub-Pull-Request: golang/go#62713
Reviewed-on: https://go-review.googlesource.com/c/go/+/529197
Reviewed-by: Ian Lance Taylor <iant@google.com>
Auto-Submit: Ian Lance Taylor <iant@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
The stack bounds from pthread are not always accurate, and could
cause a segfault if we run out of the actual stack space before
reaching the bounds. Here we use artificially small stack bounds
to check for overflow without actually running out of the system stack.
Change-Id: I8067c5e1297307103b315d9d0c60120293b57aab
Reviewed-on: https://go-review.googlesource.com/c/go/+/523695
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Currently the runtime marks all new memory as MADV_HUGEPAGE on Linux and
manages its hugepage eligibility status. Unfortunately, the default
THP behavior on most Linux distros is that MADV_HUGEPAGE blocks while
the kernel eagerly reclaims and compacts memory to allocate a hugepage.
This direct reclaim and compaction is unbounded, and may result in
significant application thread stalls. In really bad cases, this can
exceed 100s of ms or even seconds.
Really all we want is to undo MADV_NOHUGEPAGE marks and let the default
Linux paging behavior take over, but the only way to unmark a region as
MADV_NOHUGEPAGE is to also mark it MADV_HUGEPAGE.
The overall strategy of trying to keep hugepages for the heap unbroken
however is sound. So instead let's use the new shiny MADV_COLLAPSE if it
exists.
MADV_COLLAPSE makes a best-effort synchronous attempt at collapsing the
physical memory backing a memory region into a hugepage. We'll use
MADV_COLLAPSE where we would've used MADV_HUGEPAGE, and stop using
MADV_NOHUGEPAGE altogether.
Because MADV_COLLAPSE is synchronous, it's also important to not
re-collapse huge pages if the huge pages are likely part of some large
allocation. Although in many cases it's advantageous to back these
allocations with hugepages because they're contiguous, eagerly
collapsing every hugepage means having to page in at least part of the
large allocation.
However, because we won't use MADV_NOHUGEPAGE anymore, we'll no longer
handle the fact that khugepaged might come in and back some memory we
returned to the OS with a hugepage. I've come to the conclusion that
this is basically unavoidable without a new madvise flag and that it's
just not a good default. If this change lands, advice about Linux huge
page settings will be added to the GC guide.
Verified that this change doesn't regress Sweet, at least not on my
machine with:
/sys/kernel/mm/transparent_hugepage/enabled [always or madvise]
/sys/kernel/mm/transparent_hugepage/defrag [madvise]
/sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none [0 or 511]
Unfortunately, this workaround means that we only get forced hugepages
on Linux 6.1+.
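For illustration, a user-space sketch of the call the runtime now makes
(the runtime has its own syscall wrappers; this sketch uses
golang.org/x/sys/unix and spells out MADV_COLLAPSE's value, 25, by hand
in case the unix package in use does not define it):

    package main

    import (
        "fmt"

        "golang.org/x/sys/unix"
    )

    const madvCollapse = 25 // MADV_COLLAPSE, available on Linux 6.1+

    func main() {
        // Map an anonymous 4 MiB region, then ask the kernel to make a
        // synchronous, best-effort attempt at backing it with huge pages.
        mem, err := unix.Mmap(-1, 0, 4<<20,
            unix.PROT_READ|unix.PROT_WRITE, unix.MAP_ANON|unix.MAP_PRIVATE)
        if err != nil {
            panic(err)
        }
        defer unix.Munmap(mem)

        if err := unix.Madvise(mem, madvCollapse); err != nil {
            fmt.Println("MADV_COLLAPSE unavailable:", err) // e.g. pre-6.1 kernel
        }
    }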
Fixes #61718.
Change-Id: I7f4a7ba397847de29f800a99f9cb66cb2720a533
Reviewed-on: https://go-review.googlesource.com/c/go/+/516795
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Now that pcvalue keeps its cache on the M, we can drop all of the
stack-allocated pcvalueCaches and stop carefully passing them around
between lots of operations. This significantly simplifies a fair
amount of code and makes several structures smaller.
This series of changes has no statistically significant effect on any
runtime Stack benchmarks.
I also experimented with making the cache larger, now that the impact
is limited to the M struct, but wasn't able to measure any
improvements.
This is a re-roll of CL 515277.
Change-Id: Ia27529302f81c1c92fb9c3a7474739eca80bfca1
Reviewed-on: https://go-review.googlesource.com/c/go/+/520064
Auto-Submit: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
The pprof mutex profile was meant to match the Google C++ (now Abseil)
mutex profiler, originally designed and implemented by Mike Burrows.
When we worked on the Go version, pjw and I missed that C++ counts the
time each thread is blocked, even if multiple threads are blocked on a
mutex. That is, if 100 threads are blocked on the same mutex for the
same 10ms, that still counts as 1000ms of contention in C++. In Go, to
date, /debug/pprof/mutex has counted that as only 10ms of contention.
If 100 goroutines are blocked on one mutex and only 1 goroutine is
blocked on another mutex, we probably do want to see the first mutex
as being more contended, so the Abseil approach is the more useful one.
This CL adopts "contention scales with number of goroutines blocked",
to better match Abseil [1]. However, it still makes sure to attribute the
time to the unlock that caused the backup, not subsequent innocent
unlocks that were affected by the congestion. In this way it still gives
more accurate profiles than Abseil does.
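As a usage note (not part of this change), the profile in question is the
one enabled like so:

    package main

    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/mutex among others
        "runtime"
    )

    func main() {
        runtime.SetMutexProfileFraction(1) // record every contention event
        log.Fatal(http.ListenAndServe("localhost:6060", nil))
    }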
[1] https://github.com/abseil/abseil-cpp/blob/lts_2023_01_25/absl/synchronization/mutex.cc#L2390
Fixes #61015.
Change-Id: I7eb9e706867ffa8c0abb5b26a1b448f6eba49331
Reviewed-on: https://go-review.googlesource.com/c/go/+/506415
Run-TryBot: Russ Cox <rsc@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
When recovering from a panic, restore the caller's frame pointer before
returning control to the caller. Otherwise, if the function proceeds to
run more deferred calls before returning, the deferred functions will
get invalid frame pointers pointing to an address lower in the stack.
This can cause frame pointer unwinding to crash, such as if an execution
trace event is recorded during the deferred call on architectures which
support frame pointer unwinding.
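A minimal program exercising the pattern in question (a deferred recover
followed by another deferred call in the same frame); before this fix, the
later deferred call could run with a stale frame pointer and crash
frame-pointer unwinding if, say, a trace event fired:

    package main

    import "fmt"

    func work() {
        // Runs last (defers are LIFO), i.e. after the recovery below, in the
        // window where the caller's frame pointer used to be left unrestored.
        defer fmt.Println("second deferred call")
        defer func() {
            if r := recover(); r != nil {
                fmt.Println("recovered:", r)
            }
        }()
        panic("boom")
    }

    func main() { work() }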
Fixes #61766
Change-Id: I45f41aedcc397133560164ab520ca638bbd93c4e
Reviewed-on: https://go-review.googlesource.com/c/go/+/516157
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Currently the BenchmarkSetType* benchmarks are racy: they call
heapBitsSetType on an allocation that might be in a span in-use for
allocation on another P. Because heap bits are sub-byte quantities but are
written byte-wise and non-atomically (a P assumes it has total ownership of
a span's bits), two threads can race writing the same heap bitmap byte,
producing incorrect metadata.
Fix this by forcing every value we're writing heap bits for into a large
object. Large object spans will never be written to concurrently unless
they're freed first.
Also, while we're here, refactor the benchmarks a bit. Use generics to
eliminate the reflect nastiness in gc_test.go, and pass b.ResetTimer
down into the test to get slightly more accurate results.
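A hedged sketch of the resulting benchmark shape (illustrative only; the
real benchmarks live in gc_test.go and call into runtime test exports):
generics replace the reflection, b is passed down so timing starts after
setup, and the value is embedded in a >32 KiB wrapper so it always lands
in a large-object span that no other P allocates from.

    package runtime_test

    import "testing"

    type large[T any] struct {
        x T
        _ [32 << 10]byte // pad past the 32 KiB small-object threshold
    }

    var sink any

    func benchSetType[T any](b *testing.B) {
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            sink = new(large[T]) // escapes, forcing a large-object heap allocation
        }
    }

    func BenchmarkSetTypeUint64(b *testing.B) { benchSetType[uint64](b) }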
Fixes #60050.
Change-Id: Ib7d6249b321963367c8c8ca88385386c8ae9af1c
Reviewed-on: https://go-review.googlesource.com/c/go/+/497215
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
The previous name was wrong due to the mistaken assumption that calling
f->g->getcallerpc and f->g->getcallersp would respectively return the
pc/sp at g. However, they are actually referring to their caller's
caller, i.e. f.
Rename getcallerfp to getfp in order to stay consistent with this
naming convention.
Also see discussion on CL 463835.
For #16638
This is a redo of CL 481617 that became necessary because CL 461738
added another call site for getcallerfp().
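getcallerpc/getcallersp (and now getfp) are runtime-internal, but the
exported runtime.Caller shows the same relationship: asked from inside g,
"caller" resolves to f, not g.

    package main

    import (
        "fmt"
        "runtime"
    )

    func g() {
        pc, file, line, _ := runtime.Caller(1) // skip g itself: this is f's frame
        fmt.Printf("g's caller is %s at %s:%d\n",
            runtime.FuncForPC(pc).Name(), file, line)
    }

    func f() { g() }

    func main() { f() }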
Change-Id: If0b536e85a6c26061b65e7b5c2859fc31385d025
Reviewed-on: https://go-review.googlesource.com/c/go/+/494857
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Currently STW events are only emitted for GC STWs. There's little reason
why the trace can't contain events for every STW: they're rare so don't
take up much space in the trace, yet being able to see when the world
was stopped is often critical to debugging certain latency issues,
especially when they stem from user-level APIs.
This change adds new "kinds" to the EvGCSTWStart event, renames the
GCSTW events to just "STW," and lets the parser deal with unknown STW
kinds for future backwards compatibility.
But, this change must break trace compatibility, so it bumps the trace
version to Go 1.21.
This change also includes a small cleanup in the trace command, which
previously checked for STW events when deciding whether user tasks
overlapped with a GC. Looking at the source, I don't see a way for STW
events to ever enter the stream that this code looks at, so that
condition has been deleted.
Change-Id: I9a5dc144092c53e92eb6950e9a5504a790ac00cf
Reviewed-on: https://go-review.googlesource.com/c/go/+/494495
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Some C APIs require the use of structures that contain pointers to
buffers (iovec, io_uring, ...). The pointer passing rules would
require that these buffers be allocated in C memory, and processing
this data with Go libraries would then require copying it.
In order to provide a zero-copy way to use these C APIs, this CL
implements a Pinner API that allows pinning Go objects, which
guarantees that the garbage collector does not move these objects
while they are pinned. This allows the pointer passing rules to be
relaxed so that pinned pointers can be stored in C allocated memory
or can be contained in Go memory that is passed to C functions.
The Pin() method accepts pointers to objects of any type and
unsafe.Pointer. Slices and arrays can be pinned by calling Pin()
with a pointer to the first element. Pinning of maps is not
supported.
If the GC collects an unreachable Pinner that still holds pinned
objects, it panics. If Pin() is called with any other non-pointer
type, it panics as well.
Performance considerations: This change has no impact on the execution
time of existing code, because checks are only done in code paths
that would panic otherwise. The memory footprint on existing code is
one pointer per memory span.
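A minimal sketch of the API described above (no cgo here; in real use the
pinned pointer would be stored in C memory or inside a struct handed to a
C function):

    package main

    import "runtime"

    type buffer struct {
        data [64]byte
    }

    func main() {
        var p runtime.Pinner
        buf := new(buffer)
        p.Pin(buf)        // the GC will not move *buf while it is pinned
        use(&buf.data[0]) // e.g. place this pointer in an iovec passed to C
        p.Unpin()         // releases every object pinned through p
    }

    func use(ptr *byte) { _ = ptr }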
Fixes: #46787
Signed-off-by: Sven Anderson <sven@anderson.de>
Change-Id: I110031fe789b92277ae45a9455624687bd1c54f2
Reviewed-on: https://go-review.googlesource.com/c/go/+/367296
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Currently if GOGC=off and GOMEMLIMIT is set, then the synchronous
scavenger is likely to work fairly often to maintain the limit, since
the heap goal goes right up to the edge of the memory limit (minus a
fixed 1 MiB of headroom).
If the application's allocation rate is high, and page-level
fragmentation is high, then most allocations will scavenge.
This change mitigates the problem by adding a proportional component
to the constant headroom subtracted from the memory-limit-based heap
goal. This means the runtime will have much more headroom before
fragmentation forces memory to be eagerly scavenged.
The proportional headroom in this case is 3%, or ~30 MiB for a 1 GiB
heap. This technically will increase GC frequency in the GOGC=off case
by a tiny amount, but will likely have a positive impact on both
allocation throughput and latency that outweighs this difference.
I wrote a small program to reproduce this issue and confirmed that the
issue is resolved by this patch:
https://github.com/golang/go/issues/57069#issuecomment-1551746565
This value of 3% is chosen as it seems to be an inflection point in this
particular small program. 2% still resulted in quite a bit of eager
scavenging work. I confirmed this results in a GC frequency increase of
about 3%.
This choice is still somewhat arbitrary because the program is
arbitrary, so perhaps worth revisiting in the future. Still, it should
help a good number of programs.
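Back-of-the-envelope sketch of the numbers involved, assuming the headroom
is roughly max(3% of the limit, 1 MiB) (the exact runtime formula may
differ):

    package main

    import "fmt"

    func headroom(memoryLimit uint64) uint64 {
        h := memoryLimit / 100 * 3 // 3% proportional component
        if h < 1<<20 {             // never less than the old fixed 1 MiB
            h = 1 << 20
        }
        return h
    }

    func main() {
        limit := uint64(1 << 30) // GOMEMLIMIT of 1 GiB
        fmt.Printf("headroom ≈ %d MiB, heap goal ≈ %d MiB\n",
            headroom(limit)>>20, (limit-headroom(limit))>>20)
        // Output: headroom ≈ 30 MiB, heap goal ≈ 993 MiB
    }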
Fixes #57069.
Change-Id: Icb9829db0dfefb4fe42a0cabc5aa8d35970dd7d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/460375
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Types are either static (for compiler-created types) or heap
allocated and always reachable (for reflection-created types, held
in the central map). So there is no need to escape types.
With CL 408826 reflect.Value does not always escape. Some functions
that escape Value.typ would make the Value escape without this CL.
A special case had to be added to the inliner to keep (*Value).Type
inlineable.
Change-Id: I7c14d35fd26328347b509a06eb5bd1534d40775f
Reviewed-on: https://go-review.googlesource.com/c/go/+/413474
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This was accidentally left behind when moving the logic to set the skip
sentinel in pcBuf to the caller.
Change-Id: Id7565f6ea4df6b32cf18b99c700bca322998d182
Reviewed-on: https://go-review.googlesource.com/c/go/+/489095
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
This reverts CL 481617.
Reason for revert: breaks test build on Windows
Change-Id: Ifc1a323b0cc521e7a5a1f7de7b3da667f5fee375
Reviewed-on: https://go-review.googlesource.com/c/go/+/494377
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
The previous name was wrong due to the mistaken assumption that calling
f->g->getcallerpc and f->g->getcallersp would respectively return the
pc/sp at g. However, they are actually referring to their caller's
caller, i.e. f.
Rename getcallerfp to getfp in order to stay consistent with this
naming convention.
Also see discussion on CL 463835.
For #16638
Change-Id: I07990645da78819efd3db92f643326652ee516f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/481617
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This touches a lot of files, which is bad, but it is also good,
since N copies of this information are commoned into one.
The new files in internal/abi are copied from the end of the stack;
ultimately this will all end up being used.
Change-Id: Ia252c0055aaa72ca569411ef9f9e96e3d610889e
Reviewed-on: https://go-review.googlesource.com/c/go/+/462995
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Add TestSystemstackFramePointerAdjust as a regression test for CL
489015.
By turning stackPoisonCopy into a var instead of const and introducing
the ShrinkStackAndVerifyFramePointers() helper function, we are able to
trigger the exact combination of events that can crash traceEvent() if
systemstack restores a frame pointer that is pointing into the old
stack.
Updates #59692
Change-Id: I60fc6940638077e3b60a81d923b5f5b4f6d8a44c
Reviewed-on: https://go-review.googlesource.com/c/go/+/489115
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
The scavenge index currently doesn't guard against overflow, and CL
436395 removed the minHeapIdx optimization that allows the chunk scan to
skip scanning chunks that haven't been mapped for the heap, and are only
available as a consequence of chunks' mapped region being rounded out to
a page on both ends.
This change adds minHeapIdx back. Because the 0'th chunk is never mapped,
minHeapIdx effectively prevents overflow, fixing the iOS breakage.
This change also refactors growth and initialization a little bit to
decouple it from pageAlloc a bit and share code across platforms.
Change-Id: If7fc3245aa81cf99451bf8468458da31986a9b0a
Reviewed-on: https://go-review.googlesource.com/c/go/+/486695
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
This change makes it so that on Linux the Go runtime explicitly marks
page heap memory as either available to be backed by hugepages or not
using heuristics based on density.
The motivation behind this change is twofold:
1. In default Linux configurations, khugepaged can recoalesce hugepages
even after the scavenger breaks them up, resulting in significant
overheads for small heaps when they shrink.
2. The Go runtime already has some heuristics about this, but those
heuristics appear to have bit-rotted and result in haphazard
hugepage management. Unlucky (but otherwise fairly dense) regions of
memory end up not backed by huge pages while sparse regions end up
accidentally marked MADV_HUGEPAGE and are not later broken up by the
scavenger, because it already got the memory it needed from more
dense sections (this is more likely to happen with small heaps that
go idle).
In this change, the runtime uses a new policy:
1. Mark all new memory MADV_HUGEPAGE.
2. Track whether each page chunk (4 MiB) became dense during the GC
cycle. Mark those MADV_HUGEPAGE, and hide them from the scavenger.
3. If a chunk is not dense for 1 full GC cycle, make it visible to the
scavenger.
4. The scavenger marks a chunk MADV_NOHUGEPAGE before it scavenges it.
This policy is intended to try and back memory that is a good candidate
for huge pages (high occupancy) with huge pages, and give memory that is
not (low occupancy) to the scavenger. Occupancy is defined not just by
occupancy at any instant of time, but also occupancy in the near future.
It's generally true that by the end of a GC cycle the heap gets quite
dense (from the perspective of the page allocator).
Because we want scavenging and huge page management to happen together
(the right time to MADV_NOHUGEPAGE is just before scavenging in order to
break up huge pages and keep them that way) and the cost of applying
MADV_HUGEPAGE and MADV_NOHUGEPAGE is somewhat high, the scavenger avoids
releasing memory in dense page chunks. All this together means the
scavenger will now more generally release memory on a ~1 GC cycle delay.
Notably this has implications for scavenging to maintain the memory
limit and the runtime/debug.FreeOSMemory API. This change makes it so
that in these cases all memory is visible to the scavenger regardless of
sparseness and delays the page allocator in re-marking this memory with
MADV_NOHUGEPAGE for around 1 GC cycle to mitigate churn.
The end result of this change should be little-to-no performance
difference for dense heaps (MADV_HUGEPAGE works a lot like the default
unmarked state) but should allow the scavenger to more effectively take
back fragments of huge pages. The main risk here is churn, because
MADV_HUGEPAGE usually forces the kernel to immediately back memory with
a huge page. That's the reason for the large amount of hysteresis (1
full GC cycle) and why the definition of high density is 96% occupancy.
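A hedged sketch of the density test implied by the 96% figure (constants
and names here are illustrative, not the runtime's):

    package main

    import "fmt"

    const (
        pageSize     = 8 << 10              // 8 KiB runtime pages
        chunkPages   = (4 << 20) / pageSize // 512 pages per 4 MiB chunk
        densePercent = 96
    )

    // isDense reports whether a chunk's peak page occupancy over the last GC
    // cycle clears the 96% bar, making it a huge page candidate that the
    // scavenger should leave alone.
    func isDense(peakPagesInUse int) bool {
        return peakPagesInUse*100 >= densePercent*chunkPages
    }

    func main() {
        fmt.Println(isDense(491), isDense(492)) // false true
    }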
Fixes #55328.
Change-Id: I8da7998f1a31b498a9cc9bc662c1ae1a6bf64630
Reviewed-on: https://go-review.googlesource.com/c/go/+/436395
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This is relatively easy using the new traceback iterator.
Ancestor tracebacks are now limited to 50 frames. We could keep that
at 100, but the fact that it used 100 before seemed arbitrary and
unnecessary.
Fixes #7181
Updates #54466
Change-Id: If693045881d84848f17e568df275a5105b6f1cb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/475960
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Currently, filling PC traceback buffers is one of the jobs of
gentraceback. This moves it into a new function, tracebackPCs, with a
simple API built around unwinder, and changes all callers to use this
new API.
Updates #54466.
Change-Id: Id2038bded81bf533a5a4e71178a7c014904d938c
Reviewed-on: https://go-review.googlesource.com/c/go/+/468300
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
lfstack does very unsafe things. In particular, it will not
work with nodes that live on the heap. In normal use by the runtime
that is not a problem, as lfstack is only used for GC work bufs, which
live outside the heap. But the lfstack
test does use heap objects. It goes through some hoops to prevent
premature deallocation, but those hoops are not enough to convince
-d=checkptr that everything is ok.
Instead, allocate the test objects outside the heap, like the runtime
does for all of its lfstack usage. Remove the lifetime workaround
from the test.
Reported in https://groups.google.com/g/golang-nuts/c/psjrUV2ZKyI
Change-Id: If611105eab6c823a4d6c105938ce145ed731781d
Reviewed-on: https://go-review.googlesource.com/c/go/+/448899
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
If TestArenaCollision cannot reserve the address range it expects to
reserve, it currently fails somewhat mysteriously. Detect this case
and skip the test. This could lead to test rot if we wind up always
skipping this test, but it's not clear that there's a better answer.
If the test does fail, we now also log what it thinks it reserved so
the failure message is more useful in debugging any issues.
Fixes #49415
Fixes #54597
Change-Id: I05cf27258c1c0a7a3ac8d147f36bf8890820d59b
Reviewed-on: https://go-review.googlesource.com/c/go/+/446877
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
This adds the function "start line number" to runtime._func and
runtime.inlinedCall objects. The "start line number" is the line number
of the func keyword or TEXT directive for assembly.
Subtracting the start line number from PC line number provides the
relative line offset of a PC from the start of the function. This
helps with source stability by allowing code above the function to move
without invalidating samples within the function.
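A toy illustration of the offset this enables: if the func keyword sits on
line 10 and a sample's PC resolves to line 14, the sample is "4 lines into
the function" regardless of how much code moves above the function
(assuming no //line directives interfere).

    package main

    import "fmt"

    func main() {
        const startLine = 10 // line of the func keyword
        const pcLine = 14    // line the sampled PC resolves to
        fmt.Println("offset into function:", pcLine-startLine) // 4
    }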
Encoding start line rather than relative lines directly is convenient
because the pprof format already contains a start line field.
This CL uses a straightforward encoding of explicitly including a start
line field in every _func and inlinedCall. It is possible that we could
compress this further in the future. e.g., functions with a prologue
usually have <line of PC 0> == <start line>. In runtime.test, 95% of
functions have <line of PC 0> == <start line>.
According to bent, this is geomean +0.83% binary size vs master and
-0.31% binary size vs 1.19.
Note that //line directives can change the file and line numbers
arbitrarily. The encoded start line is as adjusted by //line directives.
Since this can change in the middle of a function, `line - start line`
offset calculations may not be meaningful if //line directives are in
use.
For #55022.
Change-Id: Iaabbc6dd4f85ffdda294266ef982ae838cc692f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/429638
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
This change adds the arenas package and a function to reflect for
allocating from an arena via reflection, but all the new API is placed
behind a GOEXPERIMENT.
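For orientation, roughly what the experimental surface looks like when
built with GOEXPERIMENT=arenas (an unstable API, so treat this as a
sketch):

    package main

    import "arena"

    type record struct {
        id   int
        name string
    }

    func main() {
        a := arena.NewArena()
        r := arena.New[record](a)                // a *record allocated from the arena
        buf := arena.MakeSlice[byte](a, 0, 4096) // a slice backing store, too
        r.id, r.name = 1, "example"
        _ = buf
        a.Free() // manually frees everything allocated from the arena
    }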
For #51317.
Change-Id: I026d46294e26ab386d74625108c19a0024fbcedc
Reviewed-on: https://go-review.googlesource.com/c/go/+/423361
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This change adds an API to the runtime for arenas. A later CL can
potentially export it as an experimental API, but for now, just the
runtime implementation will suffice.
The purpose of arenas is to improve efficiency, primarily by allowing
for an application to manually free memory, thereby delaying garbage
collection. It comes with other potential performance benefits, such as
better locality, a better allocation strategy, and better handling of
interior pointers by the GC.
This implementation is based on one by danscales@google.com with a few
significant differences:
* The implementation lives entirely in the runtime (all layers).
* Arena chunks are the minimum of 8 MiB and the heap arena size. This
choice is made because in practice 64 MiB appears to be way too large
of an area for most real-world use-cases.
* Arena chunks are not unmapped, instead they're placed on an evacuation
list and when there are no pointers left pointing into them, they're
allowed to be reused.
* Reusing partially-used arena chunks no longer tries to find one used
by the same P first; it just takes the first one available.
* In order to ensure worst-case fragmentation is never worse than 25%,
only types and slice backing stores whose sizes are 1/4th the size of
a chunk or less may be used. Previously larger sizes, up to the size
of the chunk, were allowed.
* ASAN, MSAN, and the race detector are fully supported.
* Sets arena chunks to fault that were deferred at the end of mark
termination (a non-public patch once did this; I don't see a reason
not to continue that).
For #51317.
Change-Id: I83b1693a17302554cb36b6daa4e9249a81b1644f
Reviewed-on: https://go-review.googlesource.com/c/go/+/423359
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
This change adds a metric to the runtime/metrics package which tracks
total mutex wait time for sync.Mutex and sync.RWMutex. The purpose of
this metric is to be able to quickly get an idea of the total mutex wait
time.
The implementation of this metric piggybacks off of the existing G
runnable tracking infrastructure, as well as the wait reason set on a G
when it goes into _Gwaiting.
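Reading the new metric through runtime/metrics (the name below,
/sync/mutex/wait/total:seconds, is the one this change adds):

    package main

    import (
        "fmt"
        "runtime/metrics"
    )

    func main() {
        samples := []metrics.Sample{{Name: "/sync/mutex/wait/total:seconds"}}
        metrics.Read(samples)
        fmt.Printf("total mutex wait: %f seconds\n", samples[0].Value.Float64())
    }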
Fixes #49881.
Change-Id: I4691abf64ac3574bec69b4d7d4428b1573130517
Reviewed-on: https://go-review.googlesource.com/c/go/+/427618
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
There are lots of useless buckets with too much precision. Introduce a
minimum level of precision with a minimum bucket bit. This cuts down on
the size of a time histogram dramatically (~3x). Also, pick a smaller
sub-bucket count; we don't need 6% precision.
Also, rename super-buckets to buckets to more closely line up with HDR
histogram literature.
Change-Id: I199449650e4b34f2a6dca3cf1d8edb071c6655c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/427615
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Use an atomic.Uint32 to represent the state of the finalizer goroutine.
fingStatus will only be changed to fingWake in a non-fingWait state,
so it is safe to set the fingRunningFinalizer status in runfinq.
name            old time/op  new time/op  delta
Finalizer-8     592µs ± 4%   561µs ± 1%   -5.22%  (p=0.000 n=10+10)
FinalizerRun-8  694ns ± 6%   675ns ± 7%   ~       (p=0.059 n=9+8)
Change-Id: I7e4da30cec98ce99f7d8cf4c97f933a8a2d1cae1
Reviewed-on: https://go-review.googlesource.com/c/go/+/400134
Reviewed-by: Joedian Reid <joedian@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Michael Pratt <mpratt@google.com>
For #53821
Change-Id: I686fe81268f70acc6a4c3e6b1d3ed0e07bb0d61c
Reviewed-on: https://go-review.googlesource.com/c/go/+/425775
Run-TryBot: xie cui <523516579@qq.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
I've dropped the note that sched.timeToRun is protected by sched.lock,
as it does not seem to be true.
For #53821.
Change-Id: I03f8dc6ca0bcd4ccf3ec113010a0aa39c6f7d6ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/419449
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Atomic operations are used even during STW for consistency.
For #53821.
Change-Id: Ibe7afe5cf893b1288ce24fc96b7691b1f81754ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/417775
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Rename g variables to gp for consistency.
Change-Id: I09ecdc7e8439637bc0e32f9c5f96f515e6436362
Reviewed-on: https://go-review.googlesource.com/c/go/+/418591
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
_g_, _p_, and _m_ are primarily vestiges of the C version of the
runtime, while today we prefer Go-style variable names (generally gp,
pp, and mp).
This change replaces all remaining uses of _p_ with pp. These are all
trivial replacements (i.e., no conflicts). That said, there are several
functions that refer to two different Ps at once. There the naming
convention is generally that pp refers to the local P, and p2 refers to
the other P we are accessing.
Change-Id: I205b801be839216972e7644b1fbeacdbf2612859
Reviewed-on: https://go-review.googlesource.com/c/go/+/306674
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
metricsSema protects the metrics map. The map implementation is race
instrumented regardless of which package it is called from.
semacquire/semrelease are not automatically race instrumented, so unless
we manually annotate our lock acquire and release we can trigger race
false positives.
See similar instrumentation on trace.shutdownSema and reflectOffs.lock.
Fixes #53542.
Change-Id: Ia3fd239ac860e037d09c7cb9c4ad267391e70705
Reviewed-on: https://go-review.googlesource.com/c/go/+/414517
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>