Commit graph

221 commits

Author SHA1 Message Date
Michael Anthony Knyszek
b1aadd034c runtime: emit STW events for all pauses, not just those for the GC
Currently STW events are only emitted for GC STWs. There's little reason
why the trace can't contain events for every STW: they're rare so don't
take up much space in the trace, yet being able to see when the world
was stopped is often critical to debugging certain latency issues,
especially when they stem from user-level APIs.

This change adds new "kinds" to the EvGCSTWStart event, renames the
GCSTW events to just "STW," and lets the parser deal with unknown STW
kinds for future backwards compatibility.

But, this change must break trace compatibility, so it bumps the trace
version to Go 1.21.

This change also includes a small cleanup in the trace command, which
previously checked for STW events when deciding whether user tasks
overlapped with a GC. Looking at the source, I don't see a way for STW
events to ever enter the stream that that code looks at, so that
condition has been deleted.

Change-Id: I9a5dc144092c53e92eb6950e9a5504a790ac00cf
Reviewed-on: https://go-review.googlesource.com/c/go/+/494495
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-05-19 17:06:45 +00:00
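
As an illustration of the commit above (not part of the CL itself), here is a minimal sketch of how one of these user-level pauses now shows up in an execution trace: runtime.ReadMemStats briefly stops the world, and with this change that pause is recorded as an STW event with its own kind instead of going unrecorded.

    package main

    import (
        "log"
        "os"
        "runtime"
        "runtime/trace"
    )

    func main() {
        f, err := os.Create("trace.out")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()
        if err := trace.Start(f); err != nil {
            log.Fatal(err)
        }
        defer trace.Stop()

        // ReadMemStats stops the world briefly; the resulting pause now
        // appears in the trace as an STW event with a non-GC kind.
        var ms runtime.MemStats
        runtime.ReadMemStats(&ms)
    }

Inspecting trace.out with `go tool trace` should then show the pause.
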
Sven Anderson
251daf46fb runtime: implement Pinner API for object pinning
Some C APIs require the use of structures that contain pointers to
buffers (iovec, io_uring, ...).  The pointer passing rules would
require that these buffers are allocated in C memory and to process
this data with Go libraries it would need to be copied.

In order to provide a zero-copy way to use these C APIs, this CL
implements a Pinner API that allows Go objects to be pinned, which
guarantees that the garbage collector does not move these objects
while they are pinned.  This allows the pointer passing rules to be
relaxed so that pinned pointers can be stored in C-allocated memory or
contained in Go memory that is passed to C functions.

The Pin() method accepts pointers to objects of any type and
unsafe.Pointer.  Slices and arrays can be pinned by calling Pin()
with the pointer to the first element.  Pinning of maps is not
supported.

If the GC collects an unreachable Pinner that still holds pinned
objects, it panics.  If Pin() is called with other non-pointer types,
it panics as well.

Performance considerations: This change has no impact on execution
time on existing code, because checks are only done in code paths
that would otherwise panic.  The memory footprint on existing code is
one pointer per memory span.

Fixes: #46787

Signed-off-by: Sven Anderson <sven@anderson.de>
Change-Id: I110031fe789b92277ae45a9455624687bd1c54f2
Reviewed-on: https://go-review.googlesource.com/c/go/+/367296
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-05-19 14:59:14 +00:00
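
A minimal sketch of the Pin/Unpin usage described in the commit above (the actual C call and error handling are omitted):

    package main

    import "runtime"

    func main() {
        buf := make([]byte, 64)

        var p runtime.Pinner
        // Slices and arrays are pinned via a pointer to their first element.
        p.Pin(&buf[0])

        // While pinned, &buf[0] may be stored in C-allocated memory (e.g. an
        // iovec) or passed to C inside other Go memory; the GC will not move
        // the buffer.
        // ... call into the C API here ...

        p.Unpin()
    }
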
Michael Anthony Knyszek
e97bd776f9 runtime: make the memory limit heap goal headroom proportional
Currently if GOGC=off and GOMEMLIMIT is set, then the synchronous
scavenger is likely to work fairly often to maintain the limit, since
the heap goal goes right up to the edge of the memory limit (minus a
fixed 1 MiB of headroom).

If the application's allocation rate is high, and page-level
fragmentation is high, then most allocations will scavenge.

This change mitigates this problem by adding a proportional component
to the constant headroom added to the memory-limit-based heap goal. This
means the runtime will have much more headroom before fragmentation
forces memory to be eagerly scavenged.

The proportional headroom in this case is 3%, or ~30 MiB for a 1 GiB
heap. This technically will increase GC frequency in the GOGC=off case
by a tiny amount, but will likely have a positive impact on both
allocation throughput and latency that outweighs this difference.

I wrote a small program to reproduce this issue and confirmed that the
issue is resolved by this patch:

https://github.com/golang/go/issues/57069#issuecomment-1551746565

This value of 3% is chosen as it seems to be an inflection point in this
particular small program. 2% still resulted in quite a bit of eager
scavenging work. I confirmed this results in a GC frequency increase of
about 3%.

This choice is still somewhat arbitrary because the program is
arbitrary, so perhaps worth revisiting in the future. Still, it should
help a good number of programs.

Fixes #57069.

Change-Id: Icb9829db0dfefb4fe42a0cabc5aa8d35970dd7d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/460375
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-19 13:53:21 +00:00
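
The configuration this commit is concerned with can be set up either via environment variables (GOGC=off GOMEMLIMIT=1GiB) or programmatically; a small sketch:

    package main

    import "runtime/debug"

    func main() {
        // Disable the GOGC-based goal and let the memory limit drive pacing.
        debug.SetGCPercent(-1)
        debug.SetMemoryLimit(1 << 30) // 1 GiB

        // With this change the memory-limit-based heap goal keeps ~3%
        // (about 30 MiB here) of headroom below the limit, rather than a
        // fixed 1 MiB, before allocations start scavenging eagerly.
        // ... allocation-heavy workload ...
    }
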
Cherry Mui
be4fe08b57 reflect: do not escape Value.Type
Types are either static (for compiler-created types) or heap
allocated and always reachable (for reflection-created types, held
in the central map). So there is no need to escape types.

With CL 408826 reflect.Value does not always escape. Some functions
that escape Value.typ would make the Value escape without this CL.

A special case had to be added for the inliner to keep (*Value).Type
inlineable.

Change-Id: I7c14d35fd26328347b509a06eb5bd1534d40775f
Reviewed-on: https://go-review.googlesource.com/c/go/+/413474
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-12 21:11:51 +00:00
Felix Geisendörfer
c3db9af3a6 runtime: remove unused skip arg from fpTracebackPCs
This was accidentally left behind when moving the logic to set the skip
sentinel in pcBuf to the caller.

Change-Id: Id7565f6ea4df6b32cf18b99c700bca322998d182
Reviewed-on: https://go-review.googlesource.com/c/go/+/489095
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
2023-05-12 13:51:15 +00:00
Michael Pratt
96add980ad Revert "runtime: rename getcallerfp to getfp"
This reverts CL 481617.

Reason for revert: breaks test build on Windows

Change-Id: Ifc1a323b0cc521e7a5a1f7de7b3da667f5fee375
Reviewed-on: https://go-review.googlesource.com/c/go/+/494377
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-11 17:15:06 +00:00
Felix Geisendörfer
bc9e21351c runtime: rename getcallerfp to getfp
The previous name was wrong due to the mistaken assumption that calling
f->g->getcallerpc and f->g->getcallersp would respectively return the
pc/sp at g. However, they are actually referring to their caller's
caller, i.e. f.

Rename getcallerfp to getfp in order to stay consistent with this
naming convention.

Also see discussion on CL 463835.

For #16638

Change-Id: I07990645da78819efd3db92f643326652ee516f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/481617
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-11 15:18:52 +00:00
David Chase
2e93fe0a9f runtime: move per-type types to internal/abi
Change-Id: I1f031f0f83a94bebe41d3978a91a903dc5bcda66
Reviewed-on: https://go-review.googlesource.com/c/go/+/489276
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-11 13:45:40 +00:00
David Chase
bdc6ae579a internal/abi: refactor (basic) type struct into one definition
This touches a lot of files, which is bad, but it is also good,
since there are N copies of this information commoned into one.

The new files in internal/abi are copied from the end of the stack;
ultimately this will all end up being used.

Change-Id: Ia252c0055aaa72ca569411ef9f9e96e3d610889e
Reviewed-on: https://go-review.googlesource.com/c/go/+/462995
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-05-05 14:59:28 +00:00
Felix Geisendörfer
6dca1a29ab runtime: add test for systemstack frame pointer adjustment
Add TestSystemstackFramePointerAdjust as a regression test for CL
489015.

By turning stackPoisonCopy into a var instead of const and introducing
the ShrinkStackAndVerifyFramePointers() helper function, we are able to
trigger the exact combination of events that can crash traceEvent() if
systemstack restores a frame pointer that is pointing into the old
stack.

Updates #59692

Change-Id: I60fc6940638077e3b60a81d923b5f5b4f6d8a44c
Reviewed-on: https://go-review.googlesource.com/c/go/+/489115
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
2023-05-03 14:34:04 +00:00
Michael Anthony Knyszek
15c1276246 runtime: bring back minHeapIdx in scavenge index
The scavenge index currently doesn't guard against overflow, and CL
436395 removed the minHeapIdx optimization that allows the chunk scan to
skip scanning chunks that haven't been mapped for the heap, and are only
available as a consequence of chunks' mapped region being rounded out to
a page on both ends.

Because the 0'th chunk is never mapped, minHeapIdx effectively prevents
overflow, fixing the iOS breakage.

This change also refactors growth and initialization a little bit to
decouple them from pageAlloc and share code across platforms.

Change-Id: If7fc3245aa81cf99451bf8468458da31986a9b0a
Reviewed-on: https://go-review.googlesource.com/c/go/+/486695
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2023-04-20 20:08:25 +00:00
Michael Anthony Knyszek
8fa9e3beee runtime: manage huge pages explicitly
This change makes it so that on Linux the Go runtime explicitly marks
page heap memory as either available to be backed by hugepages or not
using heuristics based on density.

The motivation behind this change is twofold:
1. In default Linux configurations, khugepaged can recoalesce hugepages
   even after the scavenger breaks them up, resulting in significant
   overheads for applications with small heaps when those heaps shrink.
2. The Go runtime already has some heuristics about this, but those
   heuristics appear to have bit-rotted and result in haphazard
   hugepage management. Unlucky (but otherwise fairly dense) regions of
   memory end up not backed by huge pages while sparse regions end up
   accidentally marked MADV_HUGEPAGE and are not later broken up by the
   scavenger, because it already got the memory it needed from more
   dense sections (this is more likely to happen with small heaps that
   go idle).

In this change, the runtime uses a new policy:

1. Mark all new memory MADV_HUGEPAGE.
2. Track whether each page chunk (4 MiB) became dense during the GC
   cycle. Mark those MADV_HUGEPAGE, and hide them from the scavenger.
3. If a chunk is not dense for 1 full GC cycle, make it visible to the
   scavenger.
4. The scavenger marks a chunk MADV_NOHUGEPAGE before it scavenges it.

This policy is intended to try to back memory that is a good candidate
for huge pages (high occupancy) with huge pages, and give memory that is
not (low occupancy) to the scavenger. Occupancy is defined not just by
occupancy at any instant of time, but also occupancy in the near future.
It's generally true that by the end of a GC cycle the heap gets quite
dense (from the perspective of the page allocator).

Because we want scavenging and huge page management to happen together
(the right time to MADV_NOHUGEPAGE is just before scavenging in order to
break up huge pages and keep them that way) and the cost of applying
MADV_HUGEPAGE and MADV_NOHUGEPAGE is somewhat high, the scavenger avoids
releasing memory in dense page chunks. All this together means the
scavenger will now more generally release memory on a ~1 GC cycle delay.

Notably this has implications for scavenging to maintain the memory
limit and the runtime/debug.FreeOSMemory API. This change makes it so
that in these cases all memory is visible to the scavenger regardless of
sparseness and delays the page allocator in re-marking this memory with
MADV_NOHUGEPAGE for around 1 GC cycle to mitigate churn.

The end result of this change should be little-to-no performance
difference for dense heaps (MADV_HUGEPAGE works a lot like the default
unmarked state) but should allow the scavenger to more effectively take
back fragments of huge pages. The main risk here is churn, because
MADV_HUGEPAGE usually forces the kernel to immediately back memory with
a huge page. That's the reason for the large amount of hysteresis (1
full GC cycle) and why the definition of high density is 96% occupancy.

Fixes #55328.

Change-Id: I8da7998f1a31b498a9cc9bc662c1ae1a6bf64630
Reviewed-on: https://go-review.googlesource.com/c/go/+/436395
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-04-19 14:30:00 +00:00
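
The kernel-facing side of the policy above boils down to madvise calls on page chunks. The following standalone, Linux-only sketch uses golang.org/x/sys/unix; the helper and its signature are illustrative, not the runtime's code:

    package hugepage

    import "golang.org/x/sys/unix"

    const chunkBytes = 4 << 20 // page chunk size from the commit message

    // advise applies the policy described above to one chunk: chunks about
    // to be scavenged are marked MADV_NOHUGEPAGE so the kernel does not
    // immediately re-back them with huge pages, and chunks that became
    // dense (>= 96% occupancy) during the GC cycle are marked MADV_HUGEPAGE.
    func advise(chunk []byte, occupancy float64, aboutToScavenge bool) error {
        switch {
        case aboutToScavenge:
            return unix.Madvise(chunk, unix.MADV_NOHUGEPAGE)
        case occupancy >= 0.96:
            return unix.Madvise(chunk, unix.MADV_HUGEPAGE)
        default:
            return nil
        }
    }
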
Felix Geisendörfer
25f9af6661 runtime: add a benchmark of FPCallers
This allows comparing frame pointer unwinding against the default
unwinder as shown below.

goos: linux
goarch: amd64
pkg: runtime
cpu: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
                    │ callers.txt │
                    │   sec/op    │
Callers/cached-32     1.254µ ± 0%
FPCallers/cached-32   24.99n ± 0%

For #16638

Change-Id: I4dd05f82254726152ef4a5d5beceab33641e9d2b
Reviewed-on: https://go-review.googlesource.com/c/go/+/475795
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Felix Geisendörfer <felix.geisendoerfer@datadoghq.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-03-30 19:18:18 +00:00
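
FPCallers itself is only exported to the runtime's own tests, but the conventional-unwinder side of the comparison can be reproduced with an ordinary benchmark in a _test.go file; a rough sketch:

    package traceback

    import (
        "runtime"
        "testing"
    )

    func BenchmarkCallers(b *testing.B) {
        pcs := make([]uintptr, 32)
        runtime.Callers(1, pcs) // warm any unwind caches before timing
        b.ResetTimer()
        for i := 0; i < b.N; i++ {
            runtime.Callers(1, pcs)
        }
    }
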
Austin Clements
9eba17ff90 runtime: for deep stacks, print both the top 50 and bottom 50 frames
This is relatively easy using the new traceback iterator.

Ancestor tracebacks are now limited to 50 frames. We could keep that
at 100, but the fact that it used 100 before seemed arbitrary and
unnecessary.

Fixes #7181
Updates #54466

Change-Id: If693045881d84848f17e568df275a5105b6f1cb0
Reviewed-on: https://go-review.googlesource.com/c/go/+/475960
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-03-21 19:14:14 +00:00
Austin Clements
3e360b035f runtime: new API for filling PC traceback buffers
Currently, filling PC traceback buffers is one of the jobs of
gentraceback. This moves it into a new function, tracebackPCs, with a
simple API built around unwinder, and changes all callers to use this
new API.

Updates #54466.

Change-Id: Id2038bded81bf533a5a4e71178a7c014904d938c
Reviewed-on: https://go-review.googlesource.com/c/go/+/468300
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
2023-03-10 17:59:37 +00:00
Nick Ripley
51225f6fc6 runtime: record parent goroutine ID, and print it in stack traces
Fixes #38651

Change-Id: Id46d684ee80e208c018791a06c26f304670ed159
Reviewed-on: https://go-review.googlesource.com/c/go/+/435337
Run-TryBot: Nick Ripley <nick.ripley@datadoghq.com>
Reviewed-by: Ethan Reesor <ethan.reesor@gmail.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-21 17:35:22 +00:00
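
With this change, the "created by" line in a goroutine's traceback also names the goroutine that created it (output resembling "created by main.main in goroutine 1"; exact wording may differ). A tiny program to see it:

    package main

    import "time"

    func main() {
        go func() {
            // The traceback for this panic now records which goroutine
            // created the panicking one.
            panic("boom")
        }()
        time.Sleep(time.Second)
    }
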
Keith Randall
d6171c9be2 runtime: fix conflict between lfstack and checkptr
lfstack does very unsafe things. In particular, it will not
work with nodes that live on the heap. In normal use by the runtime,
nodes do not live on the heap (lfstack is only used for gc work bufs).
But the lfstack test does use heap objects. It goes through some hoops
to prevent
premature deallocation, but those hoops are not enough to convince
-d=checkptr that everything is ok.

Instead, allocate the test objects outside the heap, like the runtime
does for all of its lfstack usage. Remove the lifetime workaround
from the test.

Reported in https://groups.google.com/g/golang-nuts/c/psjrUV2ZKyI

Change-Id: If611105eab6c823a4d6c105938ce145ed731781d
Reviewed-on: https://go-review.googlesource.com/c/go/+/448899
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2022-11-17 23:12:04 +00:00
Austin Clements
e72da1c15d runtime: skip TestArenaCollision on failed reservation
If TestArenaCollision cannot reserve the address range it expects to
reserve, it currently fails somewhat mysteriously. Detect this case
and skip the test. This could lead to test rot if we wind up always
skipping this test, but it's not clear that there's a better answer.
If the test does fail, we now also log what it thinks it reserved so
the failure message is more useful in debugging any issues.

Fixes #49415
Fixes #54597

Change-Id: I05cf27258c1c0a7a3ac8d147f36bf8890820d59b
Reviewed-on: https://go-review.googlesource.com/c/go/+/446877
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2022-11-01 17:06:23 +00:00
Michael Pratt
f2656f20ea cmd/compile,cmd/link,runtime: add start line numbers to func metadata
This adds the function "start line number" to runtime._func and
runtime.inlinedCall objects. The "start line number" is the line number
of the func keyword or TEXT directive for assembly.

Subtracting the start line number from a PC's line number provides the
relative line offset of that PC from the start of the function. This
helps with source stability by allowing code above the function to move
without invalidating samples within the function.

Encoding start line rather than relative lines directly is convenient
because the pprof format already contains a start line field.

This CL uses a straightforward encoding of explicitly including a start
line field in every _func and inlinedCall. It is possible that we could
compress this further in the future. e.g., functions with a prologue
usually have <line of PC 0> == <start line>. In runtime.test, 95% of
functions have <line of PC 0> == <start line>.

According to bent, this is geomean +0.83% binary size vs master and
-0.31% binary size vs 1.19.

Note that //line directives can change the file and line numbers
arbitrarily. The encoded start line is as adjusted by //line directives.
Since this can change in the middle of a function, `line - start line`
offset calculations may not be meaningful if //line directives are in
use.

For #55022.

Change-Id: Iaabbc6dd4f85ffdda294266ef982ae838cc692f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/429638
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-10-14 14:47:12 +00:00
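
Since the pprof format already carries a start line per function, the relative offset described above can be computed from a profile using the github.com/google/pprof/profile package; a sketch (the profile file name is hypothetical):

    package main

    import (
        "fmt"
        "log"
        "os"

        "github.com/google/pprof/profile"
    )

    func main() {
        f, err := os.Open("cpu.pprof")
        if err != nil {
            log.Fatal(err)
        }
        defer f.Close()

        p, err := profile.Parse(f)
        if err != nil {
            log.Fatal(err)
        }
        for _, loc := range p.Location {
            for _, ln := range loc.Line {
                if ln.Function == nil || ln.Function.StartLine == 0 {
                    continue
                }
                // Offset of this sample's line from the function's start
                // line; stable even if code above the function moves.
                fmt.Printf("%s +%d\n", ln.Function.Name, ln.Line-ln.Function.StartLine)
            }
        }
    }
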
Michael Anthony Knyszek
e0d01b8467 arena: add experimental arena package
This change adds the arena package and a function to reflect for
allocating from an arena via reflection, but all the new API is placed
behind a GOEXPERIMENT.

For #51317.

Change-Id: I026d46294e26ab386d74625108c19a0024fbcedc
Reviewed-on: https://go-review.googlesource.com/c/go/+/423361
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-10-12 20:23:36 +00:00
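
With GOEXPERIMENT=arenas set at build time, the experimental API can be exercised like this sketch (the API sits behind the experiment flag and may change):

    //go:build goexperiment.arenas

    package main

    import (
        "arena"
        "fmt"
    )

    type point struct{ x, y int }

    func main() {
        a := arena.NewArena()
        defer a.Free() // manually free everything allocated from the arena

        p := arena.New[point](a)
        p.x, p.y = 1, 2

        s := arena.MakeSlice[int](a, 0, 4)
        s = append(s, p.x+p.y)
        fmt.Println(*p, s)
    }
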
Michael Anthony Knyszek
7866538d25 runtime: add safe arena support to the runtime
This change adds an API to the runtime for arenas. A later CL can
potentially export it as an experimental API, but for now, just the
runtime implementation will suffice.

The purpose of arenas is to improve efficiency, primarily by allowing
an application to manually free memory, thereby delaying garbage
collection. It comes with other potential performance benefits, such as
better locality, a better allocation strategy, and better handling of
interior pointers by the GC.

This implementation is based on one by danscales@google.com with a few
significant differences:
* The implementation lives entirely in the runtime (all layers).
* Arena chunks are the minimum of 8 MiB or the heap arena size. This
  choice is made because in practice 64 MiB appears to be way too large
  of an area for most real-world use-cases.
* Arena chunks are not unmapped; instead, they're placed on an evacuation
  list and when there are no pointers left pointing into them, they're
  allowed to be reused.
* Reusing partially-used arena chunks no longer tries to find one used
  by the same P first; it just takes the first one available.
* In order to ensure worst-case fragmentation is never worse than 25%,
  only types and slice backing stores whose sizes are 1/4th the size of
  a chunk or less may be used. Previously larger sizes, up to the size
  of the chunk, were allowed.
* ASAN, MSAN, and the race detector are fully supported.
* Sets arena chunks to fault that were deferred at the end of mark
  termination (a non-public patch once did this; I don't see a reason
  not to continue that).

For #51317.

Change-Id: I83b1693a17302554cb36b6daa4e9249a81b1644f
Reviewed-on: https://go-review.googlesource.com/c/go/+/423359
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-10-12 20:23:30 +00:00
Michael Anthony Knyszek
63ceff95fa runtime/metrics: add /sync/mutex/wait/total:seconds metric
This change adds a metric to the runtime/metrics package which tracks
total mutex wait time for sync.Mutex and sync.RWMutex. The purpose of
this metric is to be able to quickly get an idea of the total mutex wait
time.

The implementation of this metric piggybacks off of the existing G
runnable tracking infrastructure, as well as the wait reason set on a G
when it goes into _Gwaiting.

Fixes #49881.

Change-Id: I4691abf64ac3574bec69b4d7d4428b1573130517
Reviewed-on: https://go-review.googlesource.com/c/go/+/427618
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-09-16 16:33:08 +00:00
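
Reading the new metric from application code is straightforward; a small sketch:

    package main

    import (
        "fmt"
        "runtime/metrics"
    )

    func main() {
        samples := []metrics.Sample{{Name: "/sync/mutex/wait/total:seconds"}}
        metrics.Read(samples)
        if samples[0].Value.Kind() == metrics.KindFloat64 {
            fmt.Printf("total mutex wait: %.3fs\n", samples[0].Value.Float64())
        }
    }
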
Michael Anthony Knyszek
87eda2a782 runtime: shrink time histogram buckets
There are lots of useless buckets with too much precision. Introduce a
minimum level of precision with a minimum bucket bit. This cuts down on
the size of a time histogram dramatically (~3x). Also, pick a smaller
sub-bucket count; we don't need 6% precision.

Also, rename super-buckets to buckets to more closely line up with HDR
histogram literature.

Change-Id: I199449650e4b34f2a6dca3cf1d8edb071c6655c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/427615
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2022-09-16 16:32:01 +00:00
Leonard Wang
bd5595d7fa runtime: refactor finalizer goroutine status
Use an atomic.Uint32 to represent the state of the finalizer goroutine.
fingStatus will only be changed to fingWake in a non-fingWait state,
so it is safe to set the fingRunningFinalizer status in runfinq.

name            old time/op  new time/op  delta
Finalizer-8      592µs ± 4%   561µs ± 1%  -5.22%  (p=0.000 n=10+10)
FinalizerRun-8   694ns ± 6%   675ns ± 7%    ~     (p=0.059 n=9+8)

Change-Id: I7e4da30cec98ce99f7d8cf4c97f933a8a2d1cae1
Reviewed-on: https://go-review.googlesource.com/c/go/+/400134
Reviewed-by: Joedian Reid <joedian@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Michael Pratt <mpratt@google.com>
2022-09-05 08:28:34 +00:00
cuiweixie
fa0e3bffb4 runtime: convert semaRoot.nwait to atomic type
For #53821

Change-Id: I686fe81268f70acc6a4c3e6b1d3ed0e07bb0d61c
Reviewed-on: https://go-review.googlesource.com/c/go/+/425775
Run-TryBot: xie cui <523516579@qq.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
2022-08-31 15:33:00 +00:00
Cuong Manh Le
a719a78c1b runtime: add and use runtime/internal/sys.NotInHeap
Updates #46731

Change-Id: Ic2208c8bb639aa1e390be0d62e2bd799ecf20654
Reviewed-on: https://go-review.googlesource.com/c/go/+/421878
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2022-08-19 00:29:18 +00:00
Michael Pratt
b04e4637db runtime: convert timeHistogram to atomic types
I've dropped the note that sched.timeToRun is protected by sched.lock,
as it does not seem to be true.

For #53821.

Change-Id: I03f8dc6ca0bcd4ccf3ec113010a0aa39c6f7d6ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/419449
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
2022-08-12 01:53:11 +00:00
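
This and the neighboring "convert ... to atomic type" CLs all follow the same pattern. A simplified sketch using the equivalent sync/atomic types (the runtime uses runtime/internal/atomic, and the field layout here is illustrative rather than the real one):

    package stats

    import "sync/atomic"

    // Before: plain uint64 fields updated with atomic.Xadd64/Load64 plus a
    // comment naming the lock that supposedly protects them. After: typed
    // atomics, so every access is atomic by construction.
    type timeHistogram struct {
        underflow atomic.Uint64
        counts    [64]atomic.Uint64
    }

    func (h *timeHistogram) record(bucket int) {
        if bucket < 0 {
            h.underflow.Add(1)
            return
        }
        h.counts[bucket].Add(1)
    }
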
Michael Pratt
b8f4847d6f runtime: convert gcController.globalsScan to atomic type
For #53821.

Change-Id: I92bd33e355c868ae229395fd9c98fdb10768d03d
Reviewed-on: https://go-review.googlesource.com/c/go/+/417779
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2022-08-08 14:11:40 +00:00
Michael Pratt
02fb9b8ca9 runtime: convert gcController.maxStackScan to atomic type
For #53821.

Change-Id: I1bd23cdbc371011ec2331fb0a37482ecf99a063b
Reviewed-on: https://go-review.googlesource.com/c/go/+/417778
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-08 14:11:32 +00:00
Michael Pratt
6e9925c4f7 runtime: convert gcController.heapScan to atomic type
For #53821.

Change-Id: I64d3f53c89a579d93056906304e4c05fc35cd9b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/417776
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-08 14:11:19 +00:00
Michael Pratt
3a9281ff61 runtime: convert gcController.heapLive to atomic type
Atomic operations are used even during STW for consistency.

For #53821.

Change-Id: Ibe7afe5cf893b1288ce24fc96b7691b1f81754ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/417775
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-08 14:11:09 +00:00
Michael Pratt
29b9a328d2 runtime: trivial replacements of g in remaining files
Rename g variables to gp for consistency.

Change-Id: I09ecdc7e8439637bc0e32f9c5f96f515e6436362
Reviewed-on: https://go-review.googlesource.com/c/go/+/418591
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
2022-08-02 18:52:33 +00:00
Michael Pratt
5e8d261918 runtime: rename _p_ to pp
_g_, _p_, and _m_ are primarily vestiges of the C version of the
runtime, while today we prefer Go-style variable names (generally gp,
pp, and mp).

This change replaces all remaining uses of _p_ with pp. These are all
trivial replacements (i.e., no conflicts). That said, there are several
functions that refer to two different Ps at once. There the naming
convention is generally that pp refers to the local P, and p2 refers to
the other P we are accessing.

Change-Id: I205b801be839216972e7644b1fbeacdbf2612859
Reviewed-on: https://go-review.googlesource.com/c/go/+/306674
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-02 18:51:06 +00:00
Michael Pratt
d6481d5b96 runtime: add race annotations to metricsSema
metricsSema protects the metrics map. The map implementation is race
instrumented regardless of which package it is called from.

semacquire/semrelease are not automatically race instrumented, so we can
trigger race false positives without manually annotating our lock
acquire and release.

See similar instrumentation on trace.shutdownSema and reflectOffs.lock.

Fixes #53542.

Change-Id: Ia3fd239ac860e037d09c7cb9c4ad267391e70705
Reviewed-on: https://go-review.googlesource.com/c/go/+/414517
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-06-29 16:30:19 +00:00
Michael Anthony Knyszek
7bad61554e runtime: write much more direct test for semaphore waiter scalability
This test originally existed as two tests in test/locklinear.go, but
this checked against actual locks and was flaky. The test was checking
a property of a deep part of the runtime but from a much higher level,
and it's easy for nondeterminism due to scheduling to completely mess
that up, especially on an oversubscribed system.

That test was then moved to the sync package with a more rigorous
testing methodology, but it could still flake pretty easily.

Finally, this CL makes semtable more testable, exports it in
export_test.go, then writes a very direct scalability test for exactly
the situation the original test described. As far as I can tell, this is
much, much more stable, because it's single-threaded and is just
checking exactly the algorithm we need to check.

Don't bother trying to bring in a test that checks for O(log n) behavior
on the other kind of iteration. It'll be perpetually flaky because the
underlying data structure is a treap, so it's only _expected_ to be
O(log n), but it's very easy for it to get unlucky without a large
number of iterations that's too much for a simple test.

Fixes #53381.

Change-Id: Ia1cd2d2b0e36d552d5a8ae137077260a16016602
Reviewed-on: https://go-review.googlesource.com/c/go/+/412875
Reviewed-by: Michael Pratt <mpratt@google.com>
2022-06-16 21:25:35 +00:00
Michael Anthony Knyszek
d7941030c9 runtime: only use CPU time from the current window in the GC CPU limiter
Currently the GC CPU limiter consumes CPU time from a few pools, but
because the events that flush to those pools may overlap, rather than be
strictly contained within, the update window for the GC CPU limiter, the
limiter's accounting is ultimately sloppy.

This sloppiness complicates accounting for idle time more completely,
and makes reasoning about the transient behavior of the GC CPU limiter
much more difficult.

To remedy this, this CL adds a field to the P struct that tracks the
start time of any in-flight event the limiter might care about, along
with information about the nature of that event. This timestamp is
managed atomically so that the GC CPU limiter can come in and perform a
read of the partial CPU time consumed by a given event. The limiter also
updates the timestamp so that only what's left over is flushed by the
event itself when it completes.

The end result of this change is that, since the GC CPU limiter is aware
of all past completed events, and all in-flight events, it can much more
accurately collect the CPU time of events since the last update. There's
still the possibility for skew, but any leftover time will be captured
in the following update, and the magnitude of this leftover time is
effectively bounded by the update period of the GC CPU limiter, which is
much easier to consider.

One caveat of managing this timestamp-type combo atomically is that they
need to be packed in 64 bits. So, this CL gives up the top 3 bits of the
timestamp and places the type information there. What this means is we
effectively have only a 61-bit resolution timestamp. This is fine when
the top 3 bits are the same between calls to nanotime, but becomes a
problem on boundaries when those 3 bits change. These cases may cause
hiccups in the GC CPU limiter by not accounting for some source of CPU
time correctly, but with 61 bits of resolution this should be extremely
rare. The rate of update is on the order of milliseconds, so at worst
the runtime will be off of any given measurement by only a few
CPU-milliseconds (and this is directly bounded by the rate of update).
We're probably more inaccurate from the fact that we don't measure real
CPU time but only approximate it.

For #52890.

Change-Id: I347f30ac9e2ba6061806c21dfe0193ef2ab3bbe9
Reviewed-on: https://go-review.googlesource.com/c/go/+/410120
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-06-03 20:16:15 +00:00
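
A sketch of the 61-bit-timestamp-plus-3-bit-type packing described above (names are illustrative, not the runtime's):

    package limiter

    import "sync/atomic"

    const (
        typeBits = 3
        timeBits = 64 - typeBits
        timeMask = uint64(1)<<timeBits - 1
    )

    // limiterEvent holds the in-flight event's type in the top 3 bits and
    // its start time (a truncated nanotime) in the low 61 bits.
    type limiterEvent struct {
        packed atomic.Uint64
    }

    func pack(typ uint8, start int64) uint64 {
        return uint64(typ)<<timeBits | uint64(start)&timeMask
    }

    // consume reads the partial CPU time accrued by the in-flight event and
    // advances its start time to now, so the event flushes only the
    // remainder when it completes.
    func (e *limiterEvent) consume(now int64) (typ uint8, elapsed int64) {
        for {
            old := e.packed.Load()
            typ = uint8(old >> timeBits)
            elapsed = now&int64(timeMask) - int64(old&timeMask)
            if e.packed.CompareAndSwap(old, pack(typ, now)) {
                return typ, elapsed
            }
        }
    }
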
Michael Anthony Knyszek
1f0ef6bec7 runtime: cancel mark and scavenge assists if the limiter is enabled
This change forces mark and scavenge assists to be cancelled early if
the limiter is enabled. This avoids goroutines getting stuck in really
long assists if the limiter happens to be disabled when they first come
into the assist. This can get especially bad for mark assists, which, in
dire situations, can end up "owing" the GC a really significant debt.

For #52890.

Change-Id: I4bfaa76b8de3e167d49d2ffd8bc2127b87ea566a
Reviewed-on: https://go-review.googlesource.com/c/go/+/408816
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-27 17:30:15 +00:00
Michael Anthony Knyszek
2eb8b6eec6 runtime: make CPU limiter assist time much less error-prone
At the expense of performance (having to update another atomic counter)
this change makes CPU limiter assist time much less error-prone to
manage. There are currently a number of issues with respect to how
scavenge assist time is treated, and this change resolves those by just
having the limiter maintain its own internal pool that's drained on each
update.

While we're here, clear the measured assist time each cycle, which was
the impetus for the change.

Change-Id: I84c513a9f012b4007362a33cddb742c5779782b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/404304
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-05-13 16:02:20 +00:00
Keith Randall
016d755213 runtime: measure stack usage; start stacks larger if needed
Measure the average stack size used by goroutines at every GC. When
starting a new goroutine, allocate an initial goroutine stack of that
average size. Intuition is that we'll waste at most 2x in stack space
because only half the goroutines can be below average. In turn, we
avoid some of the early stack growth / copying needed in the average
case.

More details in the design doc at: https://docs.google.com/document/d/1YDlGIdVTPnmUiTAavlZxBI1d9pwGQgZT7IKFKlIXohQ/edit?usp=sharing

name        old time/op  new time/op  delta
Issue18138  95.3µs ± 0%  67.3µs ±13%  -29.35%  (p=0.000 n=9+10)

Fixes #18138

Change-Id: Iba34d22ed04279da7e718bbd569bbf2734922eaa
Reviewed-on: https://go-review.googlesource.com/c/go/+/345889
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2022-05-12 22:32:42 +00:00
Michael Anthony Knyszek
e7508598bb runtime: use Escape instead of escape in export_test.go
I landed the bottom CL of my stack without rebasing or retrying trybots,
but in the rebase "escape" was removed in favor of "Escape."

Change-Id: Icdc4d8de8b6ebc782215f2836cd191377cc211df
Reviewed-on: https://go-review.googlesource.com/c/go/+/403755
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-05-03 15:51:30 +00:00
Michael Anthony Knyszek
91f863013e runtime: redesign scavenging algorithm
Currently the runtime's scavenging algorithm involves running from the
top of the heap address space to the bottom (or as far as it gets) once
per GC cycle. Once it treads some ground, it doesn't tread it again
until the next GC cycle.

This works just fine for the background scavenger, for heap-growth
scavenging, and for debug.FreeOSMemory. However, it breaks down in the
face of a memory limit for small heaps in the tens of MiB. Basically,
because the scavenger never retreads old ground, it's completely
oblivious to new memory it could scavenge, and that it really *should*
in the face of a memory limit.

Also, every time some thread goes to scavenge in the runtime, it
reserves what could be a considerable amount of address space, hiding it
from other scavengers.

This change modifies and simplifies the implementation overall. It's
less code with complexities that are much better encapsulated. The
current implementation iterates optimistically over the address space
looking for memory to scavenge, keeping track of what it last saw. The
new implementation does the same, but instead of directly iterating over
pages, it iterates over chunks. It maintains an index of chunks (as a
bitmap over the address space) that indicate which chunks may contain
scavenge work. The page allocator populates this index, while scavengers
consume it and iterate over it optimistically.

This has two key benefits:
1. Scavenging is much simpler: find a candidate chunk, and check it,
   essentially just using the scavengeOne fast path. There's no need for
   the complexity of iterating beyond one chunk, because the index is
   lock-free and already maintains that information.
2. If pages are freed to the page allocator (always guaranteed to be
   unscavenged), the page allocator immediately notifies all scavengers
   of the new source of work, avoiding the hiding issues of the old
   implementation.

One downside of the new implementation, however, is that it's
potentially more expensive to find pages to scavenge. In the past, if
a single page would become free high up in the address space, the
runtime's scavengers would ignore it. Now that scavengers won't, one or
more scavengers may need to iterate potentially across the whole heap to
find the next source of work. For the background scavenger, this just
means a potentially less reactive scavenger -- overall it should still
use the same amount of CPU. It means worse overheads for memory limit
scavenging, but that's not exactly something with a baseline yet.

In practice, this shouldn't be too bad, hopefully since the chunk index
is extremely compact. For a 48-bit address space, the index is only 8
MiB in size at worst, but even just one physical page in the index is
able to support up to 128 GiB heaps, provided they aren't terribly
sparse. On 32-bit platforms, the index is only 128 bytes in size.

For #48409.

Change-Id: I72b7e74365046b18c64a6417224c5d85511194fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/399474
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03 15:13:53 +00:00
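
A sketch of the lock-free chunk index described above: one bit per chunk meaning "this chunk may contain scavengable pages", set by the page allocator when it frees pages and consumed by scavengers. The names are illustrative, not the runtime's implementation:

    package scavenge

    import (
        "math/bits"
        "sync/atomic"
    )

    type scavengeIndex struct {
        chunks []atomic.Uint64 // one bit per chunk of the address space
    }

    // mark records that chunk ci may contain unscavenged free pages.
    func (s *scavengeIndex) mark(ci int) {
        word := &s.chunks[ci/64]
        bit := uint64(1) << (uint(ci) % 64)
        for {
            old := word.Load()
            if old&bit != 0 || word.CompareAndSwap(old, old|bit) {
                return
            }
        }
    }

    // next returns a candidate chunk at or after start, clearing its bit,
    // or -1 if none remain; the caller then checks that chunk directly,
    // much like the scavengeOne fast path.
    func (s *scavengeIndex) next(start int) int {
        for i := start / 64; i < len(s.chunks); i++ {
            for {
                old := s.chunks[i].Load()
                if old == 0 {
                    break
                }
                bit := uint(bits.TrailingZeros64(old))
                if s.chunks[i].CompareAndSwap(old, old&^(uint64(1)<<bit)) {
                    return i*64 + int(bit)
                }
            }
        }
        return -1
    }
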
Michael Anthony Knyszek
7e4bc74119 runtime: set the heap goal from the memory limit
This change makes the memory limit functional by including it in the
heap goal calculation. Specifically, we derive a heap goal from the
memory limit, and compare that to the GOGC-based goal. If the goal based
on the memory limit is lower, we prefer that.

To derive the memory limit goal, the heap goal calculation now takes
a few additional parameters as input. As a result, the heap goal, in the
presence of a memory limit, may change dynamically. The consequences of
this are that different parts of the runtime can have different views of
the heap goal; this is OK. What's important is that all of the runtime
is able to observe the correct heap goal for the moment it's doing
something that affects it, like anything that should trigger a GC cycle.

On the topic of triggering a GC cycle, this change also allows any
manually managed memory allocation from the page heap to trigger a GC.
So, specifically workbufs, unrolled GC scan programs, and goroutine
stacks. The reason for this is that now non-heap memory can affect the
trigger or the heap goal.

Most sources of non-heap memory only change slowly, like GC pointer
bitmaps, or change in response to explicit function calls like
GOMAXPROCS. Note also that unrolled GC scan programs and workbufs are
really only relevant during a GC cycle anyway, so they won't actually
ever trigger a GC. Our primary target here is goroutine stacks.

Goroutine stacks can increase quickly, and this is currently totally
independent of the GC cycle. Thus, if for example a goroutine begins to
recurse suddenly and deeply, then even though the heap goal and trigger
react, we might not notice until it's too late. As a result, we need to
trigger a GC cycle.

We do this trigger in allocManual instead of in stackalloc because it's
far more general. We ultimately care about memory that's mapped
read/write and not returned to the OS, which is much more the domain of
the page heap than the stack allocator. Furthermore, there may be new
sources of memory manual allocation in the future (e.g. arenas) that
need to trigger a GC if necessary. As such, I'm inclined to leave the
trigger in allocManual as an extra defensive measure.

It's worth noting that because goroutine stacks do not behave quite as
predictably as other non-heap memory, there is the potential for the
heap goal to swing wildly. Fortunately, goroutine stacks that haven't
been set up to shrink by the last GC cycle will not shrink until after
the next one. This reduces the amount of possible churn in the heap goal
because it means that shrinkage only happens once per goroutine, per GC
cycle. After all the goroutines that should shrink did, then goroutine
stacks will only grow. The shrink mechanism is analogous to sweeping,
which is incremental and thus tends toward a steady amount of heap
memory used. As a result, in practice, I expect this to be a non-issue.

Note that if the memory limit is not set, this change should be a no-op.

For #48409.

Change-Id: Ie06d10175e5e36f9fb6450e26ed8acd3d30c681c
Reviewed-on: https://go-review.googlesource.com/c/go/+/394221
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2022-05-03 15:13:35 +00:00
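
In very rough terms, the goal selection described above takes the minimum of two goals. A deliberately simplified sketch with illustrative names (the real calculation takes more inputs and handles headroom differently):

    package pacer

    // heapGoal picks the smaller of the GOGC-based goal and the goal
    // derived from the memory limit. All inputs are in bytes; nonHeap
    // covers mapped runtime memory that isn't heap (stacks, workbufs, ...).
    func heapGoal(gcPercent int64, heapMarked, memoryLimit, nonHeap, headroom uint64) uint64 {
        goal := ^uint64(0) // effectively no goal when GOGC=off
        if gcPercent >= 0 {
            goal = heapMarked + heapMarked*uint64(gcPercent)/100
        }
        if memoryLimit > nonHeap+headroom {
            if limitGoal := memoryLimit - nonHeap - headroom; limitGoal < goal {
                goal = limitGoal
            }
        }
        return goal
    }
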
Michael Anthony Knyszek
129dcb7226 runtime: check the heap goal and trigger dynamically
As it stands, the heap goal and the trigger are set once by
gcController.commit, and then read out of gcController. However with the
coming memory limit we need the GC to be able to respond to changes in
non-heap memory. The simplest way of achieving this is to compute the
heap goal and its associated trigger dynamically.

In order to make this easier to implement, the GC trigger is now based
on the heap goal, as opposed to the status quo of computing both
simultaneously. In many cases we just want the heap goal anyway, not
both, but we definitely need the goal to compute the trigger, because
the trigger's bounds are entirely based on the goal (the initial runway
is not). A consequence of this is that we can't rely on the trigger to
enforce a minimum heap size anymore, and we need to lift that up
directly to the goal. Specifically, we need to lift up any part of the
calculation that *could* put the trigger ahead of the goal. Luckily this
is just the heap minimum and minimum sweep distance. In the first case,
the pacer may behave slightly differently, as the heap minimum is no
longer the minimum trigger, but the actual minimum heap goal. In the
second case it should be the same, as we ensure the additional runway
for sweeping is added to both the goal *and* the trigger, as before, by
computing that in gcControllerState.commit.

There's also another place we update the heap goal: if a GC starts and
we triggered beyond the goal, we always ensure there's some runway.
That calculation uses the current trigger, which violates the rule that
the goal should not be based on the trigger. Notice, however, that using the
precomputed trigger for this isn't even quite correct: due to a bug, or
something else, we might trigger a GC beyond the precomputed trigger.

So this change also adds a "triggered" field to gcControllerState that
tracks the point at which a GC actually triggered. This is independent
of the precomputed trigger, so it's fine for the heap goal calculation
to rely on it. It also turns out, there's more than just that one place
where we really should be using the actual trigger point, so this change
fixes those up too.

Also, because the heap minimum is set by the goal and not the trigger,
the maximum trigger calculation now happens *after* the goal is set, so
the maximum trigger actually does what I originally intended (and what
the comment says): at small heaps, the pacer picks 95% of the runway as
the maximum trigger. Currently, the pacer picks a small trigger based
on a not-yet-rounded-up heap goal, so the trigger gets rounded up to the
goal, and as per the "ensure there's some runway" check, the runway ends
up always being 64 KiB. That check is supposed to be for exceptional
circumstances, not the status quo. There's a test introduced in the last
CL that needs to be updated to accommodate this slight change in
behavior.

So, this all sounds like a lot that changed, but what we're talking about
here are really, really tight corner cases that arise from situations
outside of our control, like pathologically bad behavior on the part of
an OS or CPU. Even in these corner cases, it's very unlikely that users
will notice any difference at all. What's more important, I think, is
that the pacer behaves more closely to what all the comments describe,
and what the original intent was.

Another note: at first, one might think that computing the heap goal and
trigger dynamically introduces some raciness, but not in this CL: the heap
goal and trigger are completely static.

Allocation outside of a GC cycle may now be a bit slower than before, as
the GC trigger check is now significantly more complex. However, note
that this executes basically just as often as gcController.revise, and
that makes up a vanishingly small part of any CPU profile. The next
CL cleans up the floating point multiplications on this path
nonetheless, just to be safe.

For #48409.

Change-Id: I280f5ad607a86756d33fb8449ad08555cbee93f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/397014
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03 15:12:52 +00:00
Michael Anthony Knyszek
473c99643f runtime: rewrite pacer max trigger calculation
Currently the maximum trigger calculation is totally incorrect with
respect to the comment above it and its intent. This change rectifies
this mistake.

For #48409.

Change-Id: Ifef647040a8bdd304dd327695f5f315796a61a74
Reviewed-on: https://go-review.googlesource.com/c/go/+/398834
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03 15:12:45 +00:00
Michael Anthony Knyszek
375d696ddf runtime: move inconsistent memstats into gcController
Fundamentally, all of these memstats exist to serve the runtime in
managing memory. For the sake of simpler testing, couple these stats
more tightly with the GC.

This CL was mostly done automatically. The fields had to be moved
manually, but the references to the fields were updated via

    gofmt -w -r 'memstats.<field> -> gcController.<field>' *.go

For #48409.

Change-Id: Ic036e875c98138d9a11e1c35f8c61b784c376134
Reviewed-on: https://go-review.googlesource.com/c/go/+/397678
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-05-03 15:12:38 +00:00
Michael Anthony Knyszek
4649a43903 runtime: track how much memory is mapped in the Ready state
This change adds a field to memstats called mappedReady that tracks how
much memory is in the Ready state at any given time. In essence, it's
the total memory usage by the Go runtime (with one exception which is
documented). Essentially, all memory mapped read/write that has either
been paged in or soon will be.

To make tracking this not involve the many different stats that track
mapped memory, we track this statistic at a very low level. The downside
of tracking this statistic at such a low level is that it managed to
catch lots of situations where the runtime wasn't fully accounting for
memory. This change rectifies these situations by always accounting for
memory that's mapped in some way (i.e. always passing a sysMemStat to a
mem.go function), with *two* exceptions.

Rectifying these situations means also having the memory mapped during
testing being accounted for, so that tests (i.e. ReadMemStats) that
ultimately check mappedReady continue to work correctly without special
exceptions. We choose to simply account for this memory in other_sys.

Let's talk about the exceptions. The first is that the arenas array, used
for finding heap arena metadata from an address, is mapped as read/write in
one large chunk. It's tens of MiB in size. On systems with demand
paging, we assume that the whole thing isn't paged in at once (after
all, it maps to the whole address space, and it's exceedingly difficult
with today's technology to even approach having as much physical memory as
the total address space). On systems where we have to commit memory
manually, we use a two-level structure.

Now, the reason why this is an exception is that we have no mechanism
to track what memory is paged in, and we can't just account for the
entire thing, because that would *look* like an enormous overhead.
Furthermore, this structure is on a few really, really critical paths in
the runtime, so doing more explicit tracking isn't really an option. So,
we explicitly don't and call sysAllocOS to map this memory.

The second exception is that we call sysFree with no accounting to clean
up address space reservations, or otherwise to throw out mappings we
don't care about. In this case, also drop down to a lower level and call
sysFreeOS to explicitly avoid accounting.

The third exception is debuglog allocations. That is purely a debugging
facility and ideally we want it to have as small an impact on the
runtime as possible. If we include it in mappedReady calculations, it
could cause GC pacing shifts in future CLs, especially if one increases
the debuglog buffer sizes as a one-off.

As of this CL, these are the only three places in the runtime that would
pass nil for a stat to any of the functions in mem.go. As a result, this
CL makes sysMemStats mandatory to facilitate better accounting in the
future. It's now much easier to grep and find out where accounting is
explicitly elided, because one doesn't have to follow the trail of
sysMemStat nil pointer values, and can just look at the function name.

For #48409.

Change-Id: I274eb467fc2603881717482214fddc47c9eaf218
Reviewed-on: https://go-review.googlesource.com/c/go/+/393402
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-05-03 15:12:21 +00:00
Michael Anthony Knyszek
986a31053d runtime: add a non-functional memory limit to the pacer
Nothing much to see here, just some plumbing to make latter CLs smaller
and clearer.

For #48409.

Change-Id: Ide23812d5553e0b6eea5616c277d1a760afb4ed0
Reviewed-on: https://go-review.googlesource.com/c/go/+/393401
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2022-05-03 15:12:04 +00:00
Michael Anthony Knyszek
0feebe6eb5 runtime: add byte count parser for GOMEMLIMIT
This change adds a parser for the GOMEMLIMIT environment variable's
input. This environment variable accepts a number followed by an
optional prefix expressing the unit. Acceptable units include
B, KiB, MiB, GiB, TiB, where *iB is a power-of-two byte unit.

For #48409.

Change-Id: I6a3b4c02b175bfcf9c4debee6118cf5dda93bb6f
Reviewed-on: https://go-review.googlesource.com/c/go/+/393400
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
2022-05-03 15:11:55 +00:00
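
The runtime's parser is internal, but the accepted format is easy to sketch in user code (suffixes are checked longest-first so that "B" doesn't shadow "KiB" and friends):

    package main

    import (
        "fmt"
        "strconv"
        "strings"
    )

    func parseByteCount(s string) (int64, error) {
        units := []struct {
            suffix string
            mult   int64
        }{
            {"TiB", 1 << 40}, {"GiB", 1 << 30}, {"MiB", 1 << 20}, {"KiB", 1 << 10}, {"B", 1},
        }
        mult := int64(1)
        for _, u := range units {
            if strings.HasSuffix(s, u.suffix) {
                s, mult = strings.TrimSuffix(s, u.suffix), u.mult
                break
            }
        }
        n, err := strconv.ParseInt(strings.TrimSpace(s), 10, 64)
        if err != nil {
            return 0, err
        }
        return n * mult, nil
    }

    func main() {
        n, err := parseByteCount("512MiB")
        fmt.Println(n, err) // 536870912 <nil>
    }
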
Michael Knyszek
01359b4681 runtime: add GC CPU utilization limiter
This change adds a GC CPU utilization limiter to the GC. It disables
assists to ensure GC CPU utilization remains under 50%. It uses a leaky
bucket mechanism that will only fill if GC CPU utilization exceeds 50%.
Once the bucket begins to overflow, GC assists are limited until the
bucket empties, at the risk of GC overshoot. The limiter is primarily
updated by assists. The scheduler may also update it, but only if the
GC is on and a few milliseconds have passed since the last update. This
second case exists to ensure that if the limiter is on, and no assists
are happening, we're still updating the limiter regularly.

The purpose of this limiter is to mitigate GC death spirals, opting to
use more memory instead.

This change turns the limiter on always. In practice, 50% overall GC CPU
utilization is very difficult to hit unless you're trying; even the most
allocation-heavy applications with complex heaps still need to do
something with that memory. Note that small GOGC values (i.e.
single-digit, or low teens) are more likely to trigger the limiter,
which means the GOGC tradeoff may no longer be respected. Even so, it
should still be relatively rare.

This change also introduces the feature flag for code to support the
memory limit feature.

For #48409.

Change-Id: Ia30f914e683e491a00900fd27868446c65e5d3c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/353989
Reviewed-by: Michael Pratt <mpratt@google.com>
2022-05-03 15:11:42 +00:00
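
The leaky-bucket idea reduces to something like the following sketch (illustrative, single-threaded, and not the runtime's limiter): time where GC CPU exceeds the 50% target fills the bucket, mutator time drains it, and assists are switched off while it is full until it drains again. The bucket capacity is chosen at initialization.

    package limiter

    type gcCPULimiter struct {
        fill, capacity int64 // nanoseconds of excess GC CPU time
        enabled        bool  // when true, GC assists are disabled
    }

    // accumulate records GC and mutator CPU time for the last update window.
    func (l *gcCPULimiter) accumulate(gcCPU, mutatorCPU int64) {
        // At exactly 50% utilization gcCPU == mutatorCPU and the fill level
        // is unchanged; above that the bucket fills, below it drains.
        l.fill += gcCPU - mutatorCPU
        if l.fill < 0 {
            l.fill = 0
        }
        if l.fill >= l.capacity {
            l.fill = l.capacity
            l.enabled = true // limit assists, at the risk of GC overshoot
        } else if l.fill == 0 {
            l.enabled = false // bucket has drained; assists allowed again
        }
    }
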
Austin Clements
123e27170a runtime: clean up escaping in tests
There are several tests in the runtime that need to force various
things to escape to the heap. This CL centralizes this functionality
into runtime.Escape, defined in export_test.

Change-Id: I2de2519661603ad46c372877a9c93efef8e7a857
Reviewed-on: https://go-review.googlesource.com/c/go/+/402178
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Auto-Submit: Austin Clements <austin@google.com>
2022-04-28 18:28:44 +00:00