lfstack does very unsafe things. In particular, it will not
work with nodes that live on the heap. In normal use by the runtime,
that is the case (it is only used for gc work bufs). But the lfstack
test does use heap objects. It goes through some hoops to prevent
premature deallocation, but those hoops are not enough to convince
-d=checkptr that everything is ok.
Instead, allocate the test objects outside the heap, like the runtime
does for all of its lfstack usage. Remove the lifetime workaround
from the test.
Reported in https://groups.google.com/g/golang-nuts/c/psjrUV2ZKyI
Change-Id: If611105eab6c823a4d6c105938ce145ed731781d
Reviewed-on: https://go-review.googlesource.com/c/go/+/448899
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
If TestArenaCollision cannot reserve the address range it expects to
reserve, it currently fails somewhat mysteriously. Detect this case
and skip the test. This could lead to test rot if we wind up always
skipping this test, but it's not clear that there's a better answer.
If the test does fail, we now also log what it thinks it reserved so
the failure message is more useful in debugging any issues.
Fixes#49415Fixes#54597
Change-Id: I05cf27258c1c0a7a3ac8d147f36bf8890820d59b
Reviewed-on: https://go-review.googlesource.com/c/go/+/446877
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
This adds the function "start line number" to runtime._func and
runtime.inlinedCall objects. The "start line number" is the line number
of the func keyword or TEXT directive for assembly.
Subtracting the start line number from PC line number provides the
relative line offset of a PC from the the start of the function. This
helps with source stability by allowing code above the function to move
without invalidating samples within the function.
Encoding start line rather than relative lines directly is convenient
because the pprof format already contains a start line field.
This CL uses a straightforward encoding of explictly including a start
line field in every _func and inlinedCall. It is possible that we could
compress this further in the future. e.g., functions with a prologue
usually have <line of PC 0> == <start line>. In runtime.test, 95% of
functions have <line of PC 0> == <start line>.
According to bent, this is geomean +0.83% binary size vs master and
-0.31% binary size vs 1.19.
Note that //line directives can change the file and line numbers
arbitrarily. The encoded start line is as adjusted by //line directives.
Since this can change in the middle of a function, `line - start line`
offset calculations may not be meaningful if //line directives are in
use.
For #55022.
Change-Id: Iaabbc6dd4f85ffdda294266ef982ae838cc692f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/429638
Run-TryBot: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
This change adds the arenas package and a function to reflect for
allocating from an arena via reflection, but all the new API is placed
behind a GOEXPERIMENT.
For #51317.
Change-Id: I026d46294e26ab386d74625108c19a0024fbcedc
Reviewed-on: https://go-review.googlesource.com/c/go/+/423361
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This change adds an API to the runtime for arenas. A later CL can
potentially export it as an experimental API, but for now, just the
runtime implementation will suffice.
The purpose of arenas is to improve efficiency, primarily by allowing
for an application to manually free memory, thereby delaying garbage
collection. It comes with other potential performance benefits, such as
better locality, a better allocation strategy, and better handling of
interior pointers by the GC.
This implementation is based on one by danscales@google.com with a few
significant differences:
* The implementation lives entirely in the runtime (all layers).
* Arena chunks are the minimum of 8 MiB or the heap arena size. This
choice is made because in practice 64 MiB appears to be way too large
of an area for most real-world use-cases.
* Arena chunks are not unmapped, instead they're placed on an evacuation
list and when there are no pointers left pointing into them, they're
allowed to be reused.
* Reusing partially-used arena chunks no longer tries to find one used
by the same P first; it just takes the first one available.
* In order to ensure worst-case fragmentation is never worse than 25%,
only types and slice backing stores whose sizes are 1/4th the size of
a chunk or less may be used. Previously larger sizes, up to the size
of the chunk, were allowed.
* ASAN, MSAN, and the race detector are fully supported.
* Sets arena chunks to fault that were deferred at the end of mark
termination (a non-public patch once did this; I don't see a reason
not to continue that).
For #51317.
Change-Id: I83b1693a17302554cb36b6daa4e9249a81b1644f
Reviewed-on: https://go-review.googlesource.com/c/go/+/423359
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
This change adds a metric to the runtime/metrics package which tracks
total mutex wait time for sync.Mutex and sync.RWMutex. The purpose of
this metric is to be able to quickly get an idea of the total mutex wait
time.
The implementation of this metric piggybacks off of the existing G
runnable tracking infrastructure, as well as the wait reason set on a G
when it goes into _Gwaiting.
Fixes#49881.
Change-Id: I4691abf64ac3574bec69b4d7d4428b1573130517
Reviewed-on: https://go-review.googlesource.com/c/go/+/427618
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
There are lots of useless buckets with too much precision. Introduce a
minimum level of precision with a minimum bucket bit. This cuts down on
the size of a time histogram dramatically (~3x). Also, pick a smaller
sub bucket count; we don't need 6% precision.
Also, rename super-buckets to buckets to more closely line up with HDR
histogram literature.
Change-Id: I199449650e4b34f2a6dca3cf1d8edb071c6655c0
Reviewed-on: https://go-review.googlesource.com/c/go/+/427615
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Use an atomic.Uint32 to represent the state of finalizer goroutine.
fingStatus will only be changed to fingWake in non fingWait state,
so it is safe to set fingRunningFinalizer status in runfinq.
name old time/op new time/op delta
Finalizer-8 592µs ± 4% 561µs ± 1% -5.22% (p=0.000 n=10+10)
FinalizerRun-8 694ns ± 6% 675ns ± 7% ~ (p=0.059 n=9+8)
Change-Id: I7e4da30cec98ce99f7d8cf4c97f933a8a2d1cae1
Reviewed-on: https://go-review.googlesource.com/c/go/+/400134
Reviewed-by: Joedian Reid <joedian@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Michael Pratt <mpratt@google.com>
For #53821
Change-Id: I686fe81268f70acc6a4c3e6b1d3ed0e07bb0d61c
Reviewed-on: https://go-review.googlesource.com/c/go/+/425775
Run-TryBot: xie cui <523516579@qq.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
I've dropped the note that sched.timeToRun is protected by sched.lock,
as it does not seem to be true.
For #53821.
Change-Id: I03f8dc6ca0bcd4ccf3ec113010a0aa39c6f7d6ef
Reviewed-on: https://go-review.googlesource.com/c/go/+/419449
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Pratt <mpratt@google.com>
Atomic operations are used even during STW for consistency.
For #53821.
Change-Id: Ibe7afe5cf893b1288ce24fc96b7691b1f81754ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/417775
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Rename g variables to gp for consistency.
Change-Id: I09ecdc7e8439637bc0e32f9c5f96f515e6436362
Reviewed-on: https://go-review.googlesource.com/c/go/+/418591
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
_g_, _p_, and _m_ are primarily vestiges of the C version of the
runtime, while today we prefer Go-style variable names (generally gp,
pp, and mp).
This change replaces all remaining uses of _p_ with pp. These are all
trivial replacements (i.e., no conflicts). That said, there are several
functions that refer to two different Ps at once. There the naming
convention is generally that pp refers to the local P, and p2 refers to
the other P we are accessing.
Change-Id: I205b801be839216972e7644b1fbeacdbf2612859
Reviewed-on: https://go-review.googlesource.com/c/go/+/306674
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
metricsSema protects the metrics map. The map implementation is race
instrumented regardless of which package is it called from.
semacquire/semrelease are not automatically race instrumented, so we can
trigger race false positives without manually annotating our lock
acquire and release.
See similar instrumentation on trace.shutdownSema and reflectOffs.lock.
Fixes#53542.
Change-Id: Ia3fd239ac860e037d09c7cb9c4ad267391e70705
Reviewed-on: https://go-review.googlesource.com/c/go/+/414517
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
This test originally existed as two tests in test/locklinear.go, but
this checked against actual locks and was flaky. The test was checking
a property of a deep part of the runtime but from a much higher level,
and it's easy for nondeterminism due to scheduling to completely mess
that up, especially on an oversubscribed system.
That test was then moved to the sync package with a more rigorous
testing methodology, but it could still flake pretty easily.
Finally, this CL makes semtable more testable, exports it in
export_test.go, then writes a very direct scalability test for exactly
the situation the original test described. As far as I can tell, this is
much, much more stable, because it's single-threaded and is just
checking exactly the algorithm we need to check.
Don't bother trying to bring in a test that checks for O(log n) behavior
on the other kind of iteration. It'll be perpetually flaky because the
underlying data structure is a treap, so it's only _expected_ to be
O(log n), but it's very easy for it to get unlucky without a large
number of iterations that's too much for a simple test.
Fixes#53381.
Change-Id: Ia1cd2d2b0e36d552d5a8ae137077260a16016602
Reviewed-on: https://go-review.googlesource.com/c/go/+/412875
Reviewed-by: Michael Pratt <mpratt@google.com>
Currently the GC CPU limiter consumes CPU time from a few pools, but
because the events that flush to those pools may overlap, rather than be
strictly contained within, the update window for the GC CPU limiter, the
limiter's accounting is ultimately sloppy.
This sloppiness complicates accounting for idle time more completely,
and makes reasoning about the transient behavior of the GC CPU limiter
much more difficult.
To remedy this, this CL adds a field to the P struct that tracks the
start time of any in-flight event the limiter might care about, along
with information about the nature of that event. This timestamp is
managed atomically so that the GC CPU limiter can come in and perform a
read of the partial CPU time consumed by a given event. The limiter also
updates the timestamp so that only what's left over is flushed by the
event itself when it completes.
The end result of this change is that, since the GC CPU limiter is aware
of all past completed events, and all in-flight events, it can much more
accurately collect the CPU time of events since the last update. There's
still the possibility for skew, but any leftover time will be captured
in the following update, and the magnitude of this leftover time is
effectively bounded by the update period of the GC CPU limiter, which is
much easier to consider.
One caveat of managing this timestamp-type combo atomically is that they
need to be packed in 64 bits. So, this CL gives up the top 3 bits of the
timestamp and places the type information there. What this means is we
effectively have only a 61-bit resolution timestamp. This is fine when
the top 3 bits are the same between calls to nanotime, but becomes a
problem on boundaries when those 3 bits change. These cases may cause
hiccups in the GC CPU limiter by not accounting for some source of CPU
time correctly, but with 61 bits of resolution this should be extremely
rare. The rate of update is on the order of milliseconds, so at worst
the runtime will be off of any given measurement by only a few
CPU-milliseconds (and this is directly bounded by the rate of update).
We're probably more inaccurate from the fact that we don't measure real
CPU time but only approximate it.
For #52890.
Change-Id: I347f30ac9e2ba6061806c21dfe0193ef2ab3bbe9
Reviewed-on: https://go-review.googlesource.com/c/go/+/410120
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This change forces mark and scavenge assists to be cancelled early if
the limiter is enabled. This avoids goroutines getting stuck in really
long assists if the limiter happens to be disabled when they first come
into the assist. This can get especially bad for mark assists, which, in
dire situations, can end up "owing" the GC a really significant debt.
For #52890.
Change-Id: I4bfaa76b8de3e167d49d2ffd8bc2127b87ea566a
Reviewed-on: https://go-review.googlesource.com/c/go/+/408816
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
At the expense of performance (having to update another atomic counter)
this change makes CPU limiter assist time much less error-prone to
manage. There are currently a number of issues with respect to how
scavenge assist time is treated, and this change resolves those by just
having the limiter maintain its own internal pool that's drained on each
update.
While we're here, clear the measured assist time each cycle, which was
the impetus for the change.
Change-Id: I84c513a9f012b4007362a33cddb742c5779782b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/404304
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Measure the average stack size used by goroutines at every GC. When
starting a new goroutine, allocate an initial goroutine stack of that
average size. Intuition is that we'll waste at most 2x in stack space
because only half the goroutines can be below average. In turn, we
avoid some of the early stack growth / copying needed in the average
case.
More details in the design doc at: https://docs.google.com/document/d/1YDlGIdVTPnmUiTAavlZxBI1d9pwGQgZT7IKFKlIXohQ/edit?usp=sharing
name old time/op new time/op delta
Issue18138 95.3µs ± 0% 67.3µs ±13% -29.35% (p=0.000 n=9+10)
Fixes#18138
Change-Id: Iba34d22ed04279da7e718bbd569bbf2734922eaa
Reviewed-on: https://go-review.googlesource.com/c/go/+/345889
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
I landed the bottom CL of my stack without rebasing or retrying trybots,
but in the rebase "escape" was removed in favor of "Escape."
Change-Id: Icdc4d8de8b6ebc782215f2836cd191377cc211df
Reviewed-on: https://go-review.googlesource.com/c/go/+/403755
Reviewed-by: Bryan Mills <bcmills@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Currently the runtime's scavenging algorithm involves running from the
top of the heap address space to the bottom (or as far as it gets) once
per GC cycle. Once it treads some ground, it doesn't tread it again
until the next GC cycle.
This works just fine for the background scavenger, for heap-growth
scavenging, and for debug.FreeOSMemory. However, it breaks down in the
face of a memory limit for small heaps in the tens of MiB. Basically,
because the scavenger never retreads old ground, it's completely
oblivious to new memory it could scavenge, and that it really *should*
in the face of a memory limit.
Also, every time some thread goes to scavenge in the runtime, it
reserves what could be a considerable amount of address space, hiding it
from other scavengers.
This change modifies and simplifies the implementation overall. It's
less code with complexities that are much better encapsulated. The
current implementation iterates optimistically over the address space
looking for memory to scavenge, keeping track of what it last saw. The
new implementation does the same, but instead of directly iterating over
pages, it iterates over chunks. It maintains an index of chunks (as a
bitmap over the address space) that indicate which chunks may contain
scavenge work. The page allocator populates this index, while scavengers
consume it and iterate over it optimistically.
This has a two key benefits:
1. Scavenging is much simpler: find a candidate chunk, and check it,
essentially just using the scavengeOne fast path. There's no need for
the complexity of iterating beyond one chunk, because the index is
lock-free and already maintains that information.
2. If pages are freed to the page allocator (always guaranteed to be
unscavenged), the page allocator immediately notifies all scavengers
of the new source of work, avoiding the hiding issues of the old
implementation.
One downside of the new implementation, however, is that it's
potentially more expensive to find pages to scavenge. In the past, if
a single page would become free high up in the address space, the
runtime's scavengers would ignore it. Now that scavengers won't, one or
more scavengers may need to iterate potentially across the whole heap to
find the next source of work. For the background scavenger, this just
means a potentially less reactive scavenger -- overall it should still
use the same amount of CPU. It means worse overheads for memory limit
scavenging, but that's not exactly something with a baseline yet.
In practice, this shouldn't be too bad, hopefully since the chunk index
is extremely compact. For a 48-bit address space, the index is only 8
MiB in size at worst, but even just one physical page in the index is
able to support up to 128 GiB heaps, provided they aren't terribly
sparse. On 32-bit platforms, the index is only 128 bytes in size.
For #48409.
Change-Id: I72b7e74365046b18c64a6417224c5d85511194fb
Reviewed-on: https://go-review.googlesource.com/c/go/+/399474
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This change makes the memory limit functional by including it in the
heap goal calculation. Specifically, we derive a heap goal from the
memory limit, and compare that to the GOGC-based goal. If the goal based
on the memory limit is lower, we prefer that.
To derive the memory limit goal, the heap goal calculation now takes
a few additional parameters as input. As a result, the heap goal, in the
presence of a memory limit, may change dynamically. The consequences of
this are that different parts of the runtime can have different views of
the heap goal; this is OK. What's important is that all of the runtime
is able to observe the correct heap goal for the moment it's doing
something that affects it, like anything that should trigger a GC cycle.
On the topic of triggering a GC cycle, this change also allows any
manually managed memory allocation from the page heap to trigger a GC.
So, specifically workbufs, unrolled GC scan programs, and goroutine
stacks. The reason for this is that now non-heap memory can effect the
trigger or the heap goal.
Most sources of non-heap memory only change slowly, like GC pointer
bitmaps, or change in response to explicit function calls like
GOMAXPROCS. Note also that unrolled GC scan programs and workbufs are
really only relevant during a GC cycle anyway, so they won't actually
ever trigger a GC. Our primary target here is goroutine stacks.
Goroutine stacks can increase quickly, and this is currently totally
independent of the GC cycle. Thus, if for example a goroutine begins to
recurse suddenly and deeply, then even though the heap goal and trigger
react, we might not notice until its too late. As a result, we need to
trigger a GC cycle.
We do this trigger in allocManual instead of in stackalloc because it's
far more general. We ultimately care about memory that's mapped
read/write and not returned to the OS, which is much more the domain of
the page heap than the stack allocator. Furthermore, there may be new
sources of memory manual allocation in the future (e.g. arenas) that
need to trigger a GC if necessary. As such, I'm inclined to leave the
trigger in allocManual as an extra defensive measure.
It's worth noting that because goroutine stacks do not behave quite as
predictably as other non-heap memory, there is the potential for the
heap goal to swing wildly. Fortunately, goroutine stacks that haven't
been set up to shrink by the last GC cycle will not shrink until after
the next one. This reduces the amount of possible churn in the heap goal
because it means that shrinkage only happens once per goroutine, per GC
cycle. After all the goroutines that should shrink did, then goroutine
stacks will only grow. The shrink mechanism is analagous to sweeping,
which is incremental and thus tends toward a steady amount of heap
memory used. As a result, in practice, I expect this to be a non-issue.
Note that if the memory limit is not set, this change should be a no-op.
For #48409.
Change-Id: Ie06d10175e5e36f9fb6450e26ed8acd3d30c681c
Reviewed-on: https://go-review.googlesource.com/c/go/+/394221
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
As it stands, the heap goal and the trigger are set once by
gcController.commit, and then read out of gcController. However with the
coming memory limit we need the GC to be able to respond to changes in
non-heap memory. The simplest way of achieving this is to compute the
heap goal and its associated trigger dynamically.
In order to make this easier to implement, the GC trigger is now based
on the heap goal, as opposed to the status quo of computing both
simultaneously. In many cases we just want the heap goal anyway, not
both, but we definitely need the goal to compute the trigger, because
the trigger's bounds are entirely based on the goal (the initial runway
is not). A consequence of this is that we can't rely on the trigger to
enforce a minimum heap size anymore, and we need to lift that up
directly to the goal. Specifically, we need to lift up any part of the
calculation that *could* put the trigger ahead of the goal. Luckily this
is just the heap minimum and minimum sweep distance. In the first case,
the pacer may behave slightly differently, as the heap minimum is no
longer the minimum trigger, but the actual minimum heap goal. In the
second case it should be the same, as we ensure the additional runway
for sweeping is added to both the goal *and* the trigger, as before, by
computing that in gcControllerState.commit.
There's also another place we update the heap goal: if a GC starts and
we triggered beyond the goal, we always ensure there's some runway.
That calculation uses the current trigger, which violates the rule of
keeping the goal based on the trigger. Notice, however, that using the
precomputed trigger for this isn't even quite correct: due to a bug, or
something else, we might trigger a GC beyond the precomputed trigger.
So this change also adds a "triggered" field to gcControllerState that
tracks the point at which a GC actually triggered. This is independent
of the precomputed trigger, so it's fine for the heap goal calculation
to rely on it. It also turns out, there's more than just that one place
where we really should be using the actual trigger point, so this change
fixes those up too.
Also, because the heap minimum is set by the goal and not the trigger,
the maximum trigger calculation now happens *after* the goal is set, so
the maximum trigger actually does what I originally intended (and what
the comment says): at small heaps, the pacer picks 95% of the runway as
the maximum trigger. Currently, the pacer picks a small trigger based
on a not-yet-rounded-up heap goal, so the trigger gets rounded up to the
goal, and as per the "ensure there's some runway" check, the runway ends
up at always being 64 KiB. That check is supposed to be for exceptional
circumstances, not the status quo. There's a test introduced in the last
CL that needs to be updated to accomodate this slight change in
behavior.
So, this all sounds like a lot that changed, but what we're talking about
here are really, really tight corner cases that arise from situations
outside of our control, like pathologically bad behavior on the part of
an OS or CPU. Even in these corner cases, it's very unlikely that users
will notice any difference at all. What's more important, I think, is
that the pacer behaves more closely to what all the comments describe,
and what the original intent was.
Another note: at first, one might think that computing the heap goal and
trigger dynamically introduces some raciness, but not in this CL: the heap
goal and trigger are completely static.
Allocation outside of a GC cycle may now be a bit slower than before, as
the GC trigger check is now significantly more complex. However, note
that this executes basically just as often as gcController.revise, and
that makes up for a vanishingly small part of any CPU profile. The next
CL cleans up the floating point multiplications on this path
nonetheless, just to be safe.
For #48409.
Change-Id: I280f5ad607a86756d33fb8449ad08555cbee93f9
Reviewed-on: https://go-review.googlesource.com/c/go/+/397014
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Currently the maximum trigger calculation is totally incorrect with
respect to the comment above it and its intent. This change rectifies
this mistake.
For #48409.
Change-Id: Ifef647040a8bdd304dd327695f5f315796a61a74
Reviewed-on: https://go-review.googlesource.com/c/go/+/398834
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Fundamentally, all of these memstats exist to serve the runtime in
managing memory. For the sake of simpler testing, couple these stats
more tightly with the GC.
This CL was mostly done automatically. The fields had to be moved
manually, but the references to the fields were updated via
gofmt -w -r 'memstats.<field> -> gcController.<field>' *.go
For #48409.
Change-Id: Ic036e875c98138d9a11e1c35f8c61b784c376134
Reviewed-on: https://go-review.googlesource.com/c/go/+/397678
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This change adds a field to memstats called mappedReady that tracks how
much memory is in the Ready state at any given time. In essence, it's
the total memory usage by the Go runtime (with one exception which is
documented). Essentially, all memory mapped read/write that has either
been paged in or will soon.
To make tracking this not involve the many different stats that track
mapped memory, we track this statistic at a very low level. The downside
of tracking this statistic at such a low level is that it managed to
catch lots of situations where the runtime wasn't fully accounting for
memory. This change rectifies these situations by always accounting for
memory that's mapped in some way (i.e. always passing a sysMemStat to a
mem.go function), with *two* exceptions.
Rectifying these situations means also having the memory mapped during
testing being accounted for, so that tests (i.e. ReadMemStats) that
ultimately check mappedReady continue to work correctly without special
exceptions. We choose to simply account for this memory in other_sys.
Let's talk about the exceptions. The first is the arenas array for
finding heap arena metadata from an address is mapped as read/write in
one large chunk. It's tens of MiB in size. On systems with demand
paging, we assume that the whole thing isn't paged in at once (after
all, it maps to the whole address space, and it's exceedingly difficult
with today's technology to even broach having as much physical memory as
the total address space). On systems where we have to commit memory
manually, we use a two-level structure.
Now, the reason why this is an exception is because we have no mechanism
to track what memory is paged in, and we can't just account for the
entire thing, because that would *look* like an enormous overhead.
Furthermore, this structure is on a few really, really critical paths in
the runtime, so doing more explicit tracking isn't really an option. So,
we explicitly don't and call sysAllocOS to map this memory.
The second exception is that we call sysFree with no accounting to clean
up address space reservations, or otherwise to throw out mappings we
don't care about. In this case, also drop down to a lower level and call
sysFreeOS to explicitly avoid accounting.
The third exception is debuglog allocations. That is purely a debugging
facility and ideally we want it to have as small an impact on the
runtime as possible. If we include it in mappedReady calculations, it
could cause GC pacing shifts in future CLs, especailly if one increases
the debuglog buffer sizes as a one-off.
As of this CL, these are the only three places in the runtime that would
pass nil for a stat to any of the functions in mem.go. As a result, this
CL makes sysMemStats mandatory to facilitate better accounting in the
future. It's now much easier to grep and find out where accounting is
explicitly elided, because one doesn't have to follow the trail of
sysMemStat nil pointer values, and can just look at the function name.
For #48409.
Change-Id: I274eb467fc2603881717482214fddc47c9eaf218
Reviewed-on: https://go-review.googlesource.com/c/go/+/393402
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Nothing much to see here, just some plumbing to make latter CLs smaller
and clearer.
For #48409.
Change-Id: Ide23812d5553e0b6eea5616c277d1a760afb4ed0
Reviewed-on: https://go-review.googlesource.com/c/go/+/393401
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change adds a parser for the GOMEMLIMIT environment variable's
input. This environment variable accepts a number followed by an
optional prefix expressing the unit. Acceptable units include
B, KiB, MiB, GiB, TiB, where *iB is a power-of-two byte unit.
For #48409.
Change-Id: I6a3b4c02b175bfcf9c4debee6118cf5dda93bb6f
Reviewed-on: https://go-review.googlesource.com/c/go/+/393400
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
This change adds a GC CPU utilization limiter to the GC. It disables
assists to ensure GC CPU utilization remains under 50%. It uses a leaky
bucket mechanism that will only fill if GC CPU utilization exceeds 50%.
Once the bucket begins to overflow, GC assists are limited until the
bucket empties, at the risk of GC overshoot. The limiter is primarily
updated by assists. The scheduler may also update it, but only if the
GC is on and a few milliseconds have passed since the last update. This
second case exists to ensure that if the limiter is on, and no assists
are happening, we're still updating the limiter regularly.
The purpose of this limiter is to mitigate GC death spirals, opting to
use more memory instead.
This change turns the limiter on always. In practice, 50% overall GC CPU
utilization is very difficult to hit unless you're trying; even the most
allocation-heavy applications with complex heaps still need to do
something with that memory. Note that small GOGC values (i.e.
single-digit, or low teens) are more likely to trigger the limiter,
which means the GOGC tradeoff may no longer be respected. Even so, it
should still be relatively rare.
This change also introduces the feature flag for code to support the
memory limit feature.
For #48409.
Change-Id: Ia30f914e683e491a00900fd27868446c65e5d3c2
Reviewed-on: https://go-review.googlesource.com/c/go/+/353989
Reviewed-by: Michael Pratt <mpratt@google.com>
There are several tests in the runtime that need to force various
things to escape to the heap. This CL centralizes this functionality
into runtime.Escape, defined in export_test.
Change-Id: I2de2519661603ad46c372877a9c93efef8e7a857
Reviewed-on: https://go-review.googlesource.com/c/go/+/402178
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Austin Clements <austin@google.com>
Auto-Submit: Austin Clements <austin@google.com>
This change refactors the scavenger into a type whose methods represent
the actual function and scheduling of the scavenger. It also stubs out
access to global state in order to make it testable.
This change thus also adds a test for the scavenger. In writing this
test, I discovered the lack of a behavior I expected: if the
pageAlloc.scavenge returns < the bytes requested scavenged, that means
the heap is exhausted. This has been true this whole time, but was not
documented or explicitly relied upon. This change rectifies that. In
theory this means the scavenger could spin in run() indefinitely (as
happened in the test) if shouldStop never told it to stop. In practice,
shouldStop fires long before the heap is exhausted, but for future
changes it may be important. At the very least it's good to be
intentional about these things.
While we're here, I also moved the call to stopTimer out of wake and
into sleep. There's no reason to add more operations to a context that's
already precarious (running without a P on sysmon).
Change-Id: Ib31b86379fd9df84f25ae282734437afc540da5c
Reviewed-on: https://go-review.googlesource.com/c/go/+/384734
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
This change reduces the maximum number of idle mark workers during
periodic (currently every 2 minutes) GC cycles to 1.
Idle mark workers soak up all available and unused Ps, up to GOMAXPROCS.
While this provides some throughput and latency benefit in general, it
can cause what appear to be massive CPU utilization spikes in otherwise
idle applications. This is mostly an issue for *very* idle applications,
ones idle enough to trigger periodic GC cycles. This spike also tends to
interact poorly with auto-scaling systems, as the system might assume
the load average is very low and suddenly see a massive burst in
activity.
The result of this change is not to bring down this 100% (of GOMAXPROCS)
CPU utilization spike to 0%, but rather
min(25% + 1/GOMAXPROCS*100%, 100%)
Idle mark workers also do incur a small latency penalty as they must be
descheduled for other work that might pop up. Luckily the runtime is
pretty good about getting idle mark workers off of Ps, so in general
the latency benefit from shorter GC cycles outweighs this cost. But, the
cost is still non-zero and may be more significant in idle applications
that aren't invoking assists and write barriers quite as often.
We can't completely eliminate idle mark workers because they're
currently necessary for GC progress in some circumstances. Namely,
they're critical for progress when all we have is fractional workers. If
a fractional worker meets its quota, and all user goroutines are blocked
directly or indirectly on a GC cycle (via runtime.GOMAXPROCS, or
runtime.GC), the program may deadlock without GC workers, since the
fractional worker will go to sleep with nothing to wake it.
Fixes#37116.
For #44163.
Change-Id: Ib74793bb6b88d1765c52d445831310b0d11ef423
Reviewed-on: https://go-review.googlesource.com/c/go/+/393394
Reviewed-by: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
There are a few tests of the scheduler run queue API that allocate a
local []g and test using those G's. However, the run queue API
frequently converts between *g and guintptr, which is safe for "real"
Gs because they're heap-allocated and hence don't move, but if these
tests get a stack movement while holding one of these local *g's as a
guintptr, it won't get updated and the test will fail.
Updates #48297.
Change-Id: Ifd424147ce1a1b53732ff0cf55a81df1a9beeb3b
Reviewed-on: https://go-review.googlesource.com/c/go/+/402157
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Support fastrand64 in the runtime, although fastrand uses wyrand to generate 64-bit random number, it still returns uint32. In some cases, we need to generate a 64-bit random number, the new API would be faster and easier to use, and at least we can use the new function in these places:
src/net/dnsclient.go:randInt()
src/hash/maphash/maphash.go:MakeSeed()
src/runtime/map.go:mapiterinit()
name time/op
Fastrand-16 0.09ns ± 5%
Fastrand64-16 0.09ns ± 6%
Change-Id: Ibb97378c7ca59bc7dc15535d4872fa58ea112e6a
Reviewed-on: https://go-review.googlesource.com/c/go/+/400734
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@google.com>
A future change to gofmt will rewrite
// Doc comment.
//go:foo
to
// Doc comment.
//
//go:foo
Apply that change preemptively to all comments (not necessarily just doc comments).
For #51082.
Change-Id: Iffe0285418d1e79d34526af3520b415a12203ca9
Reviewed-on: https://go-review.googlesource.com/c/go/+/384260
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Now that Go 1.18 has been released, remove the old pacer.
Change-Id: Ie7a7596d67f3fc25d3f375a08fc75eafac2eb834
Reviewed-on: https://go-review.googlesource.com/c/go/+/393396
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
If something goes horribly wrong with the assumptions surrounding a
piController, its internal error state might accumulate in an unbounded
manner. In practice this means unexpected Inf and NaN values.
Avoid this by identifying cases where the error overflows and resetting
controller state.
In the scavenger, this case is much more likely. All that has to happen
is the proportional relationship between sleep time and estimated CPU
usage has to break down. Unfortunately because we're just measuring
monotonic time for all this, there are lots of ways it could happen,
especially in an oversubscribed system. In these cases, just fall back
on a conservative pace for scavenging and try to wait out the issue.
In the pacer I'm pretty sure this is impossible. Because we wire the
output of the controller to the input, the response is very directly
correlated, so it's impossible for the controller's core assumption to
break down.
While we're in the pacer, add more detail about why that controller is
even there, as well as its purpose.
Finally, let's be proactive about other sources of overflow, namely
overflow from a very large input value. This change adds a check after
the first few operations to detect overflow issues from the input,
specifically the multiplication.
No tests for the pacer because I was unable to actually break the
pacer's controller under a fuzzer, and no tests for the scavenger because
it is not really in a testable state.
However:
* This change includes a fuzz test for the piController.
* I broke out the scavenger code locally and fuzz tested it, confirming
that the patch eliminates the original failure mode.
* I tested that on a local heap-spike test, the scavenger continues
operating as expected under normal conditions.
Fixes#51061.
Change-Id: I02a01d2dbf0eb9d2a8a8e7274d4165c2b6a3415a
Reviewed-on: https://go-review.googlesource.com/c/go/+/383954
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
There was an off-by-one error in the time histogram buckets calculation
that caused the linear sub-buckets distances to be off by 2x.
The fix was trivial, but in writing tests I realized there was a much
simpler way to express the calculation for the histogram buckets, and
took the opportunity to do that here. The new bucket calculation also
fixes the bug.
Fixes#50732.
Change-Id: Idae89986de1c415ee4e148f778e0e101ca003ade
Reviewed-on: https://go-review.googlesource.com/c/go/+/380094
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Emmanuel Odeke <emmanuel@orijtech.com>
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TestFutexsleep was originally created in CL 7876043 as a
regression test for buggy division logic in futexsleep. Several months
later CL 11575044 moved this logic to timediv (called by futexsleep).
This test calls runtime.Futexsleep, which temporarily disables
asynchronous preemption. Unfortunately, TestFutexSleep calls this from
multiple goroutines, creating a race condition that may result in
asynchronous preemption remaining disabled for the remainder of the
process lifetime.
We could fix this by moving the async preemption disable to the main
test function, however this test has had a history of flakiness. As an
alternative, this CL replaces the test wholesale with a new test for
timediv, covering the overflow logic without the difficulty of dealing
with futex.
Fixes#50749.
Change-Id: If9e1dac63ef1535adb49f9a9ffcaff99b9135895
Reviewed-on: https://go-review.googlesource.com/c/go/+/380058
Trust: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
And then revert the bootstrap cmd directories and certain testdata.
And adjust tests as needed.
Not reverting the changes in std that are bootstrapped,
because some of those changes would appear in API docs,
and we want to use any consistently.
Instead, rewrite 'any' to 'interface{}' in cmd/dist for those directories
when preparing the bootstrap copy.
A few files changed as a result of running gofmt -w
not because of interface{} -> any but because they
hadn't been updated for the new //go:build lines.
Fixes#49884.
Change-Id: Ie8045cba995f65bd79c694ec77a1b3d1fe01bb09
Reviewed-on: https://go-review.googlesource.com/c/go/+/368254
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Robert Griesemer <gri@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
In iOS <14, the address space is strictly limited to 8 GiB, or 33 bits.
As a result, the page allocator also assumes all heap memory lives in
this region. This is especially necessary because the page allocator has
a PROT_NONE mapping proportional to the size of the usable address
space, so this keeps that mapping very small.
However starting with iOS 14, this restriction is relaxed, and mmap may
start returning addresses outside of the <14 range. Today this means
that in iOS 14 and later, users experience an error in the page
allocator when a heap arena is mapped outside of the old range.
This change increases the ios/arm64 heapAddrBits to 40 while
simultaneously making ios/arm64 use the 64-bit pagealloc implementation
(with reservations and incremental mapping) to accommodate both iOS
versions <14 and 14+.
Once iOS <14 is deprecated, we can remove these exceptions and treat
ios/arm64 like any other arm64 platform.
This change also makes the BaseChunkIdx expression a little bit easier
to read, while we're here.
Fixes#46860.
Change-Id: I13865f799777739109585f14f1cc49d6d57e096b
Reviewed-on: https://go-review.googlesource.com/c/go/+/344401
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TestTinyAllocIssue37262 assumes that all of its allocations will come
from the same tiny allocator (that is, the same P), and that nothing
else will allocate from that tiny allocator while it's running. It can
fail incorrectly if these assumptions aren't met.
Fix this potential test flakiness by disabling preemption during this
test.
As far as I know, this has never happened on the builders. It was
found by mayMoreStackPreempt.
Change-Id: I59f993e0bdbf46a9add842d0e278415422c3f804
Reviewed-on: https://go-review.googlesource.com/c/go/+/366994
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
This change modifies the scavenger to no longer hold the heap lock while
actively scavenging pages. To achieve this, the change also:
* Reverses the locking behavior of the (*pageAlloc).scavenge API, to
only acquire the heap lock when necessary.
* Introduces a new lock on the scavenger-related fields in a pageAlloc
so that access to those fields doesn't require the heap lock. There
are a few places in the scavenge path, notably reservation, that
requires synchronization. The heap lock is far too heavy handed for
this case.
* Changes the scavenger to marks pages that are actively being scavenged
as allocated, and "frees" them back to the page allocator the usual
way.
* Lifts the heap-growth scavenging code out of mheap.grow, where the
heap lock is held, and into allocSpan, just after the lock is
released. Releasing the lock during mheap.grow is not feasible if we
want to ensure that allocation always makes progress (post-growth,
another allocator could come in and take all that space, forcing the
goroutine that just grew the heap to do so again).
This change means that the scavenger now must do more work for each
scavenge, but it is also now much more scalable. Although in theory it's
not great by always taking the locked paths in the page allocator, it
takes advantage of some properties of the allocator:
* Most of the time, the scavenger will be working with one page at a
time. The page allocator's locked path is optimized for this case.
* On the allocation path, it doesn't need to do the find operation at
all; it can go straight to setting bits for the range and updating the
summary structure.
Change-Id: Ie941d5e7c05dcc96476795c63fef74bcafc2a0f1
Reviewed-on: https://go-review.googlesource.com/c/go/+/353974
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
This change implements the GC pacer redesign outlined in #44167 and the
accompanying design document, behind a GOEXPERIMENT flag that is on by
default.
In addition to adding the new pacer, this CL also includes code to track
and account for stack and globals scan work in the pacer and in the
assist credit system.
The new pacer also deviates slightly from the document in that it
increases the bound on the minimum trigger ratio from 0.6 (scaled by
GOGC) to 0.7. The logic behind this change is that the new pacer much
more consistently hits the goal (good!) leading to slightly less
frequent GC cycles, but _longer_ ones (in this case, bad!). It turns out
that the cost of having the GC on hurts throughput significantly (per
byte of memory used), though tail latencies can improve by up to 10%! To
be conservative, this change moves the value to 0.7 where there is a
small improvement to both throughput and latency, given the memory use.
Because the new pacer accounts for the two most significant sources of
scan work after heap objects, it is now also safer to reduce the minimum
heap size without leading to very poor amortization. This change thus
decreases the minimum heap size to 512 KiB, which corresponds to the
fact that the runtime has around 200 KiB of scannable globals always
there, up-front, providing a baseline.
Benchmark results: https://perf.golang.org/search?q=upload:20211001.6
tile38's KNearest benchmark shows a memory increase, but throughput (and
latency) per byte of memory used is better.
gopher-lua showed an increase in both CPU time and memory usage, but
subsequent attempts to reproduce this behavior are inconsistent.
Sometimes the overall performance is better, sometimes it's worse. This
suggests that the benchmark is fairly noisy in a way not captured by the
benchmarking framework itself.
biogo-igor is the only benchmark to show a significant performance loss.
This benchmark exhibits a very high GC rate, with relatively little work
to do in each cycle. The idle mark workers are quite active. In the new
pacer, mark phases are longer, mark assists are fewer, and some of that
time in mark assists has shifted to idle workers. Linux perf indicates
that the difference in CPU time can be mostly attributed to write-barrier
slow path related calls, which in turn indicates that the write barrier
being on for longer is the primary culprit. This also explains the memory
increase, as a longer mark phase leads to more memory allocated black,
surviving an extra cycle and contributing to the heap goal.
For #44167.
Change-Id: I8ac7cfef7d593e4a642c9b2be43fb3591a8ec9c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/309869
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>