Commit graph

237 commits

Austin Clements
4e1bf8ed38 runtime: add GC testing helpers for regabi signature fuzzer
This CL adds a set of helper functions for testing GC interactions.
These are intended for use in the regabi signature fuzzer, but are
generally useful for GC tests, so we make them generally available to
runtime tests.

These provide:

1. An easy way to force stack movement, for testing stack copying.

2. A simple and robust way to check the reachability of a set of
pointers.

3. A way to check what general category of memory a pointer points to,
mostly so tests can make sure they're testing what they mean to.

For #40724, but generally useful.
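
For illustration only (this is not one of the helpers added here), a common
way a test can force stack movement is to recurse deeply enough that the
goroutine's stack must grow, which copies it to a new allocation:

    // useStack burns roughly n KiB of stack, forcing growth (and therefore a
    // stack copy) if the current stack is too small. Illustrative only.
    func useStack(n int) byte {
        var buf [1024]byte
        if n <= 1 {
            return buf[0]
        }
        return useStack(n-1) + buf[0]
    }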

Change-Id: I15d33ccb3f5a792c0472a19c2cc9a8b4a9356a66
Reviewed-on: https://go-review.googlesource.com/c/go/+/305330
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-03-29 21:50:16 +00:00
Austin Clements
1ef114d12c runtime: abstract specials list iteration
The specials processing loop in mspan.sweep is about to get more
complicated and I'm too allergic to list manipulation to open code
more of it there.

Change-Id: I767a0889739da85fb2878fc06a5c55b73bf2ba7d
Reviewed-on: https://go-review.googlesource.com/c/go/+/305551
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2021-03-29 21:50:14 +00:00
Michael Anthony Knyszek
bd6aeca968 runtime: prepare arenas for use incrementally
This change moves the call of sysMap from (*mheap).sysAlloc into
(*mheap).grow, so we only sysMap what we're going to use in the near
future (thanks to the curArena mechanism). The purpose of this change is
to better support systems with strict overcommit rules which generally
accept reserved memory but not prepared memory (see malloc.go for exact
descriptions of these states).

This move requires changing linearAlloc to only optionally map memory.
In one case, with mheap.heapArenaAlloc, we do want it to map memory. But
now in the other case, with mheap.arena, we don't, because we want grow
to take care of it.

The risk with this change is we may make more syscalls than before on
systems with 64 MiB arenas, but because heap growth is relatively rare
this is unlikely to be a noticeable issue. We also bound the number of
syscalls made by only extending curArena (and thus mapping) by
pallocChunkPages*pageSize, which is 4 MiB.

Fixes #42612.

Change-Id: I736df696afe78ddb1a747a896caa0db8726027e5
Reviewed-on: https://go-review.googlesource.com/c/go/+/270537
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2021-03-15 20:20:51 +00:00
Matthew Dempsky
4662029264 runtime: simplify divmagic for span calculations
It's both simpler and faster to just unconditionally do two 32-bit
multiplies rather than a bunch of branching to try to avoid them.
This is safe thanks to the tight bounds derived in [1] and verified by
mksizeclasses.go.

Benchstat results below for compilebench benchmarks on my P920. See
also [2] for micro benchmarks comparing the new functions against the
originals (as well as several basic attempts at optimizing them).

name                      old time/op       new time/op       delta
Template                        295ms ± 3%        290ms ± 1%  -1.95%  (p=0.000 n=20+20)
Unicode                         113ms ± 3%        110ms ± 2%  -2.32%  (p=0.000 n=21+17)
GoTypes                         1.78s ± 1%        1.76s ± 1%  -1.23%  (p=0.000 n=21+20)
Compiler                        119ms ± 2%        117ms ± 4%  -1.53%  (p=0.007 n=20+20)
SSA                             14.3s ± 1%        13.8s ± 1%  -3.12%  (p=0.000 n=17+20)
Flate                           173ms ± 2%        170ms ± 1%  -1.64%  (p=0.000 n=20+19)
GoParser                        278ms ± 2%        273ms ± 2%  -1.92%  (p=0.000 n=20+19)
Reflect                         686ms ± 3%        671ms ± 3%  -2.18%  (p=0.000 n=19+20)
Tar                             255ms ± 2%        248ms ± 2%  -2.90%  (p=0.000 n=20+20)
XML                             335ms ± 3%        327ms ± 2%  -2.34%  (p=0.000 n=20+20)
LinkCompiler                    799ms ± 1%        799ms ± 1%    ~     (p=0.925 n=20+20)
ExternalLinkCompiler            1.90s ± 1%        1.90s ± 0%    ~     (p=0.327 n=20+20)
LinkWithoutDebugCompiler        385ms ± 1%        386ms ± 1%    ~     (p=0.251 n=18+20)
[Geo mean]                      512ms             504ms       -1.61%

name                      old user-time/op  new user-time/op  delta
Template                        286ms ± 4%        282ms ± 4%  -1.42%  (p=0.025 n=21+20)
Unicode                         104ms ± 9%        102ms ±14%    ~     (p=0.294 n=21+20)
GoTypes                         1.75s ± 3%        1.72s ± 2%  -1.36%  (p=0.000 n=21+20)
Compiler                        109ms ±11%        108ms ± 8%    ~     (p=0.187 n=21+19)
SSA                             14.0s ± 1%        13.5s ± 2%  -3.25%  (p=0.000 n=16+20)
Flate                           166ms ± 4%        164ms ± 4%  -1.34%  (p=0.032 n=19+19)
GoParser                        268ms ± 4%        263ms ± 4%  -1.71%  (p=0.011 n=18+20)
Reflect                         666ms ± 3%        654ms ± 4%  -1.77%  (p=0.002 n=18+20)
Tar                             245ms ± 5%        236ms ± 6%  -3.34%  (p=0.000 n=20+20)
XML                             320ms ± 4%        314ms ± 3%  -2.01%  (p=0.001 n=19+18)
LinkCompiler                    744ms ± 4%        747ms ± 3%    ~     (p=0.627 n=20+19)
ExternalLinkCompiler            1.71s ± 3%        1.72s ± 2%    ~     (p=0.149 n=20+20)
LinkWithoutDebugCompiler        345ms ± 6%        342ms ± 8%    ~     (p=0.355 n=20+20)
[Geo mean]                      484ms             477ms       -1.50%

[1] Daniel Lemire, Owen Kaser, Nathan Kurz. 2019. "Faster Remainder by
Direct Computation: Applications to Compilers and Software Libraries."
https://arxiv.org/abs/1902.01961

[2] https://github.com/mdempsky/benchdivmagic
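
As a rough sketch of the trick (not the runtime's actual code; names here
are illustrative): for a fixed divisor d, precompute m = ceil(2^32/d), and
then both the quotient and the remainder fall out of 32-bit multiplies:

    // divMagic returns ceil(2^32/d) for a divisor d > 0.
    func divMagic(d uint32) uint32 {
        return uint32((uint64(1)<<32 + uint64(d) - 1) / uint64(d))
    }

    // divRem computes x/d and x%d using the magic value m = divMagic(d).
    // Per [1], this is exact as long as x and d are small enough, which is
    // what mksizeclasses.go verifies for the size classes.
    func divRem(x, d, m uint32) (q, r uint32) {
        q = uint32((uint64(x) * uint64(m)) >> 32) // high 32 bits give the quotient
        lo := uint32(uint64(x) * uint64(m))       // low 32 bits encode the fraction
        r = uint32((uint64(lo) * uint64(d)) >> 32)
        return
    }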

Change-Id: Ie4d214e7a908b0d979c878f2d404bd56bdf374f6
Reviewed-on: https://go-review.googlesource.com/c/go/+/300994
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
Trust: Matthew Dempsky <mdempsky@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2021-03-12 18:02:59 +00:00
Joel Sing
5ca43acdb3 runtime: allow physical page aligned stacks to be allocated
Add a physPageAlignedStack boolean which, if set, results in over-allocation
by a physical page; the allocation is rounded up to physical page alignment
and the unused memory surrounding it is freed again.

OpenBSD/octeon has 16KB physical pages and requires stacks to be physical page
aligned in order for them to be remapped as MAP_STACK. This change allows Go
to work on this platform.

Based on a suggestion from mknyszek in issue #41008.

Updates #40995
Fixes #41008
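
The arithmetic involved is roughly the following (an illustrative sketch,
not the actual stackalloc changes): over-allocate by one physical page,
round the base up, and the head and tail slack around the aligned region is
what gets freed back:

    // trimToPhysPage reports the physical-page-aligned base inside an
    // over-allocated region raw of length size+physPageSize, plus the unused
    // head and tail byte counts that can be released again.
    // physPageSize must be a power of two.
    func trimToPhysPage(raw, size, physPageSize uintptr) (base, head, tail uintptr) {
        base = (raw + physPageSize - 1) &^ (physPageSize - 1) // round up
        head = base - raw                                     // slack before the stack
        tail = physPageSize - head                            // slack after the stack
        return
    }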

Change-Id: Ia5d652292b515916db473043b41f6030094461d8
Reviewed-on: https://go-review.googlesource.com/c/go/+/266919
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2020-11-04 06:14:02 +00:00
Michael Anthony Knyszek
39a5ee52b9 runtime: decouple consistent stats from mcache and allow P-less update
This change modifies the consistent stats implementation to keep the
per-P sequence counter on each P instead of each mcache. A valid mcache
is not available everywhere that we want to call e.g. allocSpan, as per
issue #42339. By decoupling these two, we can add a mechanism to allow
contexts without a P to update stats consistently.

In this CL, we achieve that with a mutex. In practice, it will be very
rare for an M to update these stats without a P. Furthermore, the stats
reader also only needs to hold the mutex across the update to "gen"
since once that changes, writers are free to continue updating the new
stats generation. Contention could thus only arise between writers
without a P, and as mentioned earlier, those should be rare.

A nice side-effect of this change is that the consistent stats acquire
and release API becomes simpler.

Fixes #42339.

Change-Id: Ied74ab256f69abd54b550394c8ad7c4c40a5fe34
Reviewed-on: https://go-review.googlesource.com/c/go/+/267158
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-11-02 21:21:46 +00:00
Michael Anthony Knyszek
ac766e3718 runtime: make getMCache inlineable
This change moves the responsibility of throwing if an mcache is not
available to the caller, because the inlining cost of throw is set very
high in the compiler. Even if it were reduced to the cost of a usual
function call, it would still be too expensive, so just move it out.

This choice also makes sense in the context of #42339 since we're going
to have to handle the case where we don't have an mcache to update stats
in a few contexts anyhow.

Also, add getMCache to the list of functions that should be inlined to
prevent future regressions.

getMCache is called on the allocation fast path, and because it's not
inlined it actually causes a significant regression (~10%) in some
microbenchmarks.

Fixes #42305.
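
The shape of the change, much simplified (the types here are stand-ins, not
the runtime's): the getter stays tiny enough to inline and returns nil
instead of throwing, so the expensive-to-inline throw lives at each call
site:

    type pStub struct{ cache *cacheStub } // stand-in for p
    type cacheStub struct{}               // stand-in for mcache

    // getCache is small enough to inline; callers check for nil and throw.
    func getCache(pp *pStub) *cacheStub {
        if pp == nil {
            return nil
        }
        return pp.cache
    }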

Change-Id: I64ac5e4f26b730bd4435ea1069a4a50f55411ced
Reviewed-on: https://go-review.googlesource.com/c/go/+/267157
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2020-11-02 21:10:41 +00:00
Michael Pratt
9393b5bae5 runtime: add heap lock assertions
Some functions that required holding the heap lock _or_ world stop have
been simplified to require only the heap lock. This is conceptually
simpler, and taking the heap lock during a world stop is guaranteed not to
contend. This was only done on functions already called on the
systemstack to avoid too many extra systemstack calls in GC.

Updates #40677

Change-Id: I15aa1dadcdd1a81aac3d2a9ecad6e7d0377befdc
Reviewed-on: https://go-review.googlesource.com/c/go/+/250262
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Trust: Michael Pratt <mpratt@google.com>
2020-10-30 20:21:14 +00:00
Michael Anthony Knyszek
f77a9025f1 runtime: replace some memstats with consistent stats
This change replaces stacks_inuse, gcWorkBufInUse and
gcProgPtrScalarBitsInUse with their corresponding consistent stats. It
also adds checks to make sure the rest of the sharded stats line up with
existing stats in updatememstats.

Change-Id: I17d0bd181aedb5c55e09c8dff18cef5b2a3a14e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/247038
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26 18:28:20 +00:00
Michael Anthony Knyszek
fe7ff71185 runtime: add consistent heap statistics
This change adds a global set of heap statistics which are similar
to existing memory statistics. The purpose of these new statistics
is to be able to read them and get a consistent result without stopping
the world. The goal is to eventually replace as many of the existing
memstats statistics with the sharded ones as possible.

The consistent memory statistics use a tailor-made synchronization
mechanism to allow writers (allocators) to proceed with minimal
synchronization by using a sequence counter and a global generation
counter to determine which set of statistics to update. Readers
increment the global generation counter to effectively grab a snapshot
of the statistics, and then iterate over all Ps using the sequence
counter to ensure that they may safely read the snapshotted statistics.
To keep statistics fresh, the reader also has a responsibility to merge
sets of statistics.

These consistent statistics are computed, but otherwise unused for now.
Upcoming changes will integrate them with the rest of the codebase and
will begin to phase out existing statistics.
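
A greatly simplified sketch of the write side (field names and sizes here
are made up; the real code keeps three generations of per-P deltas): a
writer bumps its sequence counter to odd, writes into whichever generation
it observes, and bumps back to even, so a reader can flip the generation
and then wait for every sequence counter to be even:

    package sketch

    import "sync/atomic"

    type statWriter struct {
        seq   uint32       // odd while a write is in progress
        stats [3][8]uint64 // hypothetical: three generations of eight counters
    }

    func (w *statWriter) add(gen *uint32, which int, v uint64) {
        atomic.AddUint32(&w.seq, 1)     // now odd: write in progress
        g := atomic.LoadUint32(gen) % 3 // pick the current generation
        w.stats[g][which] += v
        atomic.AddUint32(&w.seq, 1)     // even again: write complete
    }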

Change-Id: I637a11f2439e2049d7dccb8650c5d82500733ca5
Reviewed-on: https://go-review.googlesource.com/c/go/+/247037
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26 18:28:14 +00:00
Michael Anthony Knyszek
c5dea8f387 runtime: remove memstats.heap_idle
This statistic is updated in many places but for MemStats may be
computed from existing statistics. Specifically, by definition,
heap_idle = heap_sys - heap_inuse, since heap_sys is all memory allocated
from the OS for use in the heap minus memory used for non-heap purposes,
and heap_idle is almost the same (it explicitly includes memory that
*could* be used for non-heap purposes) except that it doesn't include
memory that's actually used to hold heap objects.

Although it has some utility as a sanity check, it complicates
accounting and we want fewer, orthogonal statistics for upcoming metrics
changes, so just drop it.

Change-Id: I40af54a38e335f43249f6e218f35088bfd4380d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/246974
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26 18:10:12 +00:00
Michael Anthony Knyszek
ad863ba32a runtime: break down memstats.gc_sys
This change breaks apart gc_sys into three distinct pieces. Two of those
pieces come from heap_sys since they're allocated from the page heap. The
rest comes from memory mapped from e.g.
persistentalloc which better fits the purpose of a sysMemStat. Also,
rename gc_sys to gcMiscSys.

Change-Id: I098789170052511e7b31edbcdc9a53e5c24573f7
Reviewed-on: https://go-review.googlesource.com/c/go/+/246973
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26 18:10:04 +00:00
Michael Anthony Knyszek
8ebc58452a runtime: delineate which memstats are system stats with a type
This change modifies the type of several mstats fields to be a new type:
sysMemStat. This type has the same structure as the fields used to have.

The purpose of this change is to make it very clear which stats may be
used in various functions for accounting (usually the platform-specific
sys* functions, but there are others). Currently there's an implicit
understanding that the *uint64 value passed to these functions is some
kind of statistic whose value is atomically managed. This understanding
isn't inherently problematic, but we're about to change how some stats
(which currently use mSysStatInc and mSysStatDec) work, so we want to
make it very clear what the various requirements are around "sysStat".

This change also removes mSysStatInc and mSysStatDec in favor of a
method on sysMemStat. Note that those two functions were originally
written the way they were because atomic 64-bit adds required a valid G
on ARM, but this hasn't been the case for a very long time (since
golang.org/cl/14204, but even before then it wasn't clear if mutexes
required a valid G anymore). Today we implement 64-bit adds on ARM with
a spinlock table.
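
In shape (simplified relative to the real type in mstats.go), sysMemStat is
just a uint64 whose only mutation path is an atomic add, which makes the
accounting requirement explicit in the type system:

    package sketch

    import "sync/atomic"

    // sysMemStat models the idea: a 64-bit OS-memory statistic that may only
    // be read and updated atomically.
    type sysMemStat uint64

    func (s *sysMemStat) load() uint64 {
        return atomic.LoadUint64((*uint64)(s))
    }

    func (s *sysMemStat) add(n int64) {
        atomic.AddUint64((*uint64)(s), uint64(n)) // negative n wraps, i.e. subtracts
    }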

Change-Id: I4e9b37cf14afc2ae20cf736e874eb0064af086d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/246971
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26 18:09:41 +00:00
Michael Anthony Knyszek
dc02578ac8 runtime: make the span allocation purpose more explicit
This change modifies mheap's span allocation API to have each caller
declare a purpose, defined as a new enum called spanAllocType.

The purpose behind this change is two-fold:
1. Tight control over who gets to allocate heap memory is, generally
   speaking, a good thing. Every codepath that allocates heap memory
   places additional implicit restrictions on the allocator. A notable
   example of a restriction is work bufs coming from heap memory: write
   barriers are not allowed in allocation paths because then we could
   have a situation where the allocator calls into the allocator.
2. Memory statistic updating is explicit. Instead of passing an opaque
   pointer for statistic updating, which places restrictions on how that
   statistic may be updated, we use the spanAllocType to determine which
   statistic to update and how.

We also take this opportunity to group all the statistic updating code
together, which should make the accounting code a little easier to
follow.
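
A sketch of the shape (the constant and field names here are illustrative,
not necessarily this CL's exact set): each caller passes a spanAllocType,
and one switch decides which statistic the span's bytes are charged to:

    type spanAllocType uint8

    const (
        spanAllocHeap spanAllocType = iota // heap objects
        spanAllocStack                     // goroutine stacks
        spanAllocWorkBuf                   // GC work buffers
        spanAllocPtrScalarBits             // GC pointer/scalar bitmaps
    )

    type sysStats struct{ heapInuse, stacksInuse, gcMiscInuse uint64 }

    // chargeSpan accounts an allocated span's bytes by purpose, in one place.
    func chargeSpan(stats *sysStats, typ spanAllocType, bytes uint64) {
        switch typ {
        case spanAllocHeap:
            stats.heapInuse += bytes
        case spanAllocStack:
            stats.stacksInuse += bytes
        default: // work bufs, pointer/scalar bits, etc.
            stats.gcMiscInuse += bytes
        }
    }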

Change-Id: Ic0b0898959ba2a776f67122f0e36c9d7d60e3085
Reviewed-on: https://go-review.googlesource.com/c/go/+/246970
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26 17:27:14 +00:00
Michael Anthony Knyszek
d677899e90 runtime: flush local_scan directly and more often
Now that local_scan is the last mcache-based statistic that is flushed
by purgecachedstats, and heap_scan and gcController.revise may be
interacted with concurrently, we don't need to flush heap_scan at
arbitrary locations where the heap is locked, and we don't need
purgecachedstats and cachestats anymore. Instead, we can flush
local_scan at the same time we update heap_live in refill, so the two
updates may share the same revise call.

Clean up unused functions, remove code that would cause the heap to get
locked in allocSpan when it didn't need to be (other than to flush
local_scan), and flush local_scan explicitly in a few important places.
Notably we need to flush local_scan whenever we flush the other stats,
but it doesn't need to be donated anywhere, so have releaseAll do the
flushing. Also, we need to flush local_scan before we set heap_scan at
the end of a GC, which was previously handled by cachestats. Just do so
explicitly -- it's not much code and it becomes a lot more clear why we
need to do so.

Change-Id: I35ac081784df7744d515479896a41d530653692d
Reviewed-on: https://go-review.googlesource.com/c/go/+/246968
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Trust: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26 17:26:40 +00:00
Michael Anthony Knyszek
cca3d1e553 runtime: don't flush local_tinyallocs
This change makes local_tinyallocs work like the rest of the malloc
stats and doesn't flush local_tinyallocs, instead making that the
source-of-truth.

Change-Id: I3e6cb5f1b3d086e432ce7d456895511a48e3617a
Reviewed-on: https://go-review.googlesource.com/c/go/+/246967
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26 17:26:30 +00:00
Michael Anthony Knyszek
e63716bc76 runtime: make nlargealloc and largealloc mcache fields
This change makes nlargealloc and largealloc into mcache fields just
like nlargefree and largefree. These local fields become the new
source-of-truth. This change also moves the accounting for these fields
out of allocSpan (which is an inappropriate place for it -- this
accounting generally happens much closer to the point of allocation) and
into largeAlloc. This move is partially possible now that we can call
gcController.revise at that point.

Furthermore, this change moves largeAlloc into mcache.go and makes it a
method of mcache. While there's a little bit of a mismatch here because
largeAlloc barely interacts with the mcache, it helps solidify the
mcache as the first allocation layer and provides a clear place to
aggregate and manage statistics.

Change-Id: I37b5e648710733bb4c04430b71e96700e438587a
Reviewed-on: https://go-review.googlesource.com/c/go/+/246965
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26 17:26:16 +00:00
Michael Anthony Knyszek
42019613df runtime: make distributed/local malloc stats the source-of-truth
This change makes it so that various local malloc stats (excluding
heap_scan and local_tinyallocs) are no longer written first to mheap
fields but are instead accessed directly from each mcache.

This change is part of a move toward having stats be distributed, and
cleaning up some old code related to the stats.

Note that because there's no central source-of-truth, when an mcache
dies, it must donate its stats to another mcache. It's always safe to
donate to the mcache for the 0th P, so do that.

Change-Id: I2556093dbc27357cb9621c9b97671f3c00aa1173
Reviewed-on: https://go-review.googlesource.com/c/go/+/246964
Trust: Michael Knyszek <mknyszek@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26 17:26:08 +00:00
Michael Anthony Knyszek
8cc280aa72 runtime: define and enforce synchronization on heap_scan
Currently heap_scan is mostly protected by the heap lock, but
gcControllerState.revise sometimes accesses it without a lock. In an
effort to make gcControllerState.revise callable from more contexts (and
have its synchronization guarantees actually respected), make heap_scan
atomically read from and written to, unless the world is stopped.

Note that we don't update gcControllerState.revise's erroneous doc
comment here because this change isn't about revise's guarantees, just
about heap_scan. The comment is updated in a later change.

Change-Id: Iddbbeb954767c704c2bd1d221f36e6c4fc9948a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/246960
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Emmanuel Odeke <emmanuel@orijtech.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-10-26 17:25:40 +00:00
lihaowei
7fbd8c75c6 all: fix spelling mistakes
Change-Id: I7d512281d8442d306594b57b5deaecd132b5ea9e
GitHub-Last-Rev: 251e1d6857
GitHub-Pull-Request: golang/go#40793
Reviewed-on: https://go-review.googlesource.com/c/go/+/248441
Reviewed-by: Dave Cheney <dave@cheney.net>
2020-08-18 03:28:52 +00:00
Michael Anthony Knyszek
e6d0bd2b89 runtime: clean up old mcentral code
This change deletes the old mcentral implementation from the code base
and the newMCentralImpl feature flag along with it.

Updates #37487.

Change-Id: Ibca8f722665f0865051f649ffe699cbdbfdcfcf2
Reviewed-on: https://go-review.googlesource.com/c/go/+/221184
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-08-17 20:06:49 +00:00
Michael Anthony Knyszek
260dff3ca3 runtime: clean up old markrootSpans
This change removes the old markrootSpans implementation and deletes the
feature flag.

Updates #37487.

Change-Id: Idb5a2559abcc3be5a7da6f2ccce1a86e1d7634e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/221183
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2020-08-17 20:06:41 +00:00
Austin Clements
d19fedd180 runtime: move checkmarks to a separate bitmap
Currently, the GC stores the object marks for checkmarks mode in the
heap bitmap using a rather complex encoding: for one word objects, the
checkmark is stored in the pointer/scalar bit since one word objects
must be pointers; for larger objects, the checkmark is stored in what
would be the scan/dead bit for the second word of the object. This
encoding made more sense when the runtime used the first scan/dead bit
as the regular mark bit, but we moved away from that long ago.

This encoding and overloading of the heap bitmap bits causes a great
deal of complexity in many parts of the allocator and garbage
collector and leads to some subtle bugs like #15903.

This CL moves the checkmarks mark bits into their own per-arena bitmap
and reclaims the second scan/dead bit as a regular scan/dead bit.

I tested this by enabling doubleCheck mode in heapBitsSetType and
running in both regular and GODEBUG=gccheckmark=1 mode.

Fixes #15903.

No performance degradation. (Very slight improvement on a few
benchmarks, but it's probably just noise.)

name                                old time/op            new time/op            delta
BiogoIgor                                      16.6s ± 1%             16.4s ± 1%  -0.94%  (p=0.000 n=25+24)
BiogoKrishna                                   19.2s ± 3%             19.2s ± 3%    ~     (p=0.638 n=23+25)
BleveIndexBatch100                             6.12s ± 5%             6.17s ± 4%    ~     (p=0.170 n=25+25)
CompileTemplate                                206ms ± 1%             205ms ± 1%  -0.43%  (p=0.005 n=24+24)
CompileUnicode                                82.2ms ± 2%            81.5ms ± 2%  -0.95%  (p=0.001 n=22+22)
CompileGoTypes                                 755ms ± 3%             754ms ± 4%    ~     (p=0.715 n=25+25)
CompileCompiler                                3.73s ± 1%             3.73s ± 1%    ~     (p=0.445 n=25+24)
CompileSSA                                     8.67s ± 1%             8.66s ± 1%    ~     (p=0.836 n=24+22)
CompileFlate                                   134ms ± 2%             133ms ± 1%  -0.66%  (p=0.001 n=24+23)
CompileGoParser                                164ms ± 1%             163ms ± 1%  -0.85%  (p=0.000 n=24+24)
CompileReflect                                 466ms ± 5%             466ms ± 3%    ~     (p=0.863 n=25+25)
CompileTar                                     182ms ± 1%             182ms ± 1%  -0.31%  (p=0.048 n=24+24)
CompileXML                                     249ms ± 1%             248ms ± 1%  -0.32%  (p=0.031 n=21+25)
CompileStdCmd                                  10.3s ± 1%             10.3s ± 1%    ~     (p=0.459 n=23+23)
FoglemanFauxGLRenderRotateBoat                 8.66s ± 1%             8.62s ± 1%  -0.47%  (p=0.000 n=23+24)
FoglemanPathTraceRenderGopherIter1             20.3s ± 3%             20.2s ± 2%    ~     (p=0.893 n=25+25)
GopherLuaKNucleotide                           29.7s ± 1%             29.8s ± 2%    ~     (p=0.421 n=24+25)
MarkdownRenderXHTML                            246ms ± 1%             247ms ± 1%    ~     (p=0.558 n=25+24)
Tile38WithinCircle100kmRequest                 779µs ± 4%             779µs ± 3%    ~     (p=0.954 n=25+25)
Tile38IntersectsCircle100kmRequest            1.02ms ± 3%            1.01ms ± 4%    ~     (p=0.658 n=25+25)
Tile38KNearestLimit100Request                  984µs ± 4%             986µs ± 4%    ~     (p=0.627 n=24+25)
[Geo mean]                                     552ms                  551ms       -0.19%

https://perf.golang.org/search?q=upload:20200723.6

Change-Id: Ic703f26a83fb034941dc6f4788fc997d56890dec
Reviewed-on: https://go-review.googlesource.com/c/go/+/244539
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
2020-08-17 14:31:20 +00:00
Cherry Zhang
c53b2bdb35 runtime: add a barrier after a new span is allocated
When copying a stack, we
1. allocate a new stack,
2. adjust pointers pointing to the old stack to pointing to the
   new stack.

If the GC is running on another thread concurrently, on a machine
with weak memory model, the GC could observe the adjusted pointer
(e.g. through gp._defer which could be a special heap-to-stack
pointer), but not observe the publish of the new stack span. In
this case, the GC will see the adjusted pointer pointing to an
unallocated span and throw. Fix this by adding a publication
barrier between allocating the span and adjusting pointers.

One testcase for this is TestDeferHeapAndStack in long mode. It
fails reliably on linux-mips64le-mengzhuo builder without the fix,
and passes reliably after the fix.

Fixes #35541.
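
The general pattern, outside the runtime, looks like the following (using a
plain atomic store as a stand-in for the runtime's publication barrier):
fully initialize the new object first, then publish the pointer with an
operation that orders the initialization before the publish:

    package sketch

    import "sync/atomic"

    type span struct{ ready bool }

    var current atomic.Pointer[span]

    // allocAndPublish initializes a new span completely and only then makes
    // it visible; a reader that loads the pointer is guaranteed to also
    // observe the initialization, even on weakly ordered machines.
    func allocAndPublish() *span {
        s := new(span)
        s.ready = true   // initialize before publishing
        current.Store(s) // publish; acts like a publication barrier here
        return s
    }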

Change-Id: I82b09b824fdf14be7336a9ee853f56dec1b13b90
Reviewed-on: https://go-review.googlesource.com/c/go/+/234478
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2020-05-21 14:31:36 +00:00
Michael Anthony Knyszek
796786cd0c runtime: make maxOffAddr reflect the actual address space upper bound
Currently maxOffAddr is defined in terms of the whole 64-bit address
space, assuming that it's all supported, by using ^uintptr(0) as the
maximal address in the offset space. In reality, the maximal address in
the offset space is (1<<heapAddrBits)-1 because we don't have more than
that actually available to us on a given platform.

On most platforms this is fine, because arenaBaseOffset is just
connecting two segments of address space, but on AIX we use it as an
actual offset for the starting address of the available address space,
which is limited. This means using ^uintptr(0) as the maximal address in
the offset address space causes wrap-around, especially when we just
want to represent a range approximately like [addr, infinity), which
today we do by using maxOffAddr.

To fix this, we define maxOffAddr more appropriately, in terms of
(1<<heapAddrBits)-1.

This change also redefines arenaBaseOffset to not be the negation of the
virtual address corresponding to address zero in the virtual address
space, but instead directly as the virtual address corresponding to
zero. This matches the existing documentation more closely and makes the
logic around arenaBaseOffset decidedly simpler, especially when trying
to reason about its use on AIX.

Fixes #38966.

Change-Id: I1336e5036a39de846f64cc2d253e8536dee57611
Reviewed-on: https://go-review.googlesource.com/c/go/+/233497
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2020-05-14 16:20:19 +00:00
Michael Anthony Knyszek
55ec5182d7 runtime: remove scavAddr in favor of address ranges
This change removes the concept of s.scavAddr in favor of explicitly
reserving and unreserving address ranges. s.scavAddr has several
problems with raciness that can cause the scavenger to miss updates, or
move it back unnecessarily, forcing future scavenge calls to iterate
over already-searched address space.

This change achieves this by replacing scavAddr with a second addrRanges
which is cloned from s.inUse at the end of each sweep phase. Ranges from
this second addrRanges are then reserved by scavengers (with the
reservation size proportional to the heap size) who are then able to
safely iterate over those ranges without worry of another scavenger
coming in.

Fixes #35788.

Change-Id: Ief01ae170384174875118742f6c26b2a41cbb66d
Reviewed-on: https://go-review.googlesource.com/c/go/+/208378
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2020-05-08 16:24:40 +00:00
Michael Anthony Knyszek
14ae846f54 runtime: avoid overflow in (*mheap).grow
Currently when checking if we can grow the heap into the current arena,
we do an addition which may overflow. This is particularly likely on
32-bit systems.

Avoid this situation by explicitly checking for overflow, and adding in
some comments about when overflow is possible, when it isn't, and why.

For #35954.
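
The check itself is the usual wrap-around idiom (a sketch, not the exact
code in (*mheap).grow):

    // fitsBelow reports whether [base, base+size) lies below limit, guarding
    // against base+size wrapping around on 32-bit systems.
    func fitsBelow(base, size, limit uintptr) bool {
        if base+size < base {
            return false // the addition overflowed
        }
        return base+size <= limit
    }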

Change-Id: I2d4ecbb1ccbd43da55979cc721f0cd8d1757add2
Reviewed-on: https://go-review.googlesource.com/c/go/+/231337
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-05-07 21:39:50 +00:00
Michael Anthony Knyszek
a13691966a runtime: add new mcentral implementation
Currently mcentral is implemented as a couple of linked lists of spans
protected by a lock. Unfortunately this design leads to significant lock
contention.

The span ownership model is also confusing and complicated. In-use spans
jump between being owned by multiple sources, generally some combination
of a gcSweepBuf, a concurrent sweeper, an mcentral or an mcache.

So first to address contention, this change replaces those linked lists
with gcSweepBufs which have an atomic fast path. Then, we change up the
ownership model: a span may be simultaneously owned only by an mcentral
and the page reclaimer. Otherwise, an mcentral (which now consists of
sweep bufs), a sweeper, or an mcache is the sole owner of a span at
any given time. This dramatically simplifies reasoning about span
ownership in the runtime.

As a result of this new ownership model, sweeping is now driven by
walking over the mcentrals rather than having its own global list of
spans. Because we no longer have a global list and we traditionally
haven't used the mcentrals for large object spans, we no longer have
anywhere to put large objects. So, this change also makes it so that we
keep large object spans in the appropriate mcentral lists.

In terms of the static lock ranking, we add the spanSet spine locks in
pretty much the same place as the mcentral locks, since they have the
potential to be manipulated both on the allocation and sweep paths, like
the mcentral locks.

This new implementation is turned on by default via a feature flag
called go115NewMCentralImpl.

Benchmark results for 1 KiB allocation throughput (5 runs each):

name \ MiB/s  go113       go114       gotip       gotip+this-patch
AllocKiB-1    1.71k ± 1%  1.68k ± 1%  1.59k ± 2%      1.71k ± 1%
AllocKiB-2    2.46k ± 1%  2.51k ± 1%  2.54k ± 1%      2.93k ± 1%
AllocKiB-4    4.27k ± 1%  4.41k ± 2%  4.33k ± 1%      5.01k ± 2%
AllocKiB-8    4.38k ± 3%  5.24k ± 1%  5.46k ± 1%      8.23k ± 1%
AllocKiB-12   4.38k ± 3%  4.49k ± 1%  5.10k ± 1%     10.04k ± 0%
AllocKiB-16   4.31k ± 1%  4.14k ± 3%  4.22k ± 0%     10.42k ± 0%
AllocKiB-20   4.26k ± 1%  3.98k ± 1%  4.09k ± 1%     10.46k ± 3%
AllocKiB-24   4.20k ± 1%  3.97k ± 1%  4.06k ± 1%     10.74k ± 1%
AllocKiB-28   4.15k ± 0%  4.00k ± 0%  4.20k ± 0%     10.76k ± 1%

Fixes #37487.

Change-Id: I92d47355acacf9af2c41bf080c08a8c1638ba210
Reviewed-on: https://go-review.googlesource.com/c/go/+/221182
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-04-27 18:19:26 +00:00
Michael Anthony Knyszek
eacdf76b93 runtime: add bitmap-based markrootSpans implementation
Currently markrootSpans, the scanning routine which scans span specials
(particularly finalizers) as roots, uses sweepSpans to shard work and
find spans to mark.

However, as part of a future CL to change span ownership and how
mcentral works, we want to avoid having markrootSpans use the sweep bufs
to find specials, so in this change we introduce a new mechanism.

Much like for the page reclaimer, we set up a per-page bitmap where the
first page for a span is marked if the span contains any specials, and
unmarked if it has no specials. This bitmap is updated by addspecial,
removespecial, and during sweeping.

markrootSpans then shards this bitmap into mark work and markers iterate
over the bitmap looking for spans with specials to mark. Unlike the page
reclaimer, we don't need to use the pageInUse bits because having a
special implies that a span is in-use.

While in terms of computational complexity this design is technically
worse, because it needs to iterate over the mapped heap, in practice
this iteration is very fast (we can skip over large swathes of the heap
very quickly) and we only look at spans that have any specials at all,
rather than having to touch each span.

This new implementation of markrootSpans is behind a feature flag called
go115NewMarkrootSpans.

Updates #37487.
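
A sketch of the bitmap manipulation involved (sizes and names are
illustrative): one bit per page, set and cleared as specials come and go,
and markers skip whole empty words while scanning:

    package sketch

    import "math/bits"

    // specialsBits holds one "span has specials" bit per page of a chunk
    // (256 pages per chunk in this sketch).
    type specialsBits [4]uint64

    func (b *specialsBits) set(page uint)   { b[page/64] |= 1 << (page % 64) }
    func (b *specialsBits) clear(page uint) { b[page/64] &^= 1 << (page % 64) }

    // forEach visits every marked page, skipping empty words quickly.
    func (b *specialsBits) forEach(visit func(page uint)) {
        for i, w := range b {
            for w != 0 {
                j := uint(bits.TrailingZeros64(w))
                visit(uint(i)*64 + j)
                w &^= 1 << j
            }
        }
    }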

Change-Id: I8ea07b6c11059f6d412fe419e0ab512d989377b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/221178
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-04-21 22:50:51 +00:00
Michael Pratt
300ff5d8ac runtime: allow proflock and mheap.speciallock above globalAlloc.mutex
During schedinit, these may occur in:

mProf_Malloc
  stkbucket
    newBucket
      persistentalloc
        persistentalloc1

mProf_Malloc
  setprofilebucket
    fixalloc.alloc
      persistentalloc
        persistentalloc1

These seem to be legitimate lock orderings.

Additionally, mheap.speciallock had a defined rank, but it was never
actually used. That is fixed now.

Updates #38474

Change-Id: I0f6e981852eac66dafb72159f426476509620a65
Reviewed-on: https://go-review.googlesource.com/c/go/+/228786
Run-TryBot: Michael Pratt <mpratt@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Dan Scales <danscales@google.com>
2020-04-21 20:22:06 +00:00
Dan Scales
0a820007e7 runtime: static lock ranking for the runtime (enabled by GOEXPERIMENT)
I took some of the infrastructure from Austin's lock logging CR
https://go-review.googlesource.com/c/go/+/192704 (with deadlock
detection from the logs), and developed a setup to give static lock
ranking for runtime locks.

Static lock ranking establishes a documented total ordering among locks,
and then reports an error if the total order is violated. This can
happen if a deadlock happens (by acquiring a sequence of locks in
different orders), or if just one side of a possible deadlock happens.
Lock ordering deadlocks cannot happen as long as the lock ordering is
followed.

Along the way, I found a deadlock involving the new timer code, which Ian fixed
via https://go-review.googlesource.com/c/go/+/207348, as well as two other
potential deadlocks.

See the constants at the top of runtime/lockrank.go to show the static
lock ranking that I ended up with, along with some comments. This is
great documentation of the current intended lock ordering when acquiring
multiple locks in the runtime.

I also added an array lockPartialOrder[] which shows and enforces the
current partial ordering among locks (which is embedded within the total
ordering). This is more specific about the dependencies among locks.

I don't try to check the ranking within a lock class with multiple locks
that can be acquired at the same time (i.e. check the ranking when
multiple hchan locks are acquired).

Currently, I am doing a lockInit() call to set the lock rank of most
locks. Any lock that is not otherwise initialized is assumed to be a
leaf lock (a very high rank lock), so that eliminates the need to do
anything for a bunch of locks (including all architecture-dependent
locks). For two locks, root.lock and notifyList.lock (only in the
runtime/sema.go file), it is not as easy to do lock initialization, so
instead, I am passing the lock rank with the lock calls.

For Windows compilation, I needed to increase the StackGuard size from
896 to 928 because of the new lock-rank checking functions.

Checking of the static lock ranking is enabled by setting
GOEXPERIMENT=staticlockranking before doing a run.

To make sure that the static lock ranking code has no overhead in memory
or CPU when not enabled by GOEXPERIMENT, I changed 'go build/install' so
that it defines a build tag (with the same name) whenever any experiment
has been baked into the toolchain (by checking Expstring()). This allows
me to avoid increasing the size of the 'mutex' type when static lock
ranking is not enabled.

Fixes #38029

Change-Id: I154217ff307c47051f8dae9c2a03b53081acd83a
Reviewed-on: https://go-review.googlesource.com/c/go/+/207619
Reviewed-by: Dan Scales <danscales@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-04-07 21:51:03 +00:00
Ian Lance Taylor
3093959ee1 runtime: remove mcache field from m
Having an mcache field in both m and p is confusing, so remove it from m.
Always use the mcache field from p. Use the new variable mcache0 during bootstrap.

Change-Id: If2cba9f8bb131d911d512b61fd883a86cf62cc98
Reviewed-on: https://go-review.googlesource.com/c/go/+/205239
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2020-02-24 16:39:52 +00:00
Michael Anthony Knyszek
8ac98e7b3f runtime: add scavtrace debug flag and remove scavenge info from gctrace
Currently, scavenging information is printed if the gctrace debug
variable is >0. Scavenging information is also printed naively, for
every page scavenged, resulting in a lot of noise when the typical
expectation for GC trace is one line per GC.

This change adds a new GODEBUG flag called scavtrace which prints
scavenge information roughly once per GC cycle and removes any scavenge
information from gctrace. The exception is debug.FreeOSMemory, which may
force an additional line to be printed.

Fixes #32952.

Change-Id: I4177dcb85fe3f9653fd74297ea93c97c389c1811
Reviewed-on: https://go-review.googlesource.com/c/go/+/212640
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-01-09 18:00:06 +00:00
Dan Scales
f266cce676 runtime: avoid potential deadlock when tracing memory code
In reclaimChunk, the runtime is calling traceGCSweepDone() while holding the mheap
lock. traceGCSweepDone() can call traceEvent() and traceFlush(). These functions
not only can get various trace locks, but they may also do memory allocations
(runtime.newobject) that may end up getting the mheap lock. So, there may be
either a self-deadlock or a possible deadlock between multiple threads.

It seems better to release the mheap lock before calling traceGCSweepDone(). It is
fine to release the lock, since the operations to get the index of the chunk of
work to do are atomic. We already release the lock to call sweep, so there is no
new behavior for any of the callers of reclaimChunk.

With this change, mheap is a leaf lock (no other lock is ever acquired while it
is held).

Testing: besides normal all.bash, also ran all.bash with --long enabled, since
it does longer tests of runtime/trace.

Change-Id: I4f8cb66c24bb8d424f24d6c2305b4b8387409248
Reviewed-on: https://go-review.googlesource.com/c/go/+/207846
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2020-01-07 00:05:43 +00:00
Michael Anthony Knyszek
4e3d58009a runtime: reset scavenge address in scavengeAll
Currently scavengeAll (which is called by debug.FreeOSMemory) doesn't
reset the scavenge address before scavenging, meaning it could miss
large portions of the heap. Fix this by resetting the address before
scavenging, which will ensure it is able to walk over the entire heap.

Fixes #35858.

Change-Id: I4a7408050b8e134318ff94428f98cb96a1795aa9
Reviewed-on: https://go-review.googlesource.com/c/go/+/208960
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-11-27 15:06:55 +00:00
Ville Skyttä
440f7d6404 all: fix a bunch of misspellings
Change-Id: I5b909df0fd048cd66c5a27fca1b06466d3bcaac7
GitHub-Last-Rev: 778c5d2131
GitHub-Pull-Request: golang/go#35624
Reviewed-on: https://go-review.googlesource.com/c/go/+/207421
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2019-11-15 21:04:43 +00:00
Michael Anthony Knyszek
a2cd2bd55d runtime: add per-p page allocation cache
This change adds a per-p free page cache which the page allocator may
allocate out of without a lock. The change also introduces a completely
lockless page allocator fast path.

Although the cache contains at most 64 pages (and usually less), the
vast majority (85%+) of page allocations are exactly 1 page in size.

Updates #35112.
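
In rough shape (a sketch; the real pageCache also tracks scavenged bits),
the cache is a base address plus a 64-bit bitmap of free pages, so a 1-page
allocation is a TrailingZeros and a bit clear, with no lock:

    package sketch

    import "math/bits"

    const pageSize = 8192

    // pageCache models a per-P cache of up to 64 free pages.
    type pageCache struct {
        base  uintptr // base address of the 64-page region
        cache uint64  // bit i set => page i is free
    }

    // allocPage hands out one free page, or reports that the cache is empty
    // and the caller must fall back to the locked slow path.
    func (c *pageCache) allocPage() (uintptr, bool) {
        if c.cache == 0 {
            return 0, false
        }
        i := bits.TrailingZeros64(c.cache)
        c.cache &^= 1 << uint(i)
        return c.base + uintptr(i)*pageSize, true
    }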

Change-Id: I170bf0a9375873e7e3230845eb1df7e5cf741b78
Reviewed-on: https://go-review.googlesource.com/c/go/+/195701
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 18:00:54 +00:00
Michael Anthony Knyszek
4517c02f28 runtime: add per-p mspan cache
This change adds a per-p mspan object cache similar to the sudog cache.
Unfortunately this cache can't quite operate like the sudog cache, since
it is used in contexts where write barriers are disallowed (i.e.
allocation codepaths), so rather than managing an array and a slice,
it's just an array and a length. A little bit more unsafe, but avoids
any write barriers.

The purpose of this change is to reduce the number of operations which
require the heap lock in allocation, paving the way for a lockless fast
path.

Updates #35112.
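
In shape (illustrative; mspan is replaced by a stub type here), the cache is
a fixed array plus a count used as a small stack, with no slice header to
manipulate:

    type mspanStub struct{} // stand-in for mspan, which is manually managed

    type spanCache struct {
        buf [128]*mspanStub
        len int
    }

    // push stores a span if there's room; otherwise the caller falls back to
    // the slow path.
    func (c *spanCache) push(s *mspanStub) bool {
        if c.len == len(c.buf) {
            return false
        }
        c.buf[c.len] = s
        c.len++
        return true
    }

    // pop returns a cached span, or nil if the cache is empty.
    func (c *spanCache) pop() *mspanStub {
        if c.len == 0 {
            return nil
        }
        c.len--
        s := c.buf[c.len]
        c.buf[c.len] = nil
        return s
    }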

Change-Id: I32cfdcd8528fb7be985640e4f3a13cb98ffb7865
Reviewed-on: https://go-review.googlesource.com/c/go/+/196642
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 17:01:32 +00:00
Michael Anthony Knyszek
a762221bea runtime: rearrange mheap_.alloc* into allocSpan
This change combines the functionality of allocSpanLocked, allocManual,
and alloc_m into a new method called allocSpan. While these methods'
abstraction boundaries are OK when the heap lock is held throughout,
they start to break down when we want finer-grained locking in the page
allocator.

allocSpan does just that, and only locks the heap when it absolutely has
to. Piggy-backing off of work in previous CLs to make more of span
initialization lockless, this change makes span initialization entirely
lockless as part of the reorganization.

Ultimately this change will enable us to add a lockless fast path to
allocSpan.

Updates #35112.

Change-Id: I99875939d75fb4e958a67ac99e4a7cda44f06864
Reviewed-on: https://go-review.googlesource.com/c/go/+/196641
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 17:01:18 +00:00
Michael Anthony Knyszek
dac936a4ab runtime: make more page sweeper operations atomic
This change makes it so that allocation and free related page sweeper
metadata operations (e.g. pageInUse and pagesInUse) are atomic rather
than protected by the heap lock. This will help in reducing the length
of the critical path with the heap lock held in future changes.

Updates #35112.

Change-Id: Ie82bff024204dd17c4c671af63350a7a41add354
Reviewed-on: https://go-review.googlesource.com/c/go/+/196640
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 17:00:57 +00:00
Michael Anthony Knyszek
7f574e476a runtime: remove unnecessary large parameter to mheap_.alloc
mheap_.alloc currently accepts both a spanClass and a "large" parameter
indicating whether the allocation is large. These are redundant, since
spanClass.sizeclass() == 0 is an equivalent way to determine this and is
already used in mheap_.alloc. There are no places in the runtime where
the size class could be non-zero and large == true.

Updates #35112.

Change-Id: Ie66facf8f0faca6f4cd3d20a8ac4bc259e11823d
Reviewed-on: https://go-review.googlesource.com/c/go/+/196639
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 16:44:33 +00:00
Michael Anthony Knyszek
ffb5646fe0 runtime: define maximum supported physical page and huge page sizes
This change defines a maximum supported physical and huge page size in
the runtime based on the new page allocator's implementation, and uses
them where appropriate.

Furthermore, if the system exceeds the maximum supported huge page
size, we simply ignore it silently.

It also fixes a huge-page-related test which is only triggered by a
condition which is definitely wrong.

Finally, it adds a few TODOs related to code clean-up and supporting
larger huge page sizes.

Updates #35112.
Fixes #35431.

Change-Id: Ie4348afb6bf047cce2c1433576d1514720d8230f
Reviewed-on: https://go-review.googlesource.com/c/go/+/205937
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-11-08 16:35:48 +00:00
Michael Anthony Knyszek
ae4534e659 runtime: ensure heap memstats are updated atomically
For the most part, heap memstats are already updated atomically when
passed down to OS-level memory functions (e.g. sysMap). Elsewhere,
however, they're updated with the heap lock.

In order to facilitate holding the heap lock for less time during
allocation paths, this change more consistently makes the update of
these statistics atomic by calling mSysStat{Inc,Dec} appropriately
instead of simply adding or subtracting. It also ensures these values
are loaded atomically.

Furthermore, an undocumented but safe update condition for these
memstats is during STW, at which point using atomics is unnecessary.
This change also documents this condition in mstats.go.

Updates #35112.

Change-Id: I87d0b6c27b98c88099acd2563ea23f8da1239b66
Reviewed-on: https://go-review.googlesource.com/c/go/+/196638
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 16:21:04 +00:00
Michael Anthony Knyszek
814c5058bb runtime: remove useless heap_objects accounting
This change removes useless additional heap_objects accounting for large
objects. heap_objects is computed from scratch at ReadMemStats time
(which stops the world) by using nlargealloc and nlargefree, so mutating
heap_objects turns out to be pointless.

As a result, the "large" parameter on "mheap_.freeSpan" is no longer
necessary and so this change cleans that up too.

Change-Id: I7d6b486d9b57c018e3db46221d81b55fe4c1b021
Reviewed-on: https://go-review.googlesource.com/c/go/+/196637
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 16:20:27 +00:00
Michael Anthony Knyszek
4208dbef16 runtime: make allocNeedsZero lock-free
In preparation for a lockless fast path in the page allocator, this
change makes it so that checking if an allocation needs to be zeroed may
be done atomically.

Unfortunately, this means there is a CAS-loop to ensure monotonicity of
the zeroedBase value in heapArena. This CAS-loop exits either when it
succeeds or when an allocator acquiring memory further on in the arena
wins the race. The
CAS-loop should have a relatively small amount of contention because of
this monotonicity, though it would be ideal if we could just have
CAS-ers with the greatest value always win. The CAS-loop is unnecessary
in the steady-state, but should bring some start-up performance gains as
it's likely cheaper than the additional zeroing required, especially for
large allocations.

For very large allocations that span arenas, the CAS-loop should be
completely uncontended for most of the arenas it touches, it may only
encounter contention on the first and last arena.

Updates #35112.
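
The core of the CAS-loop is the usual monotonic-maximum idiom (a sketch,
not the runtime's exact code):

    package sketch

    import "sync/atomic"

    // advance moves *zeroedBase forward to at least want, never backwards.
    // If another allocator has already advanced it further, we simply stop.
    func advance(zeroedBase *uintptr, want uintptr) {
        for {
            old := atomic.LoadUintptr(zeroedBase)
            if old >= want {
                return // an allocator further on in the arena already won
            }
            if atomic.CompareAndSwapUintptr(zeroedBase, old, want) {
                return // we won
            }
            // CAS failed: another allocator raced with us; reload and retry.
        }
    }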

Change-Id: If3d19198b33f1b1387b71e1ce5902d39a5c0f98e
Reviewed-on: https://go-review.googlesource.com/c/go/+/203859
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 16:20:17 +00:00
Michael Anthony Knyszek
33dfd3529b runtime: remove old page allocator
This change removes the old page allocator from the runtime.

Updates #35112.

Change-Id: Ib20e1c030f869b6318cd6f4288a9befdbae1b771
Reviewed-on: https://go-review.googlesource.com/c/go/+/195700
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-08 00:07:43 +00:00
Michael Anthony Knyszek
a120cc8b36 runtime: compute whether a span needs zeroing in the new page allocator
This change adds the allocNeedZero method to mheap which uses the new
heapArena field zeroedBase to determine whether a new allocation needs
zeroing. The purpose of this work is to avoid zeroing memory that is
fresh from the OS in the context of the new allocator, where we no
longer have the concept of a free span to track this information.

The new field in heapArena, zeroedBase, is small, which runs counter to
the advice in the doc comment for heapArena. Since heapArenas are
already not a multiple of the system page size, this advice seems stale,
and we're OK with using an extra physical page for a heapArena. So, this
change also deletes the comment with that advice.

Updates #35112.

Change-Id: I688cd9fd3c57a98a6d43c45cf699543ce16697e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/203858
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-07 20:52:05 +00:00
Michael Anthony Knyszek
689f6f77f0 runtime: integrate new page allocator into runtime
This change integrates all the bits and pieces of the new page allocator
into the runtime, behind a global constant.

Updates #35112.

Change-Id: I6696bde7bab098a498ab37ed2a2caad2a05d30ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/201764
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-07 20:14:02 +00:00
Michael Anthony Knyszek
21445b091e runtime: make the scavenger self-paced
Currently the runtime background scavenger is paced externally,
controlled by a collection of variables which together describe a line
that we'd like to stay under.

However, the line to stay under is computed as a function of the number
of free and unscavenged huge pages in the heap at the end of the last
GC. Aside from this number being inaccurate (which is still acceptable),
the scavenging system also makes an order-of-magnitude assumption as to
how expensive scavenging a single page actually is.

This change simplifies the scavenger in preparation for making it
operate on bitmaps. It makes it so that the scavenger paces itself, by
measuring the amount of time it takes to scavenge a single page. The
scavenging methods on mheap already avoid breaking huge pages, so if we
scavenge a real huge page, then we'll have paced correctly, otherwise
we'll sleep for longer to avoid using more than scavengePercent wall
clock time.

Unfortunately, all this involves measuring time, which is quite tricky.
Currently we don't directly account for long process sleeps or OS-level
context switches (which is quite difficult to do in general), but we do
account for Go scheduler overhead and variations in it by maintaining an
EWMA of the ratio of time spent scavenging to the time spent sleeping.
This ratio, as well as the sleep time, are bounded in order to deal with
the aforementioned OS-related anomalies.

Updates #35112.
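
The core pacing computation is small (a sketch; the real pacer additionally
applies the bounded EWMA of the ratio of time spent scavenging to time
spent sleeping described above, to absorb scheduler overhead and OS
anomalies):

    package sketch

    import "time"

    const scavengePercent = 1 // spend ~1% of wall-clock time scavenging

    // sleepFor returns how long to sleep after a scavenge step that took
    // critTime so that scavenging stays near scavengePercent of wall-clock
    // time overall.
    func sleepFor(critTime time.Duration) time.Duration {
        return critTime * (100 - scavengePercent) / scavengePercent
    }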

Change-Id: Ieca8b088fdfca2bebb06bcde25ef14a42fd5216b
Reviewed-on: https://go-review.googlesource.com/c/go/+/201763
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-11-07 20:12:18 +00:00
Michael Anthony Knyszek
383b447e0d runtime: clean up power-of-two rounding code with align functions
This change renames the "round" function to the more appropriately named
"alignUp" which rounds an integer up to the next multiple of a power of
two.

This change also adds the alignDown function, which is almost like
alignUp but rounds down to the previous multiple of a power of two.

With these two functions, we also go and replace manual rounding code
with it where we can.
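
These are the standard power-of-two rounding idioms the two helpers are
based on (a must be a power of two):

    // alignUp rounds n up to the next multiple of a.
    func alignUp(n, a uintptr) uintptr {
        return (n + a - 1) &^ (a - 1)
    }

    // alignDown rounds n down to the previous multiple of a.
    func alignDown(n, a uintptr) uintptr {
        return n &^ (a - 1)
    }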

Change-Id: Ie1487366280484dcb2662972b01b4f7135f72fec
Reviewed-on: https://go-review.googlesource.com/c/go/+/190618
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2019-11-04 23:41:34 +00:00