This splits the heap bitmap into separate chunks for every 64MB of the
heap and introduces an index mapping from virtual address to metadata.
It modifies the heapBits abstraction to use this two-level structure.
Finally, it modifies heapBitsSetType to unroll the bitmap into the
object itself and then copy it out if the bitmap would span
discontiguous bitmap chunks.
This is a step toward supporting general sparse heaps, which will
eliminate address space conflict failures as well as the limit on the
heap size.
It's also advantageous for 32-bit. 32-bit already supports
discontiguous heaps by always starting the arena at address 0.
However, as a result, with a contiguous bitmap, if the kernel chooses
a high address (near 2GB) for a heap mapping, the runtime is forced to
map up to 128MB of heap bitmap. Now the runtime can map sections of
the bitmap for just the parts of the address space used by the heap.
Updates #10460.
This slightly slows down the x/garbage and compilebench benchmarks.
However, I think the slowdown is acceptably small.
name old time/op new time/op delta
Template 178ms ± 1% 180ms ± 1% +0.78% (p=0.029 n=10+10)
Unicode 85.7ms ± 2% 86.5ms ± 2% ~ (p=0.089 n=10+10)
GoTypes 594ms ± 0% 599ms ± 1% +0.70% (p=0.000 n=9+9)
Compiler 2.86s ± 0% 2.87s ± 0% +0.40% (p=0.001 n=9+9)
SSA 7.23s ± 2% 7.29s ± 2% +0.94% (p=0.029 n=10+10)
Flate 116ms ± 1% 117ms ± 1% +0.99% (p=0.000 n=9+9)
GoParser 146ms ± 1% 146ms ± 0% ~ (p=0.193 n=10+7)
Reflect 399ms ± 0% 403ms ± 1% +0.89% (p=0.001 n=10+10)
Tar 173ms ± 1% 174ms ± 1% +0.91% (p=0.013 n=10+9)
XML 208ms ± 1% 210ms ± 1% +0.93% (p=0.000 n=10+10)
[Geo mean] 368ms 371ms +0.79%
name old time/op new time/op delta
Garbage/benchmem-MB=64-12 2.17ms ± 1% 2.21ms ± 1% +2.15% (p=0.000 n=20+20)
Change-Id: I037fd283221976f4f61249119d6b97b100bcbc66
Reviewed-on: https://go-review.googlesource.com/85883
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently the heap bitamp is laid in reverse order in memory relative
to the heap itself. This was originally done out of "excessive
cleverness" so that computing a bitmap pointer could load only the
arena_start field and so that heaps could be more contiguous by
growing the arena and the bitmap out from a common center point.
However, this appears to have no actual performance benefit, it
complicates nearly every use of the bitmap, and it makes already
confusing code more confusing. Furthermore, it's still possible to use
a single field (the new bitmap_delta) for the bitmap pointer
computation by employing slightly different excessive cleverness.
Hence, this CL puts the bitmap into forward order.
This is a (very) updated version of CL 9404.
Change-Id: I743587cc626c4ecd81e660658bad85b54584108c
Reviewed-on: https://go-review.googlesource.com/85881
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
The logic in the spanOf* functions is open-coded in a lot of places
right now. Replace these with calls to the spanOf* functions.
Change-Id: I3cc996aceb9a529b60fea7ec6fef22008c012978
Reviewed-on: https://go-review.googlesource.com/85880
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
I think we'd forgotten about the mheap.lookup APIs when we introduced
spanOf*, but, at any rate, the spanOf* functions are used far more
widely at this point, so this CL eliminates the mheap.lookup*
functions in favor of spanOf*.
Change-Id: I15facd0856e238bb75d990e838a092b5bef5bdfc
Reviewed-on: https://go-review.googlesource.com/85879
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
heapBitsForObject does two things: it finds the base of the object and
it creates the heapBits for the base of the object. There are several
places where we just care about the base of the object. Furthermore,
greyobject only needs the heapBits in the checkmark path and can
easily compute them only when needed. Once we eliminate passing the
heap bits to grayobject, almost all uses of heapBitsForObject don't
need the heap bits.
Hence, this splits heapBitsForObject into findObject and
heapBitsForAddr (the latter already exists), removes the hbits
argument to grayobject, and replaces all heapBitsForObject calls with
calls to findObject.
In addition to making things cleaner overall, heapBitsForAddr is going
to get more expensive shortly, so it's important that we don't do it
needlessly.
Note that there's an interesting performance pitfall here. I had
originally moved findObject to mheap.go, since it made more sense
there. However, that leads to a ~2% slow down and a whopping 11%
increase in L1 icache misses on both the x/garbage and compilebench
benchmarks. This suggests we may want to be more principled about
this, but, for now, let's just leave findObject in mbitmap.go.
(I tried to make findObject small enough to inline by splitting out
the error case, but, sadly, wasn't quite able to get it under the
inlining budget.)
Change-Id: I7bcb92f383ade565d22a9f2494e4c66fd513fb10
Reviewed-on: https://go-review.googlesource.com/85878
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
These functions all serve essentially the same purpose. mlookup is
used in only one place and findObject in only three. Use
heapBitsForObject instead, which is the most optimized implementation.
(This may seem slightly silly because none of these uses care about
the heap bits, but we're about to split up the functionality of
heapBitsForObject anyway. At that point, findObject will rise from the
ashes.)
Change-Id: I906468c972be095dd23cf2404a7d4434e802f250
Reviewed-on: https://go-review.googlesource.com/85877
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
recordspan has two remaining write barriers from writing to the
pointer to the backing store of h.allspans. However, h.allspans is
always backed by off-heap memory, so let the compiler know this.
Unfortunately, this isn't quite as clean as most go:notinheap uses
because we can't directly name the backing store of a slice, but we
can get it done with some judicious casting.
For #22460.
Change-Id: I296f92fa41cf2cb6ae572b35749af23967533877
Reviewed-on: https://go-review.googlesource.com/73414
Reviewed-by: Rick Hudson <rlh@golang.org>
The core dump reader wants to know the layout of this type.
No variable has this type, so it wasn't previously dumped
to DWARF output.
Change-Id: I982040b81bff202976743edc7fe53247533a9d81
Reviewed-on: https://go-review.googlesource.com/68312
Reviewed-by: Austin Clements <austin@google.com>
Found with mvdan.cc/unindent. Prioritized the ones with the biggest wins
for now.
Change-Id: I2b032e45cdd559fc9ed5b1ee4c4de42c4c92e07b
Reviewed-on: https://go-review.googlesource.com/56470
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
We use lock-free reads from mheap.spans, but the safety of these is
somewhat subtle. Document this.
Change-Id: I928c893232176135308e38bed788d5f84ff11533
Reviewed-on: https://go-review.googlesource.com/54310
Reviewed-by: Rick Hudson <rlh@golang.org>
We lazily map the bitmap and spans areas as the heap grows. However,
right now we're very slightly too lazy. Specifically, the following
can happen on 32-bit:
1. mallocinit fails to allocate any heap arena, so
arena_used == arena_alloc == arena_end == bitmap.
2. There's less than 256MB between the end of the bitmap mapping and
the next mapping.
3. On the first allocation, mheap.sysAlloc sees that there's not
enough room in [arena_alloc, arena_end) because there's no room at
all. It gets a 256MB mapping from somewhere *lower* in the address
space than arena_used and sets arena_alloc and arena_end to this
hole.
4. Since the new arena_alloc is lower than arena_used, mheap.sysAlloc
doesn't bother to call mheap.setArenaUsed, so we still don't have a
bitmap mapping or a spans array mapping.
5. mheap.grow, which called mheap.sysAlloc, attempts to fill in the
spans array and crashes.
Fix this by mapping the metadata regions for the initial arena_used
when the heap is initialized, rather than trying to wait for an
allocation. This maintains the intended invariant that the structures
are always mapped for [arena_start, arena_used).
Fixes#21044.
Change-Id: I4422375a6e234b9f979d22135fc63ae3395946b0
Reviewed-on: https://go-review.googlesource.com/51714
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
mheap.allocLarge just calls bestFitTreap and is the only caller of
bestFitTreap. Flatten these into a single function. Also fix their
comments: allocLarge claims to return exactly npages but can in fact
return a larger span, and h.freelarge is not in fact indexed by span
start address.
Change-Id: Ia20112bdc46643a501ea82ea77c58596bc96f125
Reviewed-on: https://go-review.googlesource.com/47315
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, the heap arena allocator allocates monotonically increasing
addresses. This is fine on 64-bit where we stake out a giant block of
the address space for ourselves and start at the beginning of it, but
on 32-bit the arena starts at address 0 but we start allocating from
wherever the OS feels like giving us memory. We can generally hint the
OS to start us at a low address, but this doesn't always work.
As a result, on 32-bit, if the OS gives us an arena block that's lower
than the current block we're allocating from, we simply say "thanks
but no thanks", return the whole (256MB!) block of memory, and then
take a fallback path that mmaps just the amount of memory we need
(which may be as little as 8K).
We have to do this because mheap_.arena_used is *both* the highest
used address in the arena and the next address we allocate from.
Fix all of this by separating the second role of arena_used out into a
new field called arena_alloc. This lets us accept any arena block the
OS gives us. This also slightly changes the invariants around
arena_end. Previously, we ensured arena_used <= arena_end, but this
was related to arena_used's second role, so the new invariant is
arena_alloc <= arena_end. As a result, we no longer necessarily update
arena_end when we're updating arena_used.
Fixes#20259 properly. (Unlike the original fix, this one should not
be cherry-picked to Go 1.8.)
This is reasonably low risk. I verified several key properties of the
32-bit code path with both 4K and 64K physical pages using a symbolic
model and the change does not materially affect 64-bit (arena_used ==
arena_alloc on 64-bit). The only oddity is that we no longer call
setArenaUsed with racemap == false to indicate that we're creating a
hole in the address space, but this only happened in a 32-bit-only
code path, and the race detector require 64-bit, so this never
mattered anyway.
Change-Id: Ib1334007933e615166bac4159bf357ae06ec6a25
Reviewed-on: https://go-review.googlesource.com/44010
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently _TinySizeClass is untyped, which means it can accidentally
be used as a spanClass (not that I would know this from experience or
anything). Make it an int8 to avoid this mix up.
This is a cherry-pick of dev.garbage commit 81b74bf9c5.
Change-Id: I1e69eccee436ea5aa45e9a9828a013e369e03f1a
Reviewed-on: https://go-review.googlesource.com/41254
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, we mix objects with pointers and objects without pointers
("noscan" objects) together in memory. As a result, for every object
we grey, we have to check that object's heap bits to find out if it's
noscan, which adds to the per-object cost of GC. This also hurts the
TLB footprint of the garbage collector because it decreases the
density of scannable objects at the page level.
This commit improves the situation by using separate spans for noscan
objects. This will allow a much simpler noscan check (in a follow up
CL), eliminate the need to clear the bitmap of noscan objects (in a
follow up CL), and improves TLB footprint by increasing the density of
scannable objects.
This is also a step toward eliminating dead bits, since the current
noscan check depends on checking the dead bit of the first word.
This has no effect on the heap size of the garbage benchmark.
We'll measure the performance change of this after the follow-up
optimizations.
This is a cherry-pick from dev.garbage commit d491e550c3. The only
non-trivial merge conflict was in updatememstats in mstats.go, where
we now have to separate the per-spanclass stats from the per-sizeclass
stats.
Change-Id: I13bdc4869538ece5649a8d2a41c6605371618e40
Reviewed-on: https://go-review.googlesource.com/41251
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This may improve perormance during concurrent access
to mheap.central array from multiple CPU cores.
Change-Id: I8f48dd2e72aa62e9c32de07ae60fe552d8642782
Reviewed-on: https://go-review.googlesource.com/41550
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This changes gcSetTriggerRatio so it can be called even during
concurrent mark or sweep. In this case, it will adjust the pacing of
the current phase, accounting for progress that has already been made.
To make this work for concurrent sweep, this introduces a "basis" for
the pagesSwept count, much like the basis we just introduced for
heap_live. This lets gcSetTriggerRatio shift the basis to the current
heap_live and pagesSwept and compute a slope from there to completion.
This avoids creating a discontinuity where, if the ratio has
increased, there has to be a flurry of sweep activity to catch up.
Instead, this creates a continuous, piece-wise linear function as
adjustments are made.
For #19076.
Change-Id: Ibcd76aeeb81ff4814b00be7cbd3530b73bbdbba9
Reviewed-on: https://go-review.googlesource.com/39833
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, proportional sweep maintains its own count of how many
bytes have been allocated since the beginning of the sweep cycle so it
can compute how many pages need to be swept for a given allocation.
However, this requires a somewhat complex reimbursement scheme since
proportional sweep must be done before a span is allocated, but we
don't know how many bytes to charge until we've allocated a span. This
means that the allocated byte count used by proportional sweep can go
up and down, which has led to underflow bugs in the past (#18043) and
is going to interfere with adjusting sweep pacing on-the-fly (for #19076).
This approach also means we're maintaining a statistic that is very
closely related to heap_live, but has a different 0 value. This is
particularly confusing because the sweep ratio is computed based on
heap_live, so you have to understand that these two statistics are
very closely related.
Replace all of this and compute the sweep debt directly from the
current value of heap_live. To make this work, we simply save the
value of heap_live when the sweep ratio is computed to use as a
"basis" for later computing the sweep debt.
This eliminates the need for reimbursement as well as the code for
maintaining the sweeper's version of the live heap size.
For #19076.
Coincidentally fixes#18043, since this eliminates sweep reimbursement
entirely.
Change-Id: I1f931ddd6e90c901a3972c7506874c899251dc2a
Reviewed-on: https://go-review.googlesource.com/39832
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, each individual span sweep emits a span to the trace. But
sweeps are generally done in loops until some condition is satisfied,
so this tracing is lower-level than anyone really wants any hides the
fact that no other work is being accomplished between adjacent sweep
events. This is also high overhead: enabling tracing significantly
impacts sweep latency.
Replace this with instead tracing around the sweep loops used for
allocation. This is slightly tricky because sweep loops don't
generally know if any sweeping will happen in them. Hence, we make the
tracing lazy by recording in the P that we would like to start tracing
the sweep *if* one happens, and then only closing the sweep event if
we started it.
This does mean we don't get tracing on every sweep path, which are
legion. However, we get much more informative tracing on the paths
that block allocation, which are the paths that matter.
Change-Id: I73e14fbb250acb0c9d92e3648bddaa5e7d7e271c
Reviewed-on: https://go-review.googlesource.com/40810
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This extends the sweeper to free workbufs back to the heap between GC
cycles, allowing this memory to be reused for GC'd allocations or
eventually returned to the OS.
This helps for applications that have high peak heap usage relative to
their regular heap usage (for example, a high-memory initialization
phase). Workbuf memory is roughly proportional to heap size and since
we currently never free workbufs, it's proportional to *peak* heap
size. By freeing workbufs, we can release and reuse this memory for
other purposes when the heap shrinks.
This is somewhat complicated because this costs ~1–2 µs per workbuf
span, so for large heaps it's too expensive to just do synchronously
after mark termination between starting the world and dropping the
worldsema. Hence, we do it asynchronously in the sweeper. This adds a
list of "free" workbuf spans that can be returned to the heap. GC
moves all workbuf spans to this list after mark termination and the
background sweeper drains this list back to the heap. If the sweeper
doesn't finish, that's fine, since getempty can directly reuse any
remaining spans to allocate more workbufs.
Performance impact is negligible. On the x/benchmarks, this reduces
GC-bytes-from-system by 6–11%.
Fixes#19325.
Change-Id: Icb92da2196f0c39ee984faf92d52f29fd9ded7a8
Reviewed-on: https://go-review.googlesource.com/38582
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This introduces a new type, *gcBits, to use for alloc/mark bitmap
allocations instead of *uint8. This type is marked go:notinheap, so
uses of it correctly eliminate write barriers. Since we now have a
type, this also extracts some common operations to methods both for
convenience and to avoid (*uint8) casts at most use sites.
For #19325.
Change-Id: Id51f734fb2e96b8b7715caa348c8dcd4aef0696a
Reviewed-on: https://go-review.googlesource.com/38580
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This clarifies that the gcBits type is actually an arena of gcBits and
will let us introduce a new gcBits type representing a single
mark/alloc bitmap allocated from the arena.
For #19325.
Change-Id: Idedf76d202d9174a17c61bcca9d5539e042e2445
Reviewed-on: https://go-review.googlesource.com/38579
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, manually-managed spans are included in memstats.heap_inuse
and memstats.heap_sys, but when we export these stats to the user, we
subtract out how much has been allocated for stack spans from both.
This works for now because stacks are the only manually-managed spans
we have.
However, we're about to use manually-managed spans for more things
that don't necessarily have obvious stats we can use to adjust the
user-presented numbers. Prepare for this by changing the accounting so
manually-managed spans don't count toward heap_inuse or heap_sys. This
makes these fields align with the fields presented to the user and
means we don't have to track more statistics just so we can adjust
these statistics.
For #19325.
Change-Id: I5cb35527fd65587ff23339276ba2c3969e2ad98f
Reviewed-on: https://go-review.googlesource.com/38577
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
We're going to start using manually-managed spans for GC workbufs, so
rename the allocate/free methods and pass in a pointer to the stats to
use instead of using the stack stats directly.
For #19325.
Change-Id: I37df0147ae5a8e1f3cb37d59c8e57a1fcc6f2980
Reviewed-on: https://go-review.googlesource.com/38576
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
We're going to use this free list for other types of manually-managed
memory in the heap.
For #19325.
Change-Id: Ib7e682295133eabfddf3a84f44db43d937bfdd9c
Reviewed-on: https://go-review.googlesource.com/38575
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
We're about to generalize _MSpanStack to be used for other forms of
in-heap manual memory management in the runtime. This is an automated
rename of _MSpanStack to _MSpanManual plus some comment fix-ups.
For #19325.
Change-Id: I1e20a57bb3b87a0d324382f92a3e294ffc767395
Reviewed-on: https://go-review.googlesource.com/38574
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Changing mheap_.arena_used requires several steps that are currently
repeated multiple times in mheap_.sysAlloc. Consolidate these into a
single function.
In the future, this will also make it easier to add other auxiliary VM
structures.
Change-Id: Ie68837d2612e1f4ba4904acb1b6b832b15431d56
Reviewed-on: https://go-review.googlesource.com/40151
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
This code was added recently, and it doesn't seem like the parameter
will be useful in the near future.
Change-Id: I5d64dadb6820c159b588262ab90df2461b5fdf04
Reviewed-on: https://go-review.googlesource.com/39692
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Stack spans don't internally use many of the fields of the mspan,
which means things like the size class and element size get left over
from whatever last used the mspan. This can lead to confusing crashes
and debugging.
Zero these fields or initialize them to something reasonable. This
also lets us simplify some code that currently has to distinguish
between heap and stack spans.
Change-Id: I9bd114e76c147bb32de497045b932f8bf1988bbf
Reviewed-on: https://go-review.googlesource.com/38573
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
sweepone returns ^uintptr(0) when there are no more spans to *start*
sweeping, but there may be spans being swept concurrently at the time
and there's currently no efficient way to tell when the sweeper is
done sweeping all the spans.
We'll need this for concurrent runtime.GC(), so add a count of the
number of active sweepone calls to make it possible to block until
sweeping is truly done.
This is also useful for more accurately printing the gcpacertrace,
since that should be printed after all of the sweeping stats are in
(currently we can print it slightly too early).
For #18216.
Change-Id: I06e6240c9e7b40aca6fd7b788bb6962107c10a0f
Reviewed-on: https://go-review.googlesource.com/37716
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently freeOSMemory calls gcStart directly, but we really just want
it to behave like runtime.GC() and then perform a scavenge, so make it
call runtime.GC() rather than gcStart.
For #18216.
Change-Id: I548ec007afc788e87d383532a443a10d92105937
Reviewed-on: https://go-review.googlesource.com/37518
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently the GC triggering condition is an awkward combination of the
gcMode (whether or not it's gcBackgroundMode) and a boolean
"forceTrigger" flag.
Replace this with a new gcTrigger type that represents the range of
transition predicates we need. This has several advantages:
1. We can remove the awkward logic that affects the trigger behavior
based on the gcMode. Now gcMode purely controls whether to run a
STW GC or not and the gcTrigger controls whether this is a forced
GC that cannot be consolidated with other GC cycles.
2. We can lift the time-based triggering logic in sysmon to just
another type of GC trigger and move the logic to the trigger test.
3. This sets us up to have a cycle count-based trigger, which we'll
use to make runtime.GC trigger concurrent GC with the desired
consolidation properties.
For #18216.
Change-Id: If9cd49349579a548800f5022ae47b8128004bbfc
Reviewed-on: https://go-review.googlesource.com/37516
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Mallocs and panics in the scavenge path are particularly nasty because
they're likely to silently self-deadlock on the mheap.lock. Avoid
sinking lots of time into debugging these issues in the future by
turning these into immediate throws.
Change-Id: Ib36fdda33bc90b21c32432b03561630c1f3c69bc
Reviewed-on: https://go-review.googlesource.com/38293
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently we acquire a global lock for every newMarkBits call. This is
unfortunate since every span sweep operation calls newMarkBits.
However, most allocations are simply linear allocations from the
current arena. Take advantage of this to add a lock-free fast path for
allocating from the current arena. With this change, the global lock
only protects the lists of arenas, not the free offset in the current
arena.
Change-Id: I6cf6182af8492c8bfc21276114c77275fe3d7826
Reviewed-on: https://go-review.googlesource.com/34595
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently, newArena holds the gcBitsArenas lock across allocating
memory from the OS for a new gcBits arena. This is a global lock and
allocating physical memory can be expensive, so this has the potential
to cause high lock contention, especially since every single span
sweep operation calls newArena (via newMarkBits).
Improve the situation by temporarily dropping the lock across
allocation. This means the caller now has to revalidate its
assumptions after the lock is dropped, so this also factors out that
code path and reinvokes it after the lock is acquired.
Change-Id: I1113200a954ab4aad16b5071512583cfac744bdc
Reviewed-on: https://go-review.googlesource.com/34594
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
Change-Id: I6343c162e27e2e492547c96f1fc504909b1c03c0
Reviewed-on: https://go-review.googlesource.com/37793
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Currently ReadMemStats stops the world for ~1.7 ms/GB of heap because
it collects statistics from every single span. For large heaps, this
can be quite costly. This is particularly unfortunate because many
production infrastructures call this function regularly to collect and
report statistics.
Fix this by tracking the necessary cumulative statistics in the
mcaches. ReadMemStats still has to stop the world to stabilize these
statistics, but there are only O(GOMAXPROCS) mcaches to collect
statistics from, so this pause is only 25µs even at GOMAXPROCS=100.
Fixes#13613.
Change-Id: I3c0a4e14833f4760dab675efc1916e73b4c0032a
Reviewed-on: https://go-review.googlesource.com/34937
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
There are two accesses to mheap_.busy that are guarded by checks
against len(mheap_.free). This works because both lists are (and must
be) the same length, but it makes the code less clear. Change these to
use len(mheap_.busy) so the access more clearly parallels the check.
Fixes#18944.
Change-Id: I9bacbd3663988df351ed4396ae9018bc71018311
Reviewed-on: https://go-review.googlesource.com/36354
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
No point in computing this info on startup.
Compute it at build time.
This lets us spend more time computing & checking the size classes.
Improve the div magic for rounding to the start of an object.
We can now use 32-bit multiplies & shifts, which should help
32-bit platforms.
The static data is <1KB.
The actual size classes are not changed by this CL.
Change-Id: I6450cec7d1b2b4ad31fd3f945f504ed2ec6570e7
Reviewed-on: https://go-review.googlesource.com/32219
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Since barrier-less memclr is only safe in very narrow circumstances,
this commit renames memclr to avoid accidentally calling memclr on
typed memory. This can cause subtle, non-deterministic bugs, so it's
worth some effort to prevent. In the near term, this will also prevent
bugs creeping in from any concurrent CLs that add calls to memclr; if
this happens, whichever patch hits master second will fail to compile.
This also adds the other new memclr variants to the compiler's
builtin.go to minimize the churn on that binary blob. We'll use these
in future commits.
Updates #17503.
Change-Id: I00eead049f5bd35ca107ea525966831f3d1ed9ca
Reviewed-on: https://go-review.googlesource.com/31369
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently fixalloc does not zero memory it reuses. This is dangerous
with the hybrid barrier if the type may contain heap pointers, since
it may cause us to observe a dead heap pointer on reuse. It's also
error-prone since it's the only allocator that doesn't zero on
allocation (mallocgc of course zeroes, but so do persistentalloc and
sysAlloc). It's also largely pointless: for mcache, the caller
immediately memclrs the allocation; and the two specials types are
tiny so there's no real cost to zeroing them.
Change fixalloc to zero allocations by default.
The only type we don't zero by default is mspan. This actually
requires that the spsn's sweepgen survive across freeing and
reallocating a span. If we were to zero it, the following race would
be possible:
1. The current sweepgen is 2. Span s is on the unswept list.
2. Direct sweeping sweeps span s, finds it's all free, and releases s
to the fixalloc.
3. Thread 1 allocates s from fixalloc. Suppose this zeros s, including
s.sweepgen.
4. Thread 1 calls s.init, which sets s.state to _MSpanDead.
5. On thread 2, background sweeping comes across span s in allspans
and cas's s.sweepgen from 0 (sg-2) to 1 (sg-1). Now it thinks it
owns it for sweeping. 6. Thread 1 continues initializing s.
Everything breaks.
I would like to fix this because it's obviously confusing, but it's a
subtle enough problem that I'm leaving it alone for now. The solution
may be to skip sweepgen 0, but then we have to think about wrap-around
much more carefully.
Updates #17503.
Change-Id: Ie08691feed3abbb06a31381b94beb0a2e36a0613
Reviewed-on: https://go-review.googlesource.com/31368
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently the zero value of an mspan is in the "in use" state. This
seems like a bad idea in general. But it's going to wreak havoc when
we make fixalloc zero allocations: even "freed" mspan objects are
still on the allspans list and still get looked at by the garbage
collector. Hence, if we leave the mspan states the way they are,
allocating a span that reuses old memory will temporarily pass that
span (which is visible to GC!) through the "in use" state, which can
cause "unswept span" panics.
Fix all of this by making the zero state "dead".
Updates #17503.
Change-Id: I77c7ac06e297af4b9e6258bc091c37abe102acc3
Reviewed-on: https://go-review.googlesource.com/31367
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Now that sweeping and span marking use the sweep list, there's no need
for the work.spans snapshot of the allspans list. This change
eliminates the few remaining uses of it, which are either dead code or
can use allspans directly, and removes work.spans and its support
functions.
Change-Id: Id5388b42b1e68e8baee853d8eafb8bb4ff95bb43
Reviewed-on: https://go-review.googlesource.com/30537
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently sweeping walks the list of all spans, which means the work
in sweeping is proportional to the maximum number of spans ever used.
If the heap was once large but is now small, this causes an
amortization failure: on a small heap, GCs happen frequently, but a
full sweep still has to happen in each GC cycle, which means we spent
a lot of time in sweeping.
Fix this by creating a separate list consisting of just the in-use
spans to be swept, so sweeping is proportional to the number of in-use
spans (which is proportional to the live heap). Specifically, we
create two lists: a list of unswept in-use spans and a list of swept
in-use spans. At the start of the sweep cycle, the swept list becomes
the unswept list and the new swept list is empty. Allocating a new
in-use span adds it to the swept list. Sweeping moves spans from the
unswept list to the swept list.
This fixes the amortization problem because a shrinking heap moves
spans off the unswept list without adding them to the swept list,
reducing the time required by the next sweep cycle.
Updates #9265. This fix eliminates almost all of the time spent in
sweepone; however, markrootSpans has essentially the same bug, so now
the test program from this issue spends all of its time in
markrootSpans.
No significant effect on other benchmarks.
Change-Id: Ib382e82790aad907da1c127e62b3ab45d7a4ac1e
Reviewed-on: https://go-review.googlesource.com/30535
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Currently we set the len and cap of h.spans to the full reserved
region of the address space and track the actual mapped region
separately in h.spans_mapped. Since we have both the len and cap at
our disposal, change things so len(h.spans) tracks how much of the
spans array is mapped and eliminate h.spans_mapped. This simplifies
mheap and means we'll get nice "index out of bounds" exceptions if we
do try to go off the end of the spans rather than a SIGSEGV.
Change-Id: I8ed9a1a9a844d90e9fd2e269add4704623dbdfe6
Reviewed-on: https://go-review.googlesource.com/30533
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Like h_allspans and mheap_.allspans, these were two ways of referring
to the spans array from when the runtime was split between C and Go.
Clean this up by making mheap_.spans a slice and eliminating h_spans.
Change-Id: I3aa7038d53c3a4252050aa33e468c48dfed0b70e
Reviewed-on: https://go-review.googlesource.com/30532
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
This was necessary in the C days when allspans was an mspan**, but now
that allspans is a Go slice, this is redundant with len(allspans) and
we can use range loops over allspans.
Change-Id: Ie1dc39611e574e29a896e01690582933f4c5be7e
Reviewed-on: https://go-review.googlesource.com/30531
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
These are two ways to refer to the allspans array that hark back to
when the runtime was split between C and Go. Clean this up by making
mheap_.allspans a slice and eliminating h_allspans.
Change-Id: Ic9360d040cf3eb590b5dfbab0b82e8ace8525610
Reviewed-on: https://go-review.googlesource.com/30530
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>