Commit graph

160 commits

Author SHA1 Message Date
Michael Anthony Knyszek
2f99e889f0 runtime: de-duplicate coalescing code
Currently the code surrounding coalescing is duplicated between merging
with the span before the span considered for coalescing and merging with
the span after. This change factors out the shared portions of these
codepaths into a local closure which acts as a helper.
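
The pattern in miniature (toy types, not the actual runtime code):

  type span struct {
      base, npages uintptr
  }

  // coalesce grows s over whichever neighbors exist. The shared
  // merge logic is written once as a closure instead of twice.
  func coalesce(s, before, after *span) {
      merge := func(other *span) {
          if other == nil {
              return
          }
          if other.base < s.base {
              s.base = other.base
          }
          s.npages += other.npages
      }
      merge(before)
      merge(after)
  }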

Change-Id: I7919fbed3f9a833eafb324a21a4beaa81f2eaa91
Reviewed-on: https://go-review.googlesource.com/c/158077
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-01-17 17:05:37 +00:00
Michael Anthony Knyszek
79ac638e41 runtime: refactor coalescing into its own method
The coalescing process is complex and in a follow-up change we'll need
to do it in more than one place, so this change factors out the
coalescing code in freeSpanLocked into a method on mheap.

Change-Id: Ia266b6cb1157c1b8d3d8a4287b42fbcc032bbf3a
Reviewed-on: https://go-review.googlesource.com/c/157838
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2019-01-17 16:59:49 +00:00
Michael Anthony Knyszek
4b3f04c63b runtime: make mTreap iterator bidirectional
This change makes mTreap's iterator type, treapIter, bidirectional
instead of unidirectional. This change helps support moving the find
operation on a treap to return an iterator instead of a treapNode, in
order to hide the details of the treap when accessing elements.
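
What bidirectionality requires, sketched with a simplified node that
carries parent pointers (the runtime's treapNode differs in detail);
prev is the mirror image of the in-order successor walk in next:

  type treapNode struct {
      parent, left, right *treapNode
      key                 uintptr
  }

  type treapIter struct{ t *treapNode }

  func (i *treapIter) valid() bool { return i.t != nil }

  // next advances to the in-order successor.
  func (i *treapIter) next() {
      n := i.t
      if n.right != nil {
          n = n.right
          for n.left != nil {
              n = n.left
          }
          i.t = n
          return
      }
      for n.parent != nil && n.parent.right == n {
          n = n.parent
      }
      i.t = n.parent
  }

  // prev is next with left and right swapped.
  func (i *treapIter) prev() {
      n := i.t
      if n.left != nil {
          n = n.left
          for n.right != nil {
              n = n.right
          }
          i.t = n
          return
      }
      for n.parent != nil && n.parent.left == n {
          n = n.parent
      }
      i.t = n.parent
  }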

For #28479.

Change-Id: I5dbea4fd4fb9bede6e81bfd089f2368886f98943
Reviewed-on: https://go-review.googlesource.com/c/156918
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2019-01-10 18:15:48 +00:00
Michael Anthony Knyszek
064842450b runtime: allocate from free and scav fairly
This change modifies the behavior of span allocations to no longer
prefer the free treap over the scavenged treap.

While there is an additional cost to allocating out of the scavenged
treap, the current behavior of preferring the unscavenged spans can
lead to unbounded growth of a program's virtual memory footprint.

In small programs (low # of Ps, low resident set size, low allocation
rate) this behavior isn't really apparent and is difficult to
reproduce.

However, in relatively large, long-running programs we see this
unbounded growth in free spans and an unbounded number of heap
growths.

It remains unclear how this policy change actually ends up
increasing the number of heap growths over time, but switching the
policy back to best-fit does indeed solve the problem.
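
The best-fit comparison in miniature (toy types; the real code walks
both treaps for candidates):

  type span struct{ npages uintptr }

  // pick chooses between the best fit from the free (unscavenged)
  // treap and the best fit from the scav treap; nil means no fit.
  // The tighter fit wins even when it comes from the scav treap.
  func pick(free, scav *span) *span {
      if free == nil || (scav != nil && scav.npages < free.npages) {
          return scav
      }
      return free
  }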

Change-Id: Ibb88d24f9ef6766baaa7f12b411974cc03341e7b
Reviewed-on: https://go-review.googlesource.com/c/148979
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-12-17 23:28:36 +00:00
Michael Anthony Knyszek
3651476075 runtime: add iterator abstraction for mTreap
This change adds the treapIter type which provides an iterator
abstraction for walking over an mTreap. In particular, the mTreap type
now has iter() and rev() for iterating both forwards (smallest to
largest) and backwards (largest to smallest). It also has an erase()
method for erasing elements at the iterator's current position.

For #28479.

While the expectation is that this change will slow down Go programs,
the impact on Go1 and Garbage is negligible.

Go1:     https://perf.golang.org/search?q=upload:20181214.6
Garbage: https://perf.golang.org/search?q=upload:20181214.11

Change-Id: I60dbebbbe73cbbe7b78d45d2093cec12cc0bc649
Reviewed-on: https://go-review.googlesource.com/c/151537
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-12-17 23:28:18 +00:00
Austin Clements
b00a6d8bfe runtime: eliminate mheap.busy* lists
The old whole-page reclaimer was the only thing that used the busy
span lists. Remove them so nothing uses them any more.

Change-Id: I4007dd2be08b9ef41bfdb0c387215c73c392cc4c
Reviewed-on: https://go-review.googlesource.com/c/138960
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2018-11-15 19:27:13 +00:00
Austin Clements
5333550bdc runtime: implement efficient page reclaimer
When we attempt to allocate an N page span (either for a large
allocation or when an mcentral runs dry), we first try to sweep spans
to release N pages. Currently, this can be extremely expensive:
sweeping a span to emptiness is the hardest thing to ask for and the
sweeper generally doesn't know where to even look for potentially
fruitful results. Since this is on the critical path of many
allocations, this is unfortunate.

This CL changes how we reclaim empty spans. Instead of trying lots of
spans and hoping for the best, it uses the newly introduced span marks
to efficiently find empty spans. The span marks (and in-use bits) are
in a dense bitmap, so these spans can be found with an efficient
sequential memory scan. This approach can scan for unmarked spans at
about 300 GB/ms and can free unmarked spans at about 32 MB/ms. We
could probably significantly improve the rate at which it can free
unmarked spans, but that's a separate issue.

Like the current reclaimer, this is still linear in the number of
spans that are swept, but the constant factor is now so vanishingly
small that it doesn't matter.
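
The word-at-a-time filter behind that scan, sketched over plain
bitmap slices (the real bits live in per-arena metadata and the page
indexing is more involved):

  import "math/bits"

  // reclaimCandidates reports the page indexes whose in-use bit is
  // set but whose mark bit is not: starts of unmarked, in-use spans.
  func reclaimCandidates(inUse, marked []uint64) []int {
      var pages []int
      for w := range inUse {
          unmarked := inUse[w] &^ marked[w] // filters 64 pages at once
          for unmarked != 0 {
              i := bits.TrailingZeros64(unmarked)
              pages = append(pages, w*64+i)
              unmarked &^= uint64(1) << uint(i)
          }
      }
      return pages
  }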

The benchmark in #18155 demonstrates both significant page reclaiming
delays and object reclaiming delays. With "-retain-count=20000000
-preallocate=true -loop-count=3", the benchmark demonstrates several
page reclaiming delays on the order of 40ms. After this change, the
page reclaims are insignificant. The longest sweeps are still ~150ms,
but are object reclaiming delays. We'll address those in the next
several CLs.

Updates #18155.

Fixes #21378 by completely replacing the logic that had that bug.

Change-Id: Iad80eec11d7fc262d02c8f0761ac6998425c4064
Reviewed-on: https://go-review.googlesource.com/c/138959
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-11-15 19:27:11 +00:00
Austin Clements
ba1698e963 runtime: mark span when marking any object on the span
This adds a mark bit for each span that is set if any objects on the
span are marked. This will be used for sweeping.
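
In outline, with stand-in fields rather than the runtime's real mark
bits:

  type mspan struct {
      marked     []bool // stand-in for the per-object mark bits
      spanMarked bool   // new: set if any object here is marked
  }

  // markObject marks one object and flags the whole span, so the
  // sweeper can later skip spans with no marked objects entirely.
  func markObject(s *mspan, objIndex int) {
      s.marked[objIndex] = true
      s.spanMarked = true
  }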

For #18155.

The impact of this is negligible for most benchmarks, and < 1% for
GC-heavy benchmarks.

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.18ms ± 0%  2.20ms ± 1%  +0.88%  (p=0.000 n=16+18)

(https://perf.golang.org/search?q=upload:20180928.1)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.68s ± 1%     2.68s ± 1%    ~     (p=0.707 n=17+19)
Fannkuch11-12                2.28s ± 0%     2.39s ± 0%  +4.95%  (p=0.000 n=19+18)
FmtFprintfEmpty-12          40.3ns ± 4%    39.4ns ± 2%  -2.27%  (p=0.000 n=17+18)
FmtFprintfString-12         67.9ns ± 1%    68.3ns ± 1%  +0.55%  (p=0.000 n=18+19)
FmtFprintfInt-12            75.7ns ± 1%    76.1ns ± 1%  +0.44%  (p=0.005 n=18+19)
FmtFprintfIntInt-12          123ns ± 1%     121ns ± 1%  -1.00%  (p=0.000 n=18+18)
FmtFprintfPrefixedInt-12     150ns ± 0%     148ns ± 0%  -1.33%  (p=0.000 n=16+13)
FmtFprintfFloat-12           208ns ± 0%     204ns ± 0%  -1.92%  (p=0.000 n=13+17)
FmtManyArgs-12               501ns ± 1%     498ns ± 0%  -0.55%  (p=0.000 n=19+17)
GobDecode-12                6.24ms ± 0%    6.25ms ± 1%    ~     (p=0.113 n=20+19)
GobEncode-12                5.33ms ± 0%    5.29ms ± 1%  -0.72%  (p=0.000 n=20+18)
Gzip-12                      220ms ± 1%     218ms ± 1%  -1.02%  (p=0.000 n=19+19)
Gunzip-12                   35.5ms ± 0%    35.7ms ± 0%  +0.45%  (p=0.000 n=16+18)
HTTPClientServer-12         77.9µs ± 1%    77.7µs ± 1%  -0.30%  (p=0.047 n=20+19)
JSONEncode-12               8.82ms ± 0%    8.93ms ± 0%  +1.20%  (p=0.000 n=18+17)
JSONDecode-12               47.3ms ± 0%    47.0ms ± 0%  -0.49%  (p=0.000 n=17+18)
Mandelbrot200-12            3.69ms ± 0%    3.68ms ± 0%  -0.25%  (p=0.000 n=19+18)
GoParse-12                  3.13ms ± 1%    3.13ms ± 1%    ~     (p=0.640 n=20+20)
RegexpMatchEasy0_32-12      76.2ns ± 1%    76.2ns ± 1%    ~     (p=0.818 n=20+19)
RegexpMatchEasy0_1K-12       226ns ± 0%     226ns ± 0%  -0.22%  (p=0.001 n=17+18)
RegexpMatchEasy1_32-12      71.9ns ± 1%    72.0ns ± 1%    ~     (p=0.653 n=18+18)
RegexpMatchEasy1_1K-12       355ns ± 1%     356ns ± 1%    ~     (p=0.160 n=18+19)
RegexpMatchMedium_32-12      106ns ± 1%     106ns ± 1%    ~     (p=0.325 n=17+20)
RegexpMatchMedium_1K-12     31.1µs ± 2%    31.2µs ± 0%  +0.59%  (p=0.007 n=19+15)
RegexpMatchHard_32-12       1.54µs ± 2%    1.53µs ± 2%  -0.78%  (p=0.021 n=17+18)
RegexpMatchHard_1K-12       46.0µs ± 1%    45.9µs ± 1%  -0.31%  (p=0.025 n=17+19)
Revcomp-12                   391ms ± 1%     394ms ± 2%  +0.80%  (p=0.000 n=17+19)
Template-12                 59.9ms ± 1%    59.9ms ± 1%    ~     (p=0.428 n=20+19)
TimeParse-12                 304ns ± 1%     312ns ± 0%  +2.88%  (p=0.000 n=20+17)
TimeFormat-12                318ns ± 0%     326ns ± 0%  +2.64%  (p=0.000 n=20+17)

(https://perf.golang.org/search?q=upload:20180928.2)

Change-Id: I336b9bf054113580a24103192904c8c76593e90e
Reviewed-on: https://go-review.googlesource.com/c/138958
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2018-11-15 19:27:09 +00:00
Austin Clements
69e666e4f7 runtime: record in-use spans in a page-indexed bitmap
This adds a bitmap indexed by page number that marks the starts of
in-use spans. This will be used to quickly find in-use spans with no
marked objects for sweeping.
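
Sketched with a flat bit slice (the real bitmap is chunked per heap
arena):

  type pageBitmap []uint64

  // setSpanStart records page p as the first page of an in-use span.
  func (b pageBitmap) setSpanStart(p uint) {
      b[p/64] |= 1 << (p % 64)
  }

  // isSpanStart reports whether page p starts an in-use span.
  func (b pageBitmap) isSpanStart(p uint) bool {
      return b[p/64]&(1<<(p%64)) != 0
  }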

For #18155.

Change-Id: Icee56f029cde502447193e136fa54a74c74326dd
Reviewed-on: https://go-review.googlesource.com/c/138957
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2018-11-15 19:27:06 +00:00
Austin Clements
e500ffd88c runtime: track all heap arenas in a slice
Currently, there's no efficient way to iterate over the Go heap. We're
going to need this for fast free page sweeping, so this CL adds a
slice of all allocated heap arenas. This will also be useful for
generational GC.
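
The bookkeeping in miniature (invented names):

  type arenaIdx uint

  type heapArenas struct {
      all []arenaIdx // every mapped arena, in the order it was added
  }

  // record is called once per newly mapped arena; walking all then
  // visits the entire heap in O(number of arenas).
  func (h *heapArenas) record(ai arenaIdx) {
      h.all = append(h.all, ai)
  }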

For #18155.

Change-Id: I58d126cfb9c3f61b3125d80b74ccb1b2169efbcc
Reviewed-on: https://go-review.googlesource.com/c/138076
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-11-15 19:14:10 +00:00
Michael Anthony Knyszek
3a7a56cc70 runtime: gofmt all improperly formatted code
This change fixes incorrect formatting in mheap.go (the result of my
previous heap scavenging changes) and map_test.go.

Change-Id: I2963687504abdc4f0cdf2f0c558174b3bc0ed2df
Reviewed-on: https://go-review.googlesource.com/c/148977
Run-TryBot: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-11-11 16:09:05 +00:00
Michael Anthony Knyszek
06be7cbf3c runtime: stop unnecessary span scavenges on free
This change fixes a bug wherein freeing a scavenged span that didn't
coalesce with any neighboring spans would result in that span getting
scavenged again. This case may actually be a common occurrence because
"freeing" span trimmings and newly-grown spans end up using the same
codepath. On systems where madvise is relatively expensive, this can
have a large performance impact.

This change also cleans up some of this logic in freeSpanLocked since
a number of factors made the coalescing code somewhat difficult to
reason about with respect to scavenging. Notably, the way the
needsScavenge boolean is handled could be better expressed and the
inverted conditions (e.g. !after.released) can make things even more
confusing.

Fixes #28595.

Change-Id: I75228dba70b6596b90853020b7c24fbe7ab937cf
Reviewed-on: https://go-review.googlesource.com/c/147559
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-11-09 20:57:57 +00:00
Michael Anthony Knyszek
44dcb5cb61 runtime: clean up MSpan* MCache* MCentral* in docs
This change cleans up references to MSpan, MCache, and MCentral in the
docs via a bunch of sed invocations to better reflect the Go names for
the equivalent structures (i.e. mspan, mcache, mcentral) and their
methods (i.e. MSpan_Sweep -> mspan.sweep).

Change-Id: Ie911ac975a24bd25200a273086dd835ab78b1711
Reviewed-on: https://go-review.googlesource.com/c/147557
Reviewed-by: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-11-05 22:56:22 +00:00
Michael Anthony Knyszek
2ae8bf7054 runtime: fix stale comments about mheap and mspan
As of 07e738e all spans are allocated out of a treap, and not just
large spans or spans for large objects. Also, now we have a separate
treap for spans that have been scavenged.

Change-Id: I9c2cb7b6798fc536bbd34835da2e888224fd7ed4
Reviewed-on: https://go-review.googlesource.com/c/142958
Reviewed-by: Austin Clements <austin@google.com>
2018-11-05 19:30:42 +00:00
Michael Anthony Knyszek
c803ffc67d runtime: scavenge large spans before heap growth
This change scavenges the largest spans before growing the heap for
physical pages to "make up" for the newly-mapped space which,
presumably, will be touched.

In theory, this approach to scavenging helps reduce the RSS of an
application by marking fragments in memory as reclaimable to the OS
more eagerly than before. In practice this may not necessarily be
true, depending on how sysUnused is implemented for each platform.
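
The policy in miniature, over a sorted slice of free-span sizes
rather than the real treap:

  // scavengeLargest releases free spans, largest first, until at
  // least want pages have been returned or none remain. sizes is
  // sorted ascending; released spans come off the tail.
  func scavengeLargest(sizes []uintptr, want uintptr) []uintptr {
      var released uintptr
      for len(sizes) > 0 && released < want {
          n := len(sizes) - 1
          released += sizes[n] // here the runtime would sysUnused the span
          sizes = sizes[:n]
      }
      return sizes
  }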

Fixes #14045.

Change-Id: Iab60790be05935865fc71f793cb9323ab00a18bd
Reviewed-on: https://go-review.googlesource.com/c/139719
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-10-30 15:41:55 +00:00
Michael Anthony Knyszek
db82a1bc12 runtime: sysUsed spans after trimming
Currently, we mark a whole span as sysUsed before trimming, but if
the span was scavenged this unnecessarily tells the OS that the
trimmed section is in use when it may in fact still be scavenged.
Overall, this change just makes invocations of sysUsed a little more
fine-grained.

It does come with the caveat that now heap_released needs to be managed
a little more carefully in allocSpanLocked. In this case, we choose to
(like before this change) negate any effect the span has on
heap_released before trimming, then add it back if the trimmed part is
scavengable.

For #14045.

Change-Id: Ifa384d989611398bfad3ca39d3bb595a5962a3ea
Reviewed-on: https://go-review.googlesource.com/c/140198
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-10-30 15:28:24 +00:00
Michael Anthony Knyszek
78bb91cbd3 runtime: remove npreleased in favor of boolean
This change removes npreleased from mspan since spans may now be
either scavenged or not scavenged; how many of a span's pages were
actually scavenged no longer matters. It saves some space in mspan
overhead too, as the boolean fits into what would otherwise be
struct padding.

For #14045.

Change-Id: I63f25a4d98658f5fe21c6a466fc38c59bfc5d0f5
Reviewed-on: https://go-review.googlesource.com/c/139737
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-10-30 15:28:01 +00:00
Michael Anthony Knyszek
b46bf0240c runtime: separate scavenged spans
This change adds a new treap to mheap which contains scavenged (i.e.
its physical pages were returned to the OS) spans.

As of this change, spans may no longer be partially scavenged.

For #14045.

Change-Id: I0d428a255c6d3f710b9214b378f841b997df0993
Reviewed-on: https://go-review.googlesource.com/c/139298
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-10-30 15:27:51 +00:00
Michael Anthony Knyszek
07e738ec32 runtime: use only treaps for tracking spans
Currently, mheap tracks spans in both mSpanLists and mTreaps, but
mSpanLists, while they tend to be smaller, complicate the
implementation. Here we simplify the implementation by removing
free and busy from mheap and renaming freelarge -> free and busylarge
-> busy.

This change also slightly changes the reclamation policy. Previously,
for allocations under 1MB we would attempt to find a small span of the
right size. Now, we just try to find any number of spans totaling the
right size. This may increase heap fragmentation, but that will be dealt
with using virtual memory tricks in follow-up CLs.

For #14045.

Garbage-heavy benchmarks show very little change, except what appears
to be a decrease in STW times and peak RSS.

name                      old STW-ns/GC       new STW-ns/GC       delta
Garbage/benchmem-MB=64-8           263k ±64%           217k ±24%  -17.66%  (p=0.028 n=25+23)

name                      old STW-ns/op       new STW-ns/op       delta
Garbage/benchmem-MB=64-8          9.39k ±65%          7.80k ±24%  -16.88%  (p=0.037 n=25+23)

name                      old peak-RSS-bytes  new peak-RSS-bytes  delta
Garbage/benchmem-MB=64-8           281M ± 0%           249M ± 4%  -11.40%  (p=0.000 n=19+18)

https://perf.golang.org/search?q=upload:20181005.1

Go1 benchmarks perform roughly the same, the most notable regression
being the JSON encode/decode benchmark, which worsens by ~2%.

name                     old time/op    new time/op    delta
BinaryTree17-8              3.02s ± 2%     2.99s ± 2%  -1.18%  (p=0.000 n=25+24)
Fannkuch11-8                3.05s ± 1%     3.02s ± 2%  -1.20%  (p=0.000 n=25+25)
FmtFprintfEmpty-8          43.6ns ± 5%    43.4ns ± 3%    ~     (p=0.528 n=25+25)
FmtFprintfString-8         74.9ns ± 3%    73.4ns ± 1%  -2.03%  (p=0.001 n=25+24)
FmtFprintfInt-8            79.3ns ± 3%    77.9ns ± 1%  -1.73%  (p=0.003 n=25+25)
FmtFprintfIntInt-8          119ns ± 6%     116ns ± 0%  -2.68%  (p=0.000 n=25+18)
FmtFprintfPrefixedInt-8     134ns ± 4%     132ns ± 1%  -1.52%  (p=0.004 n=25+25)
FmtFprintfFloat-8           240ns ± 1%     241ns ± 1%    ~     (p=0.403 n=24+23)
FmtManyArgs-8               543ns ± 1%     537ns ± 1%  -1.00%  (p=0.000 n=25+25)
GobDecode-8                6.88ms ± 1%    6.92ms ± 4%    ~     (p=0.088 n=24+22)
GobEncode-8                5.92ms ± 1%    5.93ms ± 1%    ~     (p=0.898 n=25+24)
Gzip-8                      267ms ± 2%     266ms ± 2%    ~     (p=0.213 n=25+24)
Gunzip-8                   35.4ms ± 1%    35.6ms ± 1%  +0.70%  (p=0.000 n=25+25)
HTTPClientServer-8          104µs ± 2%     104µs ± 2%    ~     (p=0.686 n=25+25)
JSONEncode-8               9.67ms ± 1%    9.80ms ± 4%  +1.32%  (p=0.000 n=25+25)
JSONDecode-8               47.7ms ± 1%    48.8ms ± 5%  +2.33%  (p=0.000 n=25+25)
Mandelbrot200-8            4.87ms ± 1%    4.91ms ± 1%  +0.79%  (p=0.000 n=25+25)
GoParse-8                  3.59ms ± 4%    3.55ms ± 1%    ~     (p=0.199 n=25+24)
RegexpMatchEasy0_32-8      90.3ns ± 1%    89.9ns ± 1%  -0.47%  (p=0.000 n=25+21)
RegexpMatchEasy0_1K-8       204ns ± 1%     204ns ± 1%    ~     (p=0.914 n=25+24)
RegexpMatchEasy1_32-8      84.9ns ± 0%    84.6ns ± 1%  -0.36%  (p=0.000 n=24+25)
RegexpMatchEasy1_1K-8       350ns ± 1%     348ns ± 3%  -0.59%  (p=0.007 n=25+25)
RegexpMatchMedium_32-8      122ns ± 1%     121ns ± 0%  -1.08%  (p=0.000 n=25+18)
RegexpMatchMedium_1K-8     36.1µs ± 1%    34.6µs ± 1%  -4.02%  (p=0.000 n=25+25)
RegexpMatchHard_32-8       1.69µs ± 2%    1.65µs ± 1%  -2.38%  (p=0.000 n=25+25)
RegexpMatchHard_1K-8       50.8µs ± 1%    49.4µs ± 1%  -2.69%  (p=0.000 n=25+24)
Revcomp-8                   453ms ± 2%     449ms ± 3%  -0.74%  (p=0.022 n=25+24)
Template-8                 63.2ms ± 2%    63.4ms ± 1%    ~     (p=0.127 n=25+24)
TimeParse-8                 313ns ± 1%     315ns ± 3%    ~     (p=0.924 n=24+25)
TimeFormat-8                294ns ± 1%     292ns ± 2%  -0.65%  (p=0.004 n=23+24)
[Geo mean]                 49.9µs         49.6µs       -0.65%

name                     old speed      new speed      delta
GobDecode-8               112MB/s ± 1%   110MB/s ± 4%  -1.00%  (p=0.036 n=24+24)
GobEncode-8               130MB/s ± 1%   129MB/s ± 1%    ~     (p=0.894 n=25+24)
Gzip-8                   72.7MB/s ± 2%  73.0MB/s ± 2%    ~     (p=0.208 n=25+24)
Gunzip-8                  549MB/s ± 1%   545MB/s ± 1%  -0.70%  (p=0.000 n=25+25)
JSONEncode-8              201MB/s ± 1%   198MB/s ± 3%  -1.29%  (p=0.000 n=25+25)
JSONDecode-8             40.7MB/s ± 1%  39.8MB/s ± 5%  -2.23%  (p=0.000 n=25+25)
GoParse-8                16.2MB/s ± 4%  16.3MB/s ± 1%    ~     (p=0.211 n=25+24)
RegexpMatchEasy0_32-8     354MB/s ± 1%   356MB/s ± 1%  +0.47%  (p=0.000 n=25+21)
RegexpMatchEasy0_1K-8    5.00GB/s ± 0%  4.99GB/s ± 1%    ~     (p=0.588 n=24+24)
RegexpMatchEasy1_32-8     377MB/s ± 1%   378MB/s ± 1%  +0.39%  (p=0.000 n=25+25)
RegexpMatchEasy1_1K-8    2.92GB/s ± 1%  2.94GB/s ± 3%  +0.65%  (p=0.008 n=25+25)
RegexpMatchMedium_32-8   8.14MB/s ± 1%  8.22MB/s ± 1%  +0.98%  (p=0.000 n=25+24)
RegexpMatchMedium_1K-8   28.4MB/s ± 1%  29.6MB/s ± 1%  +4.19%  (p=0.000 n=25+25)
RegexpMatchHard_32-8     18.9MB/s ± 2%  19.4MB/s ± 1%  +2.43%  (p=0.000 n=25+25)
RegexpMatchHard_1K-8     20.2MB/s ± 1%  20.7MB/s ± 1%  +2.76%  (p=0.000 n=25+24)
Revcomp-8                 561MB/s ± 2%   566MB/s ± 3%  +0.75%  (p=0.021 n=25+24)
Template-8               30.7MB/s ± 2%  30.6MB/s ± 1%    ~     (p=0.131 n=25+24)
[Geo mean]                120MB/s        121MB/s       +0.48%

https://perf.golang.org/search?q=upload:20181004.6

Change-Id: I97f9fee34577961a116a8ddd445c6272253f0f95
Reviewed-on: https://go-review.googlesource.com/c/139837
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-10-17 16:26:10 +00:00
Michael Anthony Knyszek
e508a5f072 runtime: de-duplicate span scavenging
Currently, span scavenging is done nearly identically in two
different locations. This change deduplicates it into one shared
routine.

For #14045.

Change-Id: I15006b2c9af0e70b7a9eae9abb4168d3adca3860
Reviewed-on: https://go-review.googlesource.com/c/139297
Run-TryBot: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-10-17 16:25:42 +00:00
Austin Clements
3f86d7cc67 runtime: tidy mheap.freeSpan
freeSpan currently takes a mysterious "acct int32" argument. This is
really just a boolean and actually just needs to match the "large"
argument to alloc in order to balance out accounting.

To make this clearer, replace acct with a "large bool" argument that
must match the call to mheap.alloc.

Change-Id: Ibc81faefdf9f0583114e1953fcfb362e9c3c76de
Reviewed-on: https://go-review.googlesource.com/c/138655
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-10-09 16:43:18 +00:00
Igor Zhilianin
04dc1b2443 all: fix a bunch of misspellings
Change-Id: I94cebca86706e072fbe3be782d3edbe0e22b9432
GitHub-Last-Rev: 8e15a40545
GitHub-Pull-Request: golang/go#28067
Reviewed-on: https://go-review.googlesource.com/c/140437
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-10-08 03:12:03 +00:00
Austin Clements
415e948eae runtime: improve mheap.alloc doc and let compiler check system stack
The alloc_m documentation refers to concepts that don't exist (and
maybe never did?). alloc_m is also not the API entry point to span
allocation.

Hence, rewrite the documentation for alloc and alloc_m. While we're
here, document why alloc_m must run on the system stack and replace
alloc_m's hand-implemented system stack check with a go:systemstack
annotation.
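
The annotation is only honored inside the runtime package, where the
compiler then checks every call path. Its shape, with a placeholder
body:

  // alloc_m must run on the system stack. The compiler now checks
  // this, replacing a hand-written assertion along the lines of
  // "if getg() != getg().m.g0 { throw(...) }".
  //
  //go:systemstack
  func alloc_m() {
      // span allocation work elided
  }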

Change-Id: I30e263d8e53c2774a6614e1b44df5464838cef09
Reviewed-on: https://go-review.googlesource.com/c/139459
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-10-05 16:05:17 +00:00
Keith Randall
cbafcc55e8 cmd/compile,runtime: implement stack objects
Rework how the compiler+runtime handles stack-allocated variables
whose address is taken.

Direct references to such variables work as before. References through
pointers, however, use a new mechanism. The new mechanism is more
precise than the old "ambiguously live" mechanism. It computes liveness
at runtime based on the actual references among objects on the stack.

Each function records all of its address-taken objects in a FUNCDATA.
These are called "stack objects". The runtime then uses that
information while scanning a stack to find all of the stack objects on
a stack. It then does a mark phase on the stack objects, using all the
pointers found on the stack (and ancillary structures, like defer
records) as the root set. Only stack objects which are found to be
live during this mark phase will be scanned and thus retain any heap
objects they point to.

A subsequent CL will remove all the "ambiguously live" logic from
the compiler, so that the stack object tracing will be required.
For this CL, the stack tracing is all redundant with the current
ambiguously live logic.

Update #22350

Change-Id: Ide19f1f71a5b6ec8c4d54f8f66f0e9a98344772f
Reviewed-on: https://go-review.googlesource.com/c/134155
Reviewed-by: Austin Clements <austin@google.com>
2018-10-03 19:52:49 +00:00
Austin Clements
873bd47dfb runtime: flush mcaches lazily
Currently, all mcaches are flushed during STW mark termination as a
root marking job. This is currently necessary because all spans must
be out of these caches before sweeping begins to avoid races with
allocation and to ensure the spans are in the state expected by
sweeping. We do it as a root marking job because mcache flushing is
somewhat expensive and O(GOMAXPROCS) and this parallelizes the work
across the Ps. However, it's also the last remaining root marking job
performed during mark termination.

This CL moves mcache flushing out of mark termination and performs it
lazily. We keep track of the last sweepgen at which each mcache was
flushed and as each P is woken from STW, it observes that its mcache
is out-of-date and flushes it.
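
A condensed sketch of the flush side (names approximate the
description):

  type mcache struct {
      flushGen uint32 // sweep generation at the last flush
  }

  var sweepgen uint32 // advanced by the GC each cycle

  // prepare runs as a P is woken after start-the-world: a stale
  // cache is flushed (its spans swept and returned) before use.
  func (c *mcache) prepare() {
      if c.flushGen != sweepgen {
          c.flush()
          c.flushGen = sweepgen
      }
  }

  func (c *mcache) flush() { /* sweep and release cached spans */ }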

This introduces a complication for spans cached in stale mcaches. These
may now be observed by background or proportional sweeping or when
attempting to add a finalizer, but aren't in a stable state. For
example, they are likely to be on the wrong mcentral list. To fix
this, this CL extends the sweepgen protocol to also capture whether a
span is cached and, if so, whether or not its cache is stale. This
protocol blocks asynchronous sweeping from touching cached spans and
makes it the responsibility of mcache flushing to sweep the flushed
spans.

This eliminates the last mark termination root marking job, which
means we can now eliminate that entire infrastructure.

Updates #26903. This implements lazy mcache flushing.

Change-Id: Iadda7aabe540b2026cffc5195da7be37d5b4125e
Reviewed-on: https://go-review.googlesource.com/c/134783
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:35 +00:00
Austin Clements
d398dbdfc3 runtime: eliminate gcBlackenPromptly mode
Now that there is no mark 2 phase, gcBlackenPromptly is no longer
used.

Updates #26903. This is a follow-up to eliminating mark 2.

Change-Id: Ib9c534f21b36b8416fcf3cab667f186167b827f8
Reviewed-on: https://go-review.googlesource.com/c/134319
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-10-02 20:35:21 +00:00
Austin Clements
5a8c11ce3e runtime: rename _MSpan* constants to mSpan*
We already aliased mSpanInUse to _MSpanInUse. The dual constants are
getting annoying, so fix all of these to use the mSpan* naming
convention.

This was done automatically with:
  sed -i -re 's/_?MSpan(Dead|InUse|Manual|Free)/mSpan\1/g' *.go
plus deleting the existing definition of mSpanInUse.

Change-Id: I09979d9d491d06c10689cea625dc57faa9cc6767
Reviewed-on: https://go-review.googlesource.com/137875
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-09-26 20:51:07 +00:00
Martin Möhrmann
961eb13b67 runtime: replace sys.CacheLineSize by corresponding internal/cpu const and vars
sys here is runtime/internal/sys.

Replace uses of sys.CacheLineSize for padding by
cpu.CacheLinePad or cpu.CacheLinePadSize.
Replace other uses of sys.CacheLineSize by cpu.CacheLineSize.
Remove now unused sys.CacheLineSize.
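
internal/cpu is not importable outside the standard library, but the
shape of the change is easy to show with a stand-in type:

  // cacheLinePad stands in for cpu.CacheLinePad; the real type's
  // size comes from the target architecture, not a literal 64.
  type cacheLinePad struct{ _ [64]byte }

  type shardedCounter struct {
      n uint64
      _ cacheLinePad // keep adjacent shards off this cache line
  }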

Updates #25203

Change-Id: I1daf410fe8f6c0493471c2ceccb9ca0a5a75ed8f
Reviewed-on: https://go-review.googlesource.com/126601
Run-TryBot: Martin Möhrmann <moehrmann@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-08-24 18:28:25 +00:00
Austin Clements
ec25210564 runtime: support a two-level arena map
Currently, the heap arena map is a single, large array that covers
every possible arena frame in the entire address space. This is
practical up to about 48 bits of address space with 64 MB arenas.

However, there are two problems with this:

1. mips64, ppc64, and s390x support full 64-bit address spaces (though
   on Linux only s390x has kernel support for 64-bit address spaces).
   On these platforms, it would be good to support these larger
   address spaces.

2. On Windows, processes are charged for untouched memory, so for
   processes with small heaps, the mostly-untouched 32 MB arena map
   plus a 64 MB arena are significant overhead. Hence, it would be
   good to reduce both the arena map size and the arena size, but with
   a single-level arena, these are inversely proportional.

This CL adds support for a two-level arena map. Arena frame numbers
are now divided into arenaL1Bits of L1 index and arenaL2Bits of L2
index.
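
The lookup itself, sketched (the bit widths below match amd64/linux
but the real constants are per-platform):

  const (
      arenaL1Bits = 0  // single-level map for now
      arenaL2Bits = 22 // e.g. 48 address bits with 64 MB arenas
  )

  type heapArena struct{ /* bitmap and span map chunks */ }

  // lookup splits an arena index into its L1 and L2 parts. With
  // arenaL1Bits = 0 the L1 index is always 0 and this reduces to
  // a single array access.
  func lookup(arenaMap [][1 << arenaL2Bits]*heapArena, ai uint) *heapArena {
      l1 := ai >> arenaL2Bits
      l2 := ai & (1<<arenaL2Bits - 1)
      return arenaMap[l1][l2]
  }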

At the moment, arenaL1Bits is always 0, so we effectively have a
single level map. We do a few things so that this has no cost beyond
the current single-level map:

1. We embed the L2 array directly in mheap, so if there's a single
   entry in the L2 array, the representation is identical to the
   current representation and there's no extra level of indirection.

2. Hot code that accesses the arena map is structured so that it
   optimizes to nearly the same machine code as it does currently.

3. We make some small tweaks to hot code paths and to the inliner
   itself to keep some important functions inlined despite their
   now-larger ASTs. In particular, this is necessary for
   heapBitsForAddr and heapBits.next.

Possibly as a result of some of the tweaks, this actually slightly
improves the performance of the x/benchmarks garbage benchmark:

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.28ms ± 1%  2.26ms ± 1%  -1.07%  (p=0.000 n=17+19)

(https://perf.golang.org/search?q=upload:20180223.2)

For #23900.

Change-Id: If5164e0961754f97eb9eca58f837f36d759505ff
Reviewed-on: https://go-review.googlesource.com/96779
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-23 21:59:50 +00:00
Austin Clements
33b76920ec runtime: rename "arena index" to "arena map"
There are too many places where I want to talk about "indexing into
the arena index". Make this less awkward and ambiguous by calling it
the "arena map" instead.

Change-Id: I726b0667bb2139dbc006175a0ec09a871cdf73f9
Reviewed-on: https://go-review.googlesource.com/96777
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-23 21:59:48 +00:00
Jerrin Shaji George
5b3cd56038 runtime: fix a few typos in comments
Change-Id: I07a1eb02ffc621c5696b49491181300bf411f822
Reviewed-on: https://go-review.googlesource.com/96475
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-23 00:17:20 +00:00
Austin Clements
ea8d7a370d runtime: clarify address space limit constants and comments
Now that we support the full non-contiguous virtual address space of
amd64 hardware, some of the comments and constants related to this are
out of date.

This renames memLimitBits to heapAddrBits because 1<<memLimitBits is
no longer the limit of the address space and rewrites the comment to
focus first on hardware limits (which span OSes) and then discuss
kernel limits.

Second, this eliminates the memLimit constant because there's no
longer a meaningful "highest possible heap pointer value" on amd64.

Updates #23862.

Change-Id: I44b32033d2deb6b69248fb8dda14fc0e65c47f11
Reviewed-on: https://go-review.googlesource.com/95498
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-21 20:32:36 +00:00
Austin Clements
ed1959c6e6 runtime: offset the heap arena index by 2^47 on amd64
On amd64, the virtual address space, when interpreted as signed
values, is [-2^47, 2^47). Currently, we only support heap addresses in
the "positive" half of this, [0, 2^47). This suffices for linux/amd64
and windows/amd64, but solaris/amd64 can map user addresses in the
negative part of this range. Specifically, addresses
0xFFFF8000'00000000 to 0xFFFFFD80'00000000 are part of user space.
This leads to a "memory allocated by OS not in usable address space"
panic, since we don't map heap arena index space for these addresses.

Fix this by offsetting addresses when computing arena indexes so that
arena entry 0 corresponds to address -2^47 on amd64. We already map
enough arena space for 2^48 heap addresses on 64-bit (because arm64's
virtual address space is [0, 2^48)), so we don't need to grow any
structures to support this.
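
The index computation, sketched with amd64 constants and invented
helper names:

  const (
      heapArenaBytes  = 64 << 20 // 64 MB arenas
      arenaBaseOffset = 1 << 47  // maps address -2^47 to arena entry 0
  )

  // arenaIndex and arenaBase are inverses; the offset keeps every
  // canonical amd64 address, negative half included, in range.
  func arenaIndex(p uintptr) uint {
      return uint((p + arenaBaseOffset) / heapArenaBytes)
  }

  func arenaBase(i uint) uintptr {
      return uintptr(i)*heapArenaBytes - arenaBaseOffset
  }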

A different approach would be to simply mask out the top 16 bits.
However, there are two advantages to the offset approach: 1) invalid
heap addresses continue to naturally map to invalid arena indexes so
we don't need extra checks and 2) it perturbs the mapping of addresses
to arena indexes more, which helps check that we don't accidentally
compute incorrect arena indexes somewhere that happen to be right most
of the time.

Several comments and constant names are now somewhat misleading. We'll
fix that in the next CL. This CL is the core change to the arena
indexing.

Fixes #23862.

Change-Id: Idb8e299fded04593a286b01a9582da6ddbac2f9a
Reviewed-on: https://go-review.googlesource.com/95497
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-21 20:32:35 +00:00
Austin Clements
e9db7b9dd1 runtime: abstract indexing of arena index
Accessing the arena index is about to get slightly more complicated.
Abstract this away into a set of functions for going back and forth
between addresses and arena slice indexes.

For #23862.

Change-Id: I0b20e74ef47a07b78ed0cf0a6128afe6f6e40f4b
Reviewed-on: https://go-review.googlesource.com/95496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-21 20:32:34 +00:00
Ryuma Yoshida
8fc25b531b all: remove duplicate word "the"
Change-Id: Ia5908e94a6bd362099ca3c63f6ffb7e94457131d
GitHub-Last-Rev: 545a40571a
GitHub-Pull-Request: golang/go#23942
Reviewed-on: https://go-review.googlesource.com/95435
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-20 16:45:55 +00:00
Austin Clements
51ae88ee2f runtime: remove non-reserved heap logic
Currently large sysReserve calls on some OSes don't actually reserve
the memory, but just check that it can be reserved. This was important
when we called sysReserve to "reserve" many gigabytes for the heap up
front, but now that we map memory in small increments as we need it,
this complication is no longer necessary.

This has one curious side benefit: currently, on Linux, allocations
that are large enough to be rejected by mmap wind up freezing the
application for a long time before it panics. This happens because
sysReserve doesn't reserve the memory, so sysMap calls mmap_fixed,
which calls mmap, which fails because the mapping is too large.
However, mmap_fixed doesn't inspect *why* mmap fails, so it falls back
to probing every page in the desired region individually with mincore
before performing an (otherwise dangerous) MAP_FIXED mapping, which
will also fail. This takes a long time for a large region. Now this
logic is gone, so the mmap failure leads to an immediate panic.

Updates #10460.

Change-Id: I8efe88c611871cdb14f99fadd09db83e0161ca2e
Reviewed-on: https://go-review.googlesource.com/85888
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:24 +00:00
Austin Clements
2b415549b8 runtime: use sparse mappings for the heap
This replaces the contiguous heap arena mapping with a potentially
sparse mapping that can support heap mappings anywhere in the address
space.

This has several advantages over the current approach:

* There is no longer any limit on the size of the Go heap. (Currently
  it's limited to 512GB.) Hence, this fixes #10460.

* It eliminates many failure modes of heap initialization and
  growing. In particular it eliminates any possibility of panicking
  with an address space conflict. This can happen for many reasons and
  even causes a low but steady rate of TSAN test failures because of
  conflicts with the TSAN runtime. See #16936 and #11993.

* It eliminates the notion of "non-reserved" heap, which was added
  because creating huge address space reservations (particularly on
  64-bit) led to huge process VSIZE. This was at best confusing and at
  worst conflicted badly with ulimit -v. However, the non-reserved
  heap logic is complicated, can race with other mappings in non-pure
  Go binaries (e.g., #18976), and requires that the entire heap be
  either reserved or non-reserved. We currently maintain the latter
  property, but it's quite difficult to convince yourself of that, and
  hence difficult to keep correct. This logic is still present, but
  will be removed in the next CL.

* It fixes problems on 32-bit where skipping over parts of the address
  space leads to mapping huge (and never-to-be-used) metadata
  structures. See #19831.

This also completely rewrites and significantly simplifies
mheap.sysAlloc, which has been a source of many bugs. E.g., #21044,
 #20259, #18651, and #13143 (and maybe #23222).

This change also makes it possible to allocate individual objects
larger than 512GB. As a result, a few tests that expected huge
allocations to fail needed to be changed to make even larger
allocations. However, at the moment attempting to allocate a humongous
object may cause the program to freeze for several minutes on Linux as
we fall back to probing every page with addrspace_free. That logic
(and this failure mode) will be removed in the next CL.

Fixes #10460.
Fixes #22204 (since it rewrites the code involved).

This slightly slows down compilebench and the x/benchmarks garbage
benchmark.

name       old time/op     new time/op     delta
Template       184ms ± 1%      185ms ± 1%    ~     (p=0.065 n=10+9)
Unicode       86.9ms ± 3%     86.3ms ± 1%    ~     (p=0.631 n=10+10)
GoTypes        599ms ± 0%      602ms ± 0%  +0.56%  (p=0.000 n=10+9)
Compiler       2.87s ± 1%      2.89s ± 1%  +0.51%  (p=0.002 n=9+10)
SSA            7.29s ± 1%      7.25s ± 1%    ~     (p=0.182 n=10+9)
Flate          118ms ± 2%      118ms ± 1%    ~     (p=0.113 n=9+9)
GoParser       147ms ± 1%      148ms ± 1%  +1.07%  (p=0.003 n=9+10)
Reflect        401ms ± 1%      404ms ± 1%  +0.71%  (p=0.003 n=10+9)
Tar            175ms ± 1%      175ms ± 1%    ~     (p=0.604 n=9+10)
XML            209ms ± 1%      210ms ± 1%    ~     (p=0.052 n=10+10)

(https://perf.golang.org/search?q=upload:20171231.4)

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.23ms ± 1%  2.25ms ± 1%  +0.84%  (p=0.000 n=19+19)

(https://perf.golang.org/search?q=upload:20171231.3)

Relative to the start of the sparse heap changes (starting at and
including "runtime: fix various contiguous bitmap assumptions"),
overall slowdown is roughly 1% on GC-intensive benchmarks:

name        old time/op     new time/op     delta
Template        183ms ± 1%      185ms ± 1%  +1.32%  (p=0.000 n=9+9)
Unicode        84.9ms ± 2%     86.3ms ± 1%  +1.65%  (p=0.000 n=9+10)
GoTypes         595ms ± 1%      602ms ± 0%  +1.19%  (p=0.000 n=9+9)
Compiler        2.86s ± 0%      2.89s ± 1%  +0.91%  (p=0.000 n=9+10)
SSA             7.19s ± 0%      7.25s ± 1%  +0.75%  (p=0.000 n=8+9)
Flate           117ms ± 1%      118ms ± 1%  +1.10%  (p=0.000 n=10+9)
GoParser        146ms ± 2%      148ms ± 1%  +1.48%  (p=0.002 n=10+10)
Reflect         398ms ± 1%      404ms ± 1%  +1.51%  (p=0.000 n=10+9)
Tar             173ms ± 1%      175ms ± 1%  +1.17%  (p=0.000 n=10+10)
XML             208ms ± 1%      210ms ± 1%  +0.62%  (p=0.011 n=10+10)
[Geo mean]      369ms           373ms       +1.17%

(https://perf.golang.org/search?q=upload:20180101.2)

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.22ms ± 1%  2.25ms ± 1%  +1.51%  (p=0.000 n=20+19)

(https://perf.golang.org/search?q=upload:20180101.3)

Change-Id: I5daf4cfec24b252e5a57001f0a6c03f22479d0f0
Reviewed-on: https://go-review.googlesource.com/85887
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:23 +00:00
Austin Clements
d6e8218581 runtime: make span map sparse
This splits the span map into separate chunks for every 64MB of the
heap. The span map chunks now live in the same indirect structure as
the bitmap.

Updates #10460.

This causes a slight improvement in compilebench and the x/benchmarks
garbage benchmark. I'm not sure why it improves performance.

name       old time/op     new time/op     delta
Template       185ms ± 1%      184ms ± 1%    ~            (p=0.315 n=9+10)
Unicode       86.9ms ± 1%     86.9ms ± 3%    ~            (p=0.356 n=9+10)
GoTypes        602ms ± 1%      599ms ± 0%  -0.59%         (p=0.002 n=9+10)
Compiler       2.89s ± 0%      2.87s ± 1%  -0.50%          (p=0.003 n=9+9)
SSA            7.25s ± 0%      7.29s ± 1%    ~            (p=0.400 n=9+10)
Flate          118ms ± 1%      118ms ± 2%    ~            (p=0.065 n=10+9)
GoParser       147ms ± 2%      147ms ± 1%    ~            (p=0.549 n=10+9)
Reflect        403ms ± 1%      401ms ± 1%  -0.47%         (p=0.035 n=9+10)
Tar            176ms ± 1%      175ms ± 1%  -0.59%         (p=0.013 n=10+9)
XML            211ms ± 1%      209ms ± 1%  -0.83%        (p=0.011 n=10+10)

(https://perf.golang.org/search?q=upload:20171231.1)

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.24ms ± 1%  2.23ms ± 1%  -0.36%  (p=0.001 n=20+19)

(https://perf.golang.org/search?q=upload:20171231.2)

Change-Id: I2563f8704ab9812434947faf293c5327f9b0d07a
Reviewed-on: https://go-review.googlesource.com/85885
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:20 +00:00
Austin Clements
0de5324d61 runtime: abstract remaining mheap.spans access
This abstracts the remaining direct accesses to mheap.spans into new
mheap.setSpan and mheap.setSpans methods.

For #10460.

Change-Id: Id1db8bc5e34a77a9221032aa2e62d05322707364
Reviewed-on: https://go-review.googlesource.com/85884
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:19 +00:00
Austin Clements
c0392d2e7f runtime: make the heap bitmap sparse
This splits the heap bitmap into separate chunks for every 64MB of the
heap and introduces an index mapping from virtual address to metadata.
It modifies the heapBits abstraction to use this two-level structure.
Finally, it modifies heapBitsSetType to unroll the bitmap into the
object itself and then copy it out if the bitmap would span
discontiguous bitmap chunks.

This is a step toward supporting general sparse heaps, which will
eliminate address space conflict failures as well as the limit on the
heap size.

It's also advantageous for 32-bit. 32-bit already supports
discontiguous heaps by always starting the arena at address 0.
However, as a result, with a contiguous bitmap, if the kernel chooses
a high address (near 2GB) for a heap mapping, the runtime is forced to
map up to 128MB of heap bitmap. Now the runtime can map sections of
the bitmap for just the parts of the address space used by the heap.

Updates #10460.

This slightly slows down the x/garbage and compilebench benchmarks.
However, I think the slowdown is acceptably small.

name        old time/op     new time/op     delta
Template        178ms ± 1%      180ms ± 1%  +0.78%    (p=0.029 n=10+10)
Unicode        85.7ms ± 2%     86.5ms ± 2%    ~       (p=0.089 n=10+10)
GoTypes         594ms ± 0%      599ms ± 1%  +0.70%    (p=0.000 n=9+9)
Compiler        2.86s ± 0%      2.87s ± 0%  +0.40%    (p=0.001 n=9+9)
SSA             7.23s ± 2%      7.29s ± 2%  +0.94%    (p=0.029 n=10+10)
Flate           116ms ± 1%      117ms ± 1%  +0.99%    (p=0.000 n=9+9)
GoParser        146ms ± 1%      146ms ± 0%    ~       (p=0.193 n=10+7)
Reflect         399ms ± 0%      403ms ± 1%  +0.89%    (p=0.001 n=10+10)
Tar             173ms ± 1%      174ms ± 1%  +0.91%    (p=0.013 n=10+9)
XML             208ms ± 1%      210ms ± 1%  +0.93%    (p=0.000 n=10+10)
[Geo mean]      368ms           371ms       +0.79%

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.17ms ± 1%  2.21ms ± 1%  +2.15%  (p=0.000 n=20+20)

Change-Id: I037fd283221976f4f61249119d6b97b100bcbc66
Reviewed-on: https://go-review.googlesource.com/85883
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:18 +00:00
Austin Clements
29e9c4d4a4 runtime: lay out heap bitmap forward in memory
Currently the heap bitmap is laid out in reverse order in memory relative
to the heap itself. This was originally done out of "excessive
cleverness" so that computing a bitmap pointer could load only the
arena_start field and so that heaps could be more contiguous by
growing the arena and the bitmap out from a common center point.

However, this appears to have no actual performance benefit, it
complicates nearly every use of the bitmap, and it makes already
confusing code more confusing. Furthermore, it's still possible to use
a single field (the new bitmap_delta) for the bitmap pointer
computation by employing slightly different excessive cleverness.

Hence, this CL puts the bitmap into forward order.

This is a (very) updated version of CL 9404.

Change-Id: I743587cc626c4ecd81e660658bad85b54584108c
Reviewed-on: https://go-review.googlesource.com/85881
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:16 +00:00
Austin Clements
4de468621a runtime: use spanOf* more widely
The logic in the spanOf* functions is open-coded in a lot of places
right now. Replace these with calls to the spanOf* functions.

Change-Id: I3cc996aceb9a529b60fea7ec6fef22008c012978
Reviewed-on: https://go-review.googlesource.com/85880
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:15 +00:00
Austin Clements
a90f9a00ca runtime: consolidate mheap.lookup* and spanOf*
I think we'd forgotten about the mheap.lookup APIs when we introduced
spanOf*, but, at any rate, the spanOf* functions are used far more
widely at this point, so this CL eliminates the mheap.lookup*
functions in favor of spanOf*.

Change-Id: I15facd0856e238bb75d990e838a092b5bef5bdfc
Reviewed-on: https://go-review.googlesource.com/85879
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:14 +00:00
Austin Clements
058bb7ea27 runtime: split object finding out of heapBitsForObject
heapBitsForObject does two things: it finds the base of the object and
it creates the heapBits for the base of the object. There are several
places where we just care about the base of the object. Furthermore,
greyobject only needs the heapBits in the checkmark path and can
easily compute them only when needed. Once we eliminate passing the
heap bits to grayobject, almost all uses of heapBitsForObject don't
need the heap bits.

Hence, this splits heapBitsForObject into findObject and
heapBitsForAddr (the latter already exists), removes the hbits
argument to grayobject, and replaces all heapBitsForObject calls with
calls to findObject.
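
The resulting split, with simplified signatures (the real findObject
also takes arguments for error reporting, and mspan exposes these
values through methods):

  type mspan struct {
      startAddr, npages, elemsize uintptr
  }

  func spanOf(p uintptr) *mspan { return nil /* arena-map lookup elided */ }

  // findObject resolves an interior pointer to its object's base
  // address, span, and object index, without computing heap bits.
  func findObject(p uintptr) (base uintptr, s *mspan, objIndex uintptr) {
      if s = spanOf(p); s == nil {
          return 0, nil, 0
      }
      objIndex = (p - s.startAddr) / s.elemsize
      base = s.startAddr + objIndex*s.elemsize
      return base, s, objIndex
  }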

In addition to making things cleaner overall, heapBitsForAddr is going
to get more expensive shortly, so it's important that we don't do it
needlessly.

Note that there's an interesting performance pitfall here. I had
originally moved findObject to mheap.go, since it made more sense
there. However, that leads to a ~2% slow down and a whopping 11%
increase in L1 icache misses on both the x/garbage and compilebench
benchmarks. This suggests we may want to be more principled about
this, but, for now, let's just leave findObject in mbitmap.go.

(I tried to make findObject small enough to inline by splitting out
the error case, but, sadly, wasn't quite able to get it under the
inlining budget.)

Change-Id: I7bcb92f383ade565d22a9f2494e4c66fd513fb10
Reviewed-on: https://go-review.googlesource.com/85878
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:13 +00:00
Austin Clements
41e6abdc61 runtime: replace mlookup and findObject with heapBitsForObject
These functions all serve essentially the same purpose. mlookup is
used in only one place and findObject in only three. Use
heapBitsForObject instead, which is the most optimized implementation.

(This may seem slightly silly because none of these uses care about
the heap bits, but we're about to split up the functionality of
heapBitsForObject anyway. At that point, findObject will rise from the
ashes.)

Change-Id: I906468c972be095dd23cf2404a7d4434e802f250
Reviewed-on: https://go-review.googlesource.com/85877
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:12 +00:00
Austin Clements
164e1b8477 runtime: eliminate remaining recordspan write barriers
recordspan has two remaining write barriers from writing to the
pointer to the backing store of h.allspans. However, h.allspans is
always backed by off-heap memory, so let the compiler know this.
Unfortunately, this isn't quite as clean as most go:notinheap uses
because we can't directly name the backing store of a slice, but we
can get it done with some judicious casting.

For #22460.

Change-Id: I296f92fa41cf2cb6ae572b35749af23967533877
Reviewed-on: https://go-review.googlesource.com/73414
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-10-29 20:22:00 +00:00
Keith Randall
97d17fcfd1 runtime: force the type of specialfinalizer into DWARF
The core dump reader wants to know the layout of this type.
No variable has this type, so it wasn't previously dumped
to DWARF output.

Change-Id: I982040b81bff202976743edc7fe53247533a9d81
Reviewed-on: https://go-review.googlesource.com/68312
Reviewed-by: Austin Clements <austin@google.com>
2017-10-05 20:03:42 +00:00
Kunpei Sakai
5a986eca86 all: fix article typos
a -> an

Change-Id: I7362bdc199e83073a712be657f5d9ba16df3077e
Reviewed-on: https://go-review.googlesource.com/63850
Reviewed-by: Rob Pike <r@golang.org>
2017-09-15 02:39:16 +00:00
Daniel Martí
59413d34c9 all: unindent some big chunks of code
Found with mvdan.cc/unindent. Prioritized the ones with the biggest wins
for now.

Change-Id: I2b032e45cdd559fc9ed5b1ee4c4de42c4c92e07b
Reviewed-on: https://go-review.googlesource.com/56470
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-18 06:59:48 +00:00
Austin Clements
53f2d53450 runtime: document concurrency of mheap.spans
We use lock-free reads from mheap.spans, but the safety of these is
somewhat subtle. Document this.

Change-Id: I928c893232176135308e38bed788d5f84ff11533
Reviewed-on: https://go-review.googlesource.com/54310
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-08-09 16:06:23 +00:00