Commit graph

131 commits

Austin Clements
33b76920ec runtime: rename "arena index" to "arena map"
There are too many places where I want to talk about "indexing into
the arena index". Make this less awkward and ambiguous by calling it
the "arena map" instead.

Change-Id: I726b0667bb2139dbc006175a0ec09a871cdf73f9
Reviewed-on: https://go-review.googlesource.com/96777
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-23 21:59:48 +00:00
Jerrin Shaji George
5b3cd56038 runtime: fix a few typos in comments
Change-Id: I07a1eb02ffc621c5696b49491181300bf411f822
Reviewed-on: https://go-review.googlesource.com/96475
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-23 00:17:20 +00:00
Austin Clements
ea8d7a370d runtime: clarify address space limit constants and comments
Now that we support the full non-contiguous virtual address space of
amd64 hardware, some of the comments and constants related to this are
out of date.

This renames memLimitBits to heapAddrBits because 1<<memLimitBits is
no longer the limit of the address space and rewrites the comment to
focus first on hardware limits (which span OSes) and then discuss
kernel limits.

Second, this eliminates the memLimit constant because there's no
longer a meaningful "highest possible heap pointer value" on amd64.

Updates #23862.

Change-Id: I44b32033d2deb6b69248fb8dda14fc0e65c47f11
Reviewed-on: https://go-review.googlesource.com/95498
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-21 20:32:36 +00:00
Austin Clements
ed1959c6e6 runtime: offset the heap arena index by 2^47 on amd64
On amd64, the virtual address space, when interpreted as signed
values, is [-2^47, 2^47). Currently, we only support heap addresses in
the "positive" half of this, [0, 2^47). This suffices for linux/amd64
and windows/amd64, but solaris/amd64 can map user addresses in the
negative part of this range. Specifically, addresses
0xFFFF8000'00000000 to 0xFFFFFD80'00000000 are part of user space.
This leads to a "memory allocated by OS not in usable address space"
panic, since we don't map heap arena index space for these addresses.

Fix this by offsetting addresses when computing arena indexes so that
arena entry 0 corresponds to address -2^47 on amd64. We already map
enough arena space for 2^48 heap addresses on 64-bit (because arm64's
virtual address space is [0, 2^48)), so we don't need to grow any
structures to support this.
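
For illustration, the indexing boils down to something like this (a
sketch; arenaBaseOffset is the constant this CL introduces, the rest
is simplified):

    const (
        heapArenaBytes  = 64 << 20 // 64MB heap arenas on 64-bit
        arenaBaseOffset = 1 << 47  // amd64: arena entry 0 is address -2^47
    )

    // arenaIndex returns the arena map index for address p. The
    // unsigned wraparound of p+arenaBaseOffset maps the signed range
    // [-2^47, 2^47) onto [0, 2^48).
    func arenaIndex(p uintptr) uintptr {
        return (p + arenaBaseOffset) / heapArenaBytes
    }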

A different approach would be to simply mask out the top 16 bits.
However, there are two advantages to the offset approach: 1) invalid
heap addresses continue to naturally map to invalid arena indexes so
we don't need extra checks and 2) it perturbs the mapping of addresses
to arena indexes more, which helps check that we don't accidentally
compute incorrect arena indexes somewhere that happen to be right most
of the time.

Several comments and constant names are now somewhat misleading. We'll
fix that in the next CL. This CL is the core change to the arena
indexing.

Fixes #23862.

Change-Id: Idb8e299fded04593a286b01a9582da6ddbac2f9a
Reviewed-on: https://go-review.googlesource.com/95497
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-21 20:32:35 +00:00
Austin Clements
e9db7b9dd1 runtime: abstract indexing of arena index
Accessing the arena index is about to get slightly more complicated.
Abstract this away into a set of functions for going back and forth
between addresses and arena slice indexes.
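
Roughly, the new helpers are a pair of inverse mappings (sketch; names
illustrative):

    // arenaIndex returns the index of the arena containing address p
    // (no base offset yet at this point in the series).
    func arenaIndex(p uintptr) uintptr {
        return p / heapArenaBytes
    }

    // arenaBase returns the lowest address in arena i.
    func arenaBase(i uintptr) uintptr {
        return i * heapArenaBytes
    }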

For #23862.

Change-Id: I0b20e74ef47a07b78ed0cf0a6128afe6f6e40f4b
Reviewed-on: https://go-review.googlesource.com/95496
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-21 20:32:34 +00:00
Ryuma Yoshida
8fc25b531b all: remove duplicate word "the"
Change-Id: Ia5908e94a6bd362099ca3c63f6ffb7e94457131d
GitHub-Last-Rev: 545a40571a
GitHub-Pull-Request: golang/go#23942
Reviewed-on: https://go-review.googlesource.com/95435
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2018-02-20 16:45:55 +00:00
Austin Clements
51ae88ee2f runtime: remove non-reserved heap logic
Currently large sysReserve calls on some OSes don't actually reserve
the memory, but just check that it can be reserved. This was important
when we called sysReserve to "reserve" many gigabytes for the heap up
front, but now that we map memory in small increments as we need it,
this complication is no longer necessary.

This has one curious side benefit: currently, on Linux, allocations
that are large enough to be rejected by mmap wind up freezing the
application for a long time before it panics. This happens because
sysReserve doesn't reserve the memory, so sysMap calls mmap_fixed,
which calls mmap, which fails because the mapping is too large.
However, mmap_fixed doesn't inspect *why* mmap fails, so it falls back
to probing every page in the desired region individually with mincore
before performing an (otherwise dangerous) MAP_FIXED mapping, which
will also fail. This takes a long time for a large region. Now this
logic is gone, so the mmap failure leads to an immediate panic.

Updates #10460.

Change-Id: I8efe88c611871cdb14f99fadd09db83e0161ca2e
Reviewed-on: https://go-review.googlesource.com/85888
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:24 +00:00
Austin Clements
2b415549b8 runtime: use sparse mappings for the heap
This replaces the contiguous heap arena mapping with a potentially
sparse mapping that can support heap mappings anywhere in the address
space.
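
The shape of the new structure is roughly the following (a simplified
sketch; sizes and names are illustrative, not the exact declarations):

    const (
        heapAddrBits      = 48
        logHeapArenaBytes = 26 // 64MB arenas
        heapArenaBytes    = 1 << logHeapArenaBytes
        pageSize          = 8 << 10
        pagesPerArena     = heapArenaBytes / pageSize
    )

    // A heapArena holds the metadata for one 64MB arena of address
    // space. Arenas and their metadata are allocated only as the heap
    // actually uses that address range.
    type heapArena struct {
        bitmap [heapArenaBytes / 32]byte // heap bitmap: 2 bits per word
        spans  [pagesPerArena]*mspan     // span map: page -> *mspan
    }

    // Indexed by p / heapArenaBytes; nil entries are address space the
    // heap has never grown into.
    var arenas [1 << (heapAddrBits - logHeapArenaBytes)]*heapArena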

This has several advantages over the current approach:

* There is no longer any limit on the size of the Go heap. (Currently
  it's limited to 512GB.) Hence, this fixes #10460.

* It eliminates many failure modes of heap initialization and
  growing. In particular it eliminates any possibility of panicking
  with an address space conflict. This can happen for many reasons and
  even causes a low but steady rate of TSAN test failures because of
  conflicts with the TSAN runtime. See #16936 and #11993.

* It eliminates the notion of "non-reserved" heap, which was added
  because creating huge address space reservations (particularly on
  64-bit) led to huge process VSIZE. This was at best confusing and at
  worst conflicted badly with ulimit -v. However, the non-reserved
  heap logic is complicated, can race with other mappings in non-pure
  Go binaries (e.g., #18976), and requires that the entire heap be
  either reserved or non-reserved. We currently maintain the latter
  property, but it's quite difficult to convince yourself of that, and
  hence difficult to keep correct. This logic is still present, but
  will be removed in the next CL.

* It fixes problems on 32-bit where skipping over parts of the address
  space leads to mapping huge (and never-to-be-used) metadata
  structures. See #19831.

This also completely rewrites and significantly simplifies
mheap.sysAlloc, which has been a source of many bugs. E.g., #21044,
#20259, #18651, and #13143 (and maybe #23222).

This change also makes it possible to allocate individual objects
larger than 512GB. As a result, a few tests that expected huge
allocations to fail needed to be changed to make even larger
allocations. However, at the moment attempting to allocate a humongous
object may cause the program to freeze for several minutes on Linux as
we fall back to probing every page with addrspace_free. That logic
(and this failure mode) will be removed in the next CL.

Fixes #10460.
Fixes #22204 (since it rewrites the code involved).

This slightly slows down compilebench and the x/benchmarks garbage
benchmark.

name       old time/op     new time/op     delta
Template       184ms ± 1%      185ms ± 1%    ~     (p=0.065 n=10+9)
Unicode       86.9ms ± 3%     86.3ms ± 1%    ~     (p=0.631 n=10+10)
GoTypes        599ms ± 0%      602ms ± 0%  +0.56%  (p=0.000 n=10+9)
Compiler       2.87s ± 1%      2.89s ± 1%  +0.51%  (p=0.002 n=9+10)
SSA            7.29s ± 1%      7.25s ± 1%    ~     (p=0.182 n=10+9)
Flate          118ms ± 2%      118ms ± 1%    ~     (p=0.113 n=9+9)
GoParser       147ms ± 1%      148ms ± 1%  +1.07%  (p=0.003 n=9+10)
Reflect        401ms ± 1%      404ms ± 1%  +0.71%  (p=0.003 n=10+9)
Tar            175ms ± 1%      175ms ± 1%    ~     (p=0.604 n=9+10)
XML            209ms ± 1%      210ms ± 1%    ~     (p=0.052 n=10+10)

(https://perf.golang.org/search?q=upload:20171231.4)

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.23ms ± 1%  2.25ms ± 1%  +0.84%  (p=0.000 n=19+19)

(https://perf.golang.org/search?q=upload:20171231.3)

Relative to the start of the sparse heap changes (starting at and
including "runtime: fix various contiguous bitmap assumptions"),
overall slowdown is roughly 1% on GC-intensive benchmarks:

name        old time/op     new time/op     delta
Template        183ms ± 1%      185ms ± 1%  +1.32%  (p=0.000 n=9+9)
Unicode        84.9ms ± 2%     86.3ms ± 1%  +1.65%  (p=0.000 n=9+10)
GoTypes         595ms ± 1%      602ms ± 0%  +1.19%  (p=0.000 n=9+9)
Compiler        2.86s ± 0%      2.89s ± 1%  +0.91%  (p=0.000 n=9+10)
SSA             7.19s ± 0%      7.25s ± 1%  +0.75%  (p=0.000 n=8+9)
Flate           117ms ± 1%      118ms ± 1%  +1.10%  (p=0.000 n=10+9)
GoParser        146ms ± 2%      148ms ± 1%  +1.48%  (p=0.002 n=10+10)
Reflect         398ms ± 1%      404ms ± 1%  +1.51%  (p=0.000 n=10+9)
Tar             173ms ± 1%      175ms ± 1%  +1.17%  (p=0.000 n=10+10)
XML             208ms ± 1%      210ms ± 1%  +0.62%  (p=0.011 n=10+10)
[Geo mean]      369ms           373ms       +1.17%

(https://perf.golang.org/search?q=upload:20180101.2)

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.22ms ± 1%  2.25ms ± 1%  +1.51%  (p=0.000 n=20+19)

(https://perf.golang.org/search?q=upload:20180101.3)

Change-Id: I5daf4cfec24b252e5a57001f0a6c03f22479d0f0
Reviewed-on: https://go-review.googlesource.com/85887
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:23 +00:00
Austin Clements
d6e8218581 runtime: make span map sparse
This splits the span map into separate chunks for every 64MB of the
heap. The span map chunks now live in the same indirect structure as
the bitmap.
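
With the span map chunked per arena, a span lookup becomes a two-level
access, roughly (sketch; the real function has more checks):

    // spanOf returns the span containing address p, or nil if p is
    // not backed by the heap.
    func spanOf(p uintptr) *mspan {
        ha := arenas[p/heapArenaBytes]
        if ha == nil {
            return nil
        }
        return ha.spans[(p/pageSize)%pagesPerArena]
    }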

Updates #10460.

This causes a slight improvement in compilebench and the x/benchmarks
garbage benchmark. I'm not sure why it improves performance.

name       old time/op     new time/op     delta
Template       185ms ± 1%      184ms ± 1%    ~            (p=0.315 n=9+10)
Unicode       86.9ms ± 1%     86.9ms ± 3%    ~            (p=0.356 n=9+10)
GoTypes        602ms ± 1%      599ms ± 0%  -0.59%         (p=0.002 n=9+10)
Compiler       2.89s ± 0%      2.87s ± 1%  -0.50%          (p=0.003 n=9+9)
SSA            7.25s ± 0%      7.29s ± 1%    ~            (p=0.400 n=9+10)
Flate          118ms ± 1%      118ms ± 2%    ~            (p=0.065 n=10+9)
GoParser       147ms ± 2%      147ms ± 1%    ~            (p=0.549 n=10+9)
Reflect        403ms ± 1%      401ms ± 1%  -0.47%         (p=0.035 n=9+10)
Tar            176ms ± 1%      175ms ± 1%  -0.59%         (p=0.013 n=10+9)
XML            211ms ± 1%      209ms ± 1%  -0.83%        (p=0.011 n=10+10)

(https://perf.golang.org/search?q=upload:20171231.1)

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.24ms ± 1%  2.23ms ± 1%  -0.36%  (p=0.001 n=20+19)

(https://perf.golang.org/search?q=upload:20171231.2)

Change-Id: I2563f8704ab9812434947faf293c5327f9b0d07a
Reviewed-on: https://go-review.googlesource.com/85885
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:20 +00:00
Austin Clements
0de5324d61 runtime: abstract remaining mheap.spans access
This abstracts the remaining direct accesses to mheap.spans into new
mheap.setSpan and mheap.setSpans methods.
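
The new methods are thin wrappers over the span map, along these lines
(sketch; at this point in the series mheap.spans is still one flat
array):

    // setSpan records that span s covers the page containing base.
    func (h *mheap) setSpan(base uintptr, s *mspan) {
        h.spans[(base-h.arena_start)>>_PageShift] = s
    }

    // setSpans records that s covers npage pages starting at base.
    func (h *mheap) setSpans(base, npage uintptr, s *mspan) {
        p := (base - h.arena_start) >> _PageShift
        for n := uintptr(0); n < npage; n++ {
            h.spans[p+n] = s
        }
    }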

For #10460.

Change-Id: Id1db8bc5e34a77a9221032aa2e62d05322707364
Reviewed-on: https://go-review.googlesource.com/85884
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:19 +00:00
Austin Clements
c0392d2e7f runtime: make the heap bitmap sparse
This splits the heap bitmap into separate chunks for every 64MB of the
heap and introduces an index mapping from virtual address to metadata.
It modifies the heapBits abstraction to use this two-level structure.
Finally, it modifies heapBitsSetType to unroll the bitmap into the
object itself and then copy it out if the bitmap would span
discontiguous bitmap chunks.
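
A sketch of the resulting two-level lookup (simplified; one bitmap
byte describes 4 heap words, field names illustrative):

    func heapBitsForAddr(addr uintptr) heapBits {
        ha := arenas[addr/heapArenaBytes]
        // Word offset of addr within its arena.
        off := (addr / ptrSize) % (heapArenaBytes / ptrSize)
        return heapBits{bitp: &ha.bitmap[off/4], shift: uint32(off % 4)}
    }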

This is a step toward supporting general sparse heaps, which will
eliminate address space conflict failures as well as the limit on the
heap size.

It's also advantageous for 32-bit. 32-bit already supports
discontiguous heaps by always starting the arena at address 0.
However, as a result, with a contiguous bitmap, if the kernel chooses
a high address (near 2GB) for a heap mapping, the runtime is forced to
map up to 128MB of heap bitmap. Now the runtime can map sections of
the bitmap for just the parts of the address space used by the heap.

Updates #10460.

This slightly slows down the x/garbage and compilebench benchmarks.
However, I think the slowdown is acceptably small.

name        old time/op     new time/op     delta
Template        178ms ± 1%      180ms ± 1%  +0.78%    (p=0.029 n=10+10)
Unicode        85.7ms ± 2%     86.5ms ± 2%    ~       (p=0.089 n=10+10)
GoTypes         594ms ± 0%      599ms ± 1%  +0.70%    (p=0.000 n=9+9)
Compiler        2.86s ± 0%      2.87s ± 0%  +0.40%    (p=0.001 n=9+9)
SSA             7.23s ± 2%      7.29s ± 2%  +0.94%    (p=0.029 n=10+10)
Flate           116ms ± 1%      117ms ± 1%  +0.99%    (p=0.000 n=9+9)
GoParser        146ms ± 1%      146ms ± 0%    ~       (p=0.193 n=10+7)
Reflect         399ms ± 0%      403ms ± 1%  +0.89%    (p=0.001 n=10+10)
Tar             173ms ± 1%      174ms ± 1%  +0.91%    (p=0.013 n=10+9)
XML             208ms ± 1%      210ms ± 1%  +0.93%    (p=0.000 n=10+10)
[Geo mean]      368ms           371ms       +0.79%

name                       old time/op  new time/op  delta
Garbage/benchmem-MB=64-12  2.17ms ± 1%  2.21ms ± 1%  +2.15%  (p=0.000 n=20+20)

Change-Id: I037fd283221976f4f61249119d6b97b100bcbc66
Reviewed-on: https://go-review.googlesource.com/85883
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:18 +00:00
Austin Clements
29e9c4d4a4 runtime: lay out heap bitmap forward in memory
Currently the heap bitmap is laid out in reverse order in memory
relative to the heap itself. This was originally done out of "excessive
cleverness" so that computing a bitmap pointer could load only the
arena_start field and so that heaps could be more contiguous by
growing the arena and the bitmap out from a common center point.

However, this appears to have no actual performance benefit, it
complicates nearly every use of the bitmap, and it makes already
confusing code more confusing. Furthermore, it's still possible to use
a single field (the new bitmap_delta) for the bitmap pointer
computation by employing slightly different excessive cleverness.

Hence, this CL puts the bitmap into forward order.
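
The "slightly different excessive cleverness" amounts to folding the
subtraction of arena_start into one precomputed field, roughly
(sketch):

    // One bitmap byte describes 4 heap words (32 bytes on 64-bit), so
    // the forward-order bitmap byte for addr lives at
    // addr/32 + bitmap_delta, where bitmap_delta is precomputed as
    // bitmap_base - arena_start/32 when the bitmap is mapped.
    func bitAddr(addr uintptr) *uint8 {
        return (*uint8)(unsafe.Pointer(addr/32 + mheap_.bitmap_delta))
    }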

This is a (very) updated version of CL 9404.

Change-Id: I743587cc626c4ecd81e660658bad85b54584108c
Reviewed-on: https://go-review.googlesource.com/85881
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:16 +00:00
Austin Clements
4de468621a runtime: use spanOf* more widely
The logic in the spanOf* functions is open-coded in a lot of places
right now. Replace these with calls to the spanOf* functions.

Change-Id: I3cc996aceb9a529b60fea7ec6fef22008c012978
Reviewed-on: https://go-review.googlesource.com/85880
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:15 +00:00
Austin Clements
a90f9a00ca runtime: consolidate mheap.lookup* and spanOf*
I think we'd forgotten about the mheap.lookup APIs when we introduced
spanOf*, but, at any rate, the spanOf* functions are used far more
widely at this point, so this CL eliminates the mheap.lookup*
functions in favor of spanOf*.

Change-Id: I15facd0856e238bb75d990e838a092b5bef5bdfc
Reviewed-on: https://go-review.googlesource.com/85879
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:14 +00:00
Austin Clements
058bb7ea27 runtime: split object finding out of heapBitsForObject
heapBitsForObject does two things: it finds the base of the object and
it creates the heapBits for the base of the object. There are several
places where we just care about the base of the object. Furthermore,
greyobject only needs the heapBits in the checkmark path and can
easily compute them only when needed. Once we eliminate passing the
heap bits to greyobject, almost all uses of heapBitsForObject don't
need the heap bits.

Hence, this splits heapBitsForObject into findObject and
heapBitsForAddr (the latter already exists), removes the hbits
argument to greyobject, and replaces all heapBitsForObject calls with
calls to findObject.
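
A sketch of the split-out findObject (simplified; the real version
also reports bad pointers and avoids the division):

    // findObject returns the base address of the heap object
    // containing p, that object's span, and its index in the span.
    // It returns 0/nil if p does not point into a live heap object.
    func findObject(p uintptr) (base uintptr, s *mspan, objIndex uintptr) {
        s = spanOf(p)
        if s == nil || p < s.base() || p >= s.limit || s.state != mSpanInUse {
            return 0, nil, 0
        }
        objIndex = (p - s.base()) / s.elemsize
        base = s.base() + objIndex*s.elemsize
        return
    }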

In addition to making things cleaner overall, heapBitsForAddr is going
to get more expensive shortly, so it's important that we don't do it
needlessly.

Note that there's an interesting performance pitfall here. I had
originally moved findObject to mheap.go, since it made more sense
there. However, that leads to a ~2% slowdown and a whopping 11%
increase in L1 icache misses on both the x/garbage and compilebench
benchmarks. This suggests we may want to be more principled about
this, but, for now, let's just leave findObject in mbitmap.go.

(I tried to make findObject small enough to inline by splitting out
the error case, but, sadly, wasn't quite able to get it under the
inlining budget.)

Change-Id: I7bcb92f383ade565d22a9f2494e4c66fd513fb10
Reviewed-on: https://go-review.googlesource.com/85878
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:13 +00:00
Austin Clements
41e6abdc61 runtime: replace mlookup and findObject with heapBitsForObject
These functions all serve essentially the same purpose. mlookup is
used in only one place and findObject in only three. Use
heapBitsForObject instead, which is the most optimized implementation.

(This may seem slightly silly because none of these uses care about
the heap bits, but we're about to split up the functionality of
heapBitsForObject anyway. At that point, findObject will rise from the
ashes.)

Change-Id: I906468c972be095dd23cf2404a7d4434e802f250
Reviewed-on: https://go-review.googlesource.com/85877
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2018-02-15 21:12:12 +00:00
Austin Clements
164e1b8477 runtime: eliminate remaining recordspan write barriers
recordspan has two remaining write barriers from writing to the
pointer to the backing store of h.allspans. However, h.allspans is
always backed by off-heap memory, so let the compiler know this.
Unfortunately, this isn't quite as clean as most go:notinheap uses
because we can't directly name the backing store of a slice, but we
can get it done with some judicious casting.

For #22460.

Change-Id: I296f92fa41cf2cb6ae572b35749af23967533877
Reviewed-on: https://go-review.googlesource.com/73414
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-10-29 20:22:00 +00:00
Keith Randall
97d17fcfd1 runtime: force the type of specialfinalizer into DWARF
The core dump reader wants to know the layout of this type.
No variable has this type, so it wasn't previously dumped
to DWARF output.

Change-Id: I982040b81bff202976743edc7fe53247533a9d81
Reviewed-on: https://go-review.googlesource.com/68312
Reviewed-by: Austin Clements <austin@google.com>
2017-10-05 20:03:42 +00:00
Kunpei Sakai
5a986eca86 all: fix article typos
a -> an

Change-Id: I7362bdc199e83073a712be657f5d9ba16df3077e
Reviewed-on: https://go-review.googlesource.com/63850
Reviewed-by: Rob Pike <r@golang.org>
2017-09-15 02:39:16 +00:00
Daniel Martí
59413d34c9 all: unindent some big chunks of code
Found with mvdan.cc/unindent. Prioritized the ones with the biggest wins
for now.

Change-Id: I2b032e45cdd559fc9ed5b1ee4c4de42c4c92e07b
Reviewed-on: https://go-review.googlesource.com/56470
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-08-18 06:59:48 +00:00
Austin Clements
53f2d53450 runtime: document concurrency of mheap.spans
We use lock-free reads from mheap.spans, but the safety of these is
somewhat subtle. Document this.

Change-Id: I928c893232176135308e38bed788d5f84ff11533
Reviewed-on: https://go-review.googlesource.com/54310
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-08-09 16:06:23 +00:00
Austin Clements
623e2c4603 runtime: map bitmap and spans during heap initialization
We lazily map the bitmap and spans areas as the heap grows. However,
right now we're very slightly too lazy. Specifically, the following
can happen on 32-bit:

1. mallocinit fails to allocate any heap arena, so
   arena_used == arena_alloc == arena_end == bitmap.

2. There's less than 256MB between the end of the bitmap mapping and
   the next mapping.

3. On the first allocation, mheap.sysAlloc sees that there's not
   enough room in [arena_alloc, arena_end) because there's no room at
   all. It gets a 256MB mapping from somewhere *lower* in the address
   space than arena_used and sets arena_alloc and arena_end to this
   hole.

4. Since the new arena_alloc is lower than arena_used, mheap.sysAlloc
   doesn't bother to call mheap.setArenaUsed, so we still don't have a
   bitmap mapping or a spans array mapping.

5. mheap.grow, which called mheap.sysAlloc, attempts to fill in the
   spans array and crashes.

Fix this by mapping the metadata regions for the initial arena_used
when the heap is initialized, rather than trying to wait for an
allocation. This maintains the intended invariant that the structures
are always mapped for [arena_start, arena_used).

Fixes #21044.

Change-Id: I4422375a6e234b9f979d22135fc63ae3395946b0
Reviewed-on: https://go-review.googlesource.com/51714
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-07-31 16:52:36 +00:00
Austin Clements
a89e6be5e4 runtime: clean up mheap.allocLarge
mheap.allocLarge just calls bestFitTreap and is the only caller of
bestFitTreap. Flatten these into a single function. Also fix their
comments: allocLarge claims to return exactly npages but can in fact
return a larger span, and h.freelarge is not in fact indexed by span
start address.

Change-Id: Ia20112bdc46643a501ea82ea77c58596bc96f125
Reviewed-on: https://go-review.googlesource.com/47315
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-07-03 14:08:01 +00:00
Austin Clements
13ae3b3a8d runtime: accept non-monotonic arena allocation on 32-bit
Currently, the heap arena allocator allocates monotonically increasing
addresses. This is fine on 64-bit where we stake out a giant block of
the address space for ourselves and start at the beginning of it, but
on 32-bit the arena starts at address 0 but we start allocating from
wherever the OS feels like giving us memory. We can generally hint the
OS to start us at a low address, but this doesn't always work.

As a result, on 32-bit, if the OS gives us an arena block that's lower
than the current block we're allocating from, we simply say "thanks
but no thanks", return the whole (256MB!) block of memory, and then
take a fallback path that mmaps just the amount of memory we need
(which may be as little as 8K).

We have to do this because mheap_.arena_used is *both* the highest
used address in the arena and the next address we allocate from.

Fix all of this by separating the second role of arena_used out into a
new field called arena_alloc. This lets us accept any arena block the
OS gives us. This also slightly changes the invariants around
arena_end. Previously, we ensured arena_used <= arena_end, but this
was related to arena_used's second role, so the new invariant is
arena_alloc <= arena_end. As a result, we no longer necessarily update
arena_end when we're updating arena_used.

Fixes #20259 properly. (Unlike the original fix, this one should not
be cherry-picked to Go 1.8.)

This is reasonably low risk. I verified several key properties of the
32-bit code path with both 4K and 64K physical pages using a symbolic
model and the change does not materially affect 64-bit (arena_used ==
arena_alloc on 64-bit). The only oddity is that we no longer call
setArenaUsed with racemap == false to indicate that we're creating a
hole in the address space, but this only happened in a 32-bit-only
code path, and the race detector requires 64-bit, so this never
mattered anyway.

Change-Id: Ib1334007933e615166bac4159bf357ae06ec6a25
Reviewed-on: https://go-review.googlesource.com/44010
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-05-25 14:26:19 +00:00
Austin Clements
295d160e01 runtime: make _TinySizeClass an int8 to prevent use as spanClass
Currently _TinySizeClass is untyped, which means it can accidentally
be used as a spanClass (not that I would know this from experience or
anything). Make it an int8 to avoid this mix-up.

This is a cherry-pick of dev.garbage commit 81b74bf9c5.

Change-Id: I1e69eccee436ea5aa45e9a9828a013e369e03f1a
Reviewed-on: https://go-review.googlesource.com/41254
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-28 22:50:39 +00:00
Austin Clements
1a033b1a70 runtime: separate spans of noscan objects
Currently, we mix objects with pointers and objects without pointers
("noscan" objects) together in memory. As a result, for every object
we grey, we have to check that object's heap bits to find out if it's
noscan, which adds to the per-object cost of GC. This also hurts the
TLB footprint of the garbage collector because it decreases the
density of scannable objects at the page level.

This commit improves the situation by using separate spans for noscan
objects. This will allow a much simpler noscan check (in a follow up
CL), eliminate the need to clear the bitmap of noscan objects (in a
follow up CL), and improves TLB footprint by increasing the density of
scannable objects.
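
Concretely, spans become keyed by a span class that packs the size
class together with a noscan bit (sketch of the encoding from the
dev.garbage work):

    type spanClass uint8

    func makeSpanClass(sizeclass uint8, noscan bool) spanClass {
        sc := spanClass(sizeclass) << 1
        if noscan {
            sc |= 1
        }
        return sc
    }

    func (sc spanClass) sizeclass() int8 { return int8(sc >> 1) }
    func (sc spanClass) noscan() bool    { return sc&1 != 0 }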

This is also a step toward eliminating dead bits, since the current
noscan check depends on checking the dead bit of the first word.

This has no effect on the heap size of the garbage benchmark.

We'll measure the performance change of this after the follow-up
optimizations.

This is a cherry-pick from dev.garbage commit d491e550c3. The only
non-trivial merge conflict was in updatememstats in mstats.go, where
we now have to separate the per-spanclass stats from the per-sizeclass
stats.

Change-Id: I13bdc4869538ece5649a8d2a41c6605371618e40
Reviewed-on: https://go-review.googlesource.com/41251
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-28 22:50:31 +00:00
Aliaksandr Valialkin
259d60995d runtime: align mcentral by cache line size
This may improve performance during concurrent access
to the mheap.central array from multiple CPU cores.
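
Roughly, the mheap.central array gains per-entry padding so that
adjacent mcentrals do not share a cache line (sketch):

    central [_NumSizeClasses]struct {
        mcentral mcentral
        pad      [sys.CacheLineSize]byte // keep entries on separate cache lines
    }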

Change-Id: I8f48dd2e72aa62e9c32de07ae60fe552d8642782
Reviewed-on: https://go-review.googlesource.com/41550
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-26 03:48:23 +00:00
Austin Clements
1c4f3c5ea0 runtime: make gcSetTriggerRatio work at any time
This changes gcSetTriggerRatio so it can be called even during
concurrent mark or sweep. In this case, it will adjust the pacing of
the current phase, accounting for progress that has already been made.

To make this work for concurrent sweep, this introduces a "basis" for
the pagesSwept count, much like the basis we just introduced for
heap_live. This lets gcSetTriggerRatio shift the basis to the current
heap_live and pagesSwept and compute a slope from there to completion.
This avoids creating a discontinuity where, if the ratio has
increased, there has to be a flurry of sweep activity to catch up.
Instead, this creates a continuous, piece-wise linear function as
adjustments are made.

For #19076.

Change-Id: Ibcd76aeeb81ff4814b00be7cbd3530b73bbdbba9
Reviewed-on: https://go-review.googlesource.com/39833
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:41:59 +00:00
Austin Clements
a5eb3dceaf runtime: drive proportional sweep directly off heap_live
Currently, proportional sweep maintains its own count of how many
bytes have been allocated since the beginning of the sweep cycle so it
can compute how many pages need to be swept for a given allocation.

However, this requires a somewhat complex reimbursement scheme since
proportional sweep must be done before a span is allocated, but we
don't know how many bytes to charge until we've allocated a span. This
means that the allocated byte count used by proportional sweep can go
up and down, which has led to underflow bugs in the past (#18043) and
is going to interfere with adjusting sweep pacing on-the-fly (for #19076).

This approach also means we're maintaining a statistic that is very
closely related to heap_live, but has a different 0 value. This is
particularly confusing because the sweep ratio is computed based on
heap_live, so you have to understand that these two statistics are
very closely related.

Replace all of this and compute the sweep debt directly from the
current value of heap_live. To make this work, we simply save the
value of heap_live when the sweep ratio is computed to use as a
"basis" for later computing the sweep debt.

This eliminates the need for reimbursement as well as the code for
maintaining the sweeper's version of the live heap size.

For #19076.

Coincidentally fixes #18043, since this eliminates sweep reimbursement
entirely.

Change-Id: I1f931ddd6e90c901a3972c7506874c899251dc2a
Reviewed-on: https://go-review.googlesource.com/39832
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-21 17:41:57 +00:00
Austin Clements
79c56addb6 runtime: make sweep trace events encompass entire sweep loop
Currently, each individual span sweep emits an event to the trace. But
sweeps are generally done in loops until some condition is satisfied,
so this tracing is lower-level than anyone really wants and hides the
fact that no other work is being accomplished between adjacent sweep
events. This is also high overhead: enabling tracing significantly
impacts sweep latency.

Replace this by instead tracing around the sweep loops used for
allocation. This is slightly tricky because sweep loops don't
generally know if any sweeping will happen in them. Hence, we make the
tracing lazy by recording in the P that we would like to start tracing
the sweep *if* one happens, and then only closing the sweep event if
we started it.
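
In sketch form (simplified from the actual trace hooks):

    func traceGCSweepStart() {
        // Arm the tracer; emit nothing until a span is actually swept.
        pp := getg().m.p.ptr()
        pp.traceSweep, pp.traceSwept = true, 0
    }

    func traceGCSweepSpan(bytesSwept uintptr) {
        pp := getg().m.p.ptr()
        if pp.traceSweep {
            if pp.traceSwept == 0 {
                traceEvent(traceEvGCSweepStart, 1) // lazily open the event
            }
            pp.traceSwept += bytesSwept
        }
    }

    func traceGCSweepDone() {
        pp := getg().m.p.ptr()
        if pp.traceSwept != 0 {
            traceEvent(traceEvGCSweepDone, -1) // close only if opened
        }
        pp.traceSweep = false
    }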

This does mean we don't get tracing on all sweep paths, which are
legion. However, we get much more informative tracing on the paths
that block allocation, which are the paths that matter.

Change-Id: I73e14fbb250acb0c9d92e3648bddaa5e7d7e271c
Reviewed-on: https://go-review.googlesource.com/40810
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Hyang-Ah Hana Kim <hyangah@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-04-19 18:31:11 +00:00
Austin Clements
051809e352 runtime: free workbufs during sweeping
This extends the sweeper to free workbufs back to the heap between GC
cycles, allowing this memory to be reused for GC'd allocations or
eventually returned to the OS.

This helps for applications that have high peak heap usage relative to
their regular heap usage (for example, a high-memory initialization
phase). Workbuf memory is roughly proportional to heap size and since
we currently never free workbufs, it's proportional to *peak* heap
size. By freeing workbufs, we can release and reuse this memory for
other purposes when the heap shrinks.

This is somewhat complicated because this costs ~1–2 µs per workbuf
span, so for large heaps it's too expensive to just do synchronously
after mark termination between starting the world and dropping the
worldsema. Hence, we do it asynchronously in the sweeper. This adds a
list of "free" workbuf spans that can be returned to the heap. GC
moves all workbuf spans to this list after mark termination and the
background sweeper drains this list back to the heap. If the sweeper
doesn't finish, that's fine, since getempty can directly reuse any
remaining spans to allocate more workbufs.

Performance impact is negligible. On the x/benchmarks, this reduces
GC-bytes-from-system by 6–11%.

Fixes #19325.

Change-Id: Icb92da2196f0c39ee984faf92d52f29fd9ded7a8
Reviewed-on: https://go-review.googlesource.com/38582
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:47 +00:00
Austin Clements
42c1214762 runtime: eliminate write barriers from alloc/mark bitmaps
This introduces a new type, *gcBits, to use for alloc/mark bitmap
allocations instead of *uint8. This type is marked go:notinheap, so
uses of it correctly eliminate write barriers. Since we now have a
type, this also extracts some common operations to methods both for
convenience and to avoid (*uint8) casts at most use sites.

For #19325.

Change-Id: Id51f734fb2e96b8b7715caa348c8dcd4aef0696a
Reviewed-on: https://go-review.googlesource.com/38580
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:42 +00:00
Austin Clements
9d1b2f888e runtime: rename gcBits -> gcBitsArena
This clarifies that the gcBits type is actually an arena of gcBits and
will let us introduce a new gcBits type representing a single
mark/alloc bitmap allocated from the arena.

For #19325.

Change-Id: Idedf76d202d9174a17c61bcca9d5539e042e2445
Reviewed-on: https://go-review.googlesource.com/38579
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:40 +00:00
Austin Clements
dc0f0ab70f runtime: don't count manually-managed spans from heap_{inuse,sys}
Currently, manually-managed spans are included in memstats.heap_inuse
and memstats.heap_sys, but when we export these stats to the user, we
subtract out how much has been allocated for stack spans from both.
This works for now because stacks are the only manually-managed spans
we have.

However, we're about to use manually-managed spans for more things
that don't necessarily have obvious stats we can use to adjust the
user-presented numbers. Prepare for this by changing the accounting so
manually-managed spans don't count toward heap_inuse or heap_sys. This
makes these fields align with the fields presented to the user and
means we don't have to track more statistics just so we can adjust
these statistics.

For #19325.

Change-Id: I5cb35527fd65587ff23339276ba2c3969e2ad98f
Reviewed-on: https://go-review.googlesource.com/38577
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:38 +00:00
Austin Clements
407c56ae9f runtime: generalize {alloc,free}Stack to {alloc,free}Manual
We're going to start using manually-managed spans for GC workbufs, so
rename the allocate/free methods and pass in a pointer to the stats to
use instead of using the stack stats directly.

For #19325.

Change-Id: I37df0147ae5a8e1f3cb37d59c8e57a1fcc6f2980
Reviewed-on: https://go-review.googlesource.com/38576
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:35 +00:00
Austin Clements
ab9db51e1c runtime: rename mspan.stackfreelist -> manualFreeList
We're going to use this free list for other types of manually-managed
memory in the heap.

For #19325.

Change-Id: Ib7e682295133eabfddf3a84f44db43d937bfdd9c
Reviewed-on: https://go-review.googlesource.com/38575
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:33 +00:00
Austin Clements
8fbaa4f70b runtime: rename _MSpanStack -> _MSpanManual
We're about to generalize _MSpanStack to be used for other forms of
in-heap manual memory management in the runtime. This is an automated
rename of _MSpanStack to _MSpanManual plus some comment fix-ups.

For #19325.

Change-Id: I1e20a57bb3b87a0d324382f92a3e294ffc767395
Reviewed-on: https://go-review.googlesource.com/38574
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-13 18:20:30 +00:00
Austin Clements
6c6f455f88 runtime: consolidate changes to arena_used
Changing mheap_.arena_used requires several steps that are currently
repeated multiple times in mheap_.sysAlloc. Consolidate these into a
single function.

In the future, this will also make it easier to add other auxiliary VM
structures.

Change-Id: Ie68837d2612e1f4ba4904acb1b6b832b15431d56
Reviewed-on: https://go-review.googlesource.com/40151
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2017-04-11 01:35:47 +00:00
Daniel Martí
4e7724b2db runtime: remove unused parameter from bestFitTreap
This code was added recently, and it doesn't seem like the parameter
will be useful in the near future.

Change-Id: I5d64dadb6820c159b588262ab90df2461b5fdf04
Reviewed-on: https://go-review.googlesource.com/39692
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-06 17:20:43 +00:00
Austin Clements
9741f0275c runtime: initialize more fields of stack spans
Stack spans don't internally use many of the fields of the mspan,
which means things like the size class and element size get left over
from whatever last used the mspan. This can lead to confusing crashes
and painful debugging.

Zero these fields or initialize them to something reasonable. This
also lets us simplify some code that currently has to distinguish
between heap and stack spans.

Change-Id: I9bd114e76c147bb32de497045b932f8bf1988bbf
Reviewed-on: https://go-review.googlesource.com/38573
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-04-05 19:17:41 +00:00
Austin Clements
44ed88a5a7 runtime: track the number of active sweepone calls
sweepone returns ^uintptr(0) when there are no more spans to *start*
sweeping, but there may be spans being swept concurrently at the time
and there's currently no efficient way to tell when the sweeper is
done sweeping all the spans.

We'll need this for concurrent runtime.GC(), so add a count of the
number of active sweepone calls to make it possible to block until
sweeping is truly done.

This is also useful for more accurately printing the gcpacertrace,
since that should be printed after all of the sweeping stats are in
(currently we can print it slightly too early).

For #18216.

Change-Id: I06e6240c9e7b40aca6fd7b788bb6962107c10a0f
Reviewed-on: https://go-review.googlesource.com/37716
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:18 +00:00
Austin Clements
786eb5b754 runtime: make debug.FreeOSMemory call runtime.GC()
Currently freeOSMemory calls gcStart directly, but we really just want
it to behave like runtime.GC() and then perform a scavenge, so make it
call runtime.GC() rather than gcStart.

For #18216.

Change-Id: I548ec007afc788e87d383532a443a10d92105937
Reviewed-on: https://go-review.googlesource.com/37518
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:10 +00:00
Austin Clements
29be3f1999 runtime: generalize GC trigger
Currently the GC triggering condition is an awkward combination of the
gcMode (whether or not it's gcBackgroundMode) and a boolean
"forceTrigger" flag.

Replace this with a new gcTrigger type that represents the range of
transition predicates we need. This has several advantages:

1. We can remove the awkward logic that affects the trigger behavior
   based on the gcMode. Now gcMode purely controls whether to run a
   STW GC or not and the gcTrigger controls whether this is a forced
   GC that cannot be consolidated with other GC cycles.

2. We can lift the time-based triggering logic in sysmon to just
   another type of GC trigger and move the logic to the trigger test.

3. This sets us up to have a cycle count-based trigger, which we'll
   use to make runtime.GC trigger concurrent GC with the desired
   consolidation properties.

For #18216.

Change-Id: If9cd49349579a548800f5022ae47b8128004bbfc
Reviewed-on: https://go-review.googlesource.com/37516
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-31 01:15:06 +00:00
Rick Hudson
6e9ec14186 runtime: redo insert/remove of large spans
Currently, for spans of up to 1 MByte (128 pages) we
maintain an array indexed by the number of pages in the
span. This is efficient both in terms of space as well
as time to insert or remove a span of a particular size.

Unfortunately, spans larger than 1 MByte are currently
placed on a separate linked list. This results in
O(n) behavior. Now that we are seeing heaps approaching
100 GBytes, n is large enough to be noticed in real programs.

This change replaces the linked list now used with a balanced
binary tree structure called a treap. A treap is a
probabilistically balanced tree offering O(logN) behavior for
inserting and removing spans.
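
The treap node pairs the span's page count (the primary sort key)
with a random priority, along these lines (sketch):

    type treapNode struct {
        right     *treapNode // all nodes with larger keys
        left      *treapNode // all nodes with smaller keys
        parent    *treapNode // direct parent, nil if root
        npagesKey uintptr    // number of pages in spanKey; primary sort key
        spanKey   *mspan     // span of size npagesKey
        priority  uint32     // random heap priority that keeps the tree balanced
    }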

To verify that this approach will work, we start by noting
that only spans with sizes > 1 MByte will be put into the treap.
This means that to support 1 TByte, the treap will need at most
1 million nodes and can ideally be held in a tree with a
depth of 20. Experiments with adding and removing randomly
sized spans from the treap seem to result in treaps with
depths of about twice the ideal, or 40. A petabyte would
require a tree of only twice that depth again, so this
algorithm should last well into the future.

Fixes #19393

Go1 benchmarks indicate this is basically an overall wash.
Tue Mar 28 21:29:21 EDT 2017
name                     old time/op    new time/op    delta
BinaryTree17-4              2.42s ± 1%     2.42s ± 1%    ~     (p=0.980 n=21+21)
Fannkuch11-4                3.00s ± 1%     3.18s ± 4%  +6.10%  (p=0.000 n=22+24)
FmtFprintfEmpty-4          40.5ns ± 1%    40.3ns ± 3%    ~     (p=0.692 n=22+25)
FmtFprintfString-4         65.9ns ± 3%    64.6ns ± 1%  -1.98%  (p=0.000 n=24+23)
FmtFprintfInt-4            69.6ns ± 1%    68.0ns ± 7%  -2.30%  (p=0.001 n=21+22)
FmtFprintfIntInt-4          102ns ± 2%      99ns ± 1%  -3.07%  (p=0.000 n=23+23)
FmtFprintfPrefixedInt-4     126ns ± 0%     125ns ± 0%  -0.79%  (p=0.000 n=19+17)
FmtFprintfFloat-4           206ns ± 2%     205ns ± 1%    ~     (p=0.671 n=23+21)
FmtManyArgs-4               441ns ± 1%     445ns ± 1%  +0.88%  (p=0.000 n=22+23)
GobDecode-4                5.73ms ± 1%    5.86ms ± 1%  +2.37%  (p=0.000 n=23+22)
GobEncode-4                4.51ms ± 1%    4.89ms ± 1%  +8.32%  (p=0.000 n=22+22)
Gzip-4                      197ms ± 0%     202ms ± 1%  +2.75%  (p=0.000 n=23+24)
Gunzip-4                   32.9ms ± 8%    32.7ms ± 2%    ~     (p=0.466 n=23+24)
HTTPClientServer-4         57.3µs ± 1%    56.7µs ± 1%  -0.94%  (p=0.000 n=21+22)
JSONEncode-4               13.8ms ± 1%    13.9ms ± 2%  +1.14%  (p=0.000 n=22+23)
JSONDecode-4               47.4ms ± 1%    48.1ms ± 1%  +1.49%  (p=0.000 n=23+23)
Mandelbrot200-4            3.92ms ± 0%    3.92ms ± 1%  +0.21%  (p=0.000 n=22+22)
GoParse-4                  2.89ms ± 1%    2.87ms ± 1%  -0.68%  (p=0.000 n=21+22)
RegexpMatchEasy0_32-4      73.6ns ± 1%    72.0ns ± 2%  -2.15%  (p=0.000 n=21+22)
RegexpMatchEasy0_1K-4       173ns ± 1%     173ns ± 1%    ~     (p=0.847 n=22+24)
RegexpMatchEasy1_32-4      71.9ns ± 1%    69.8ns ± 1%  -2.99%  (p=0.000 n=23+20)
RegexpMatchEasy1_1K-4       314ns ± 1%     308ns ± 1%  -1.91%  (p=0.000 n=22+23)
RegexpMatchMedium_32-4      106ns ± 0%     105ns ± 1%  -0.58%  (p=0.000 n=19+21)
RegexpMatchMedium_1K-4     34.3µs ± 1%    34.3µs ± 1%    ~     (p=0.871 n=23+22)
RegexpMatchHard_32-4       1.67µs ± 1%    1.67µs ± 7%    ~     (p=0.224 n=22+23)
RegexpMatchHard_1K-4       51.5µs ± 1%    50.4µs ± 1%  -1.99%  (p=0.000 n=22+23)
Revcomp-4                   383ms ± 1%     415ms ± 0%  +8.51%  (p=0.000 n=22+22)
Template-4                 51.5ms ± 1%    51.5ms ± 1%    ~     (p=0.555 n=20+23)
TimeParse-4                 279ns ± 2%     277ns ± 1%  -0.95%  (p=0.000 n=24+22)
TimeFormat-4                294ns ± 1%     296ns ± 1%  +0.58%  (p=0.003 n=24+23)
[Geo mean]                 43.7µs         43.8µs       +0.32%

name                     old speed      new speed      delta
GobDecode-4               134MB/s ± 1%   131MB/s ± 1%  -2.32%  (p=0.000 n=23+22)
GobEncode-4               170MB/s ± 1%   157MB/s ± 1%  -7.68%  (p=0.000 n=22+22)
Gzip-4                   98.7MB/s ± 0%  96.1MB/s ± 1%  -2.68%  (p=0.000 n=23+24)
Gunzip-4                  590MB/s ± 7%   593MB/s ± 2%    ~     (p=0.466 n=23+24)
JSONEncode-4              141MB/s ± 1%   139MB/s ± 2%  -1.13%  (p=0.000 n=22+23)
JSONDecode-4             40.9MB/s ± 1%  40.3MB/s ± 0%  -1.47%  (p=0.000 n=23+23)
GoParse-4                20.1MB/s ± 1%  20.2MB/s ± 1%  +0.69%  (p=0.000 n=21+22)
RegexpMatchEasy0_32-4     435MB/s ± 1%   444MB/s ± 2%  +2.21%  (p=0.000 n=21+22)
RegexpMatchEasy0_1K-4    5.89GB/s ± 1%  5.89GB/s ± 1%    ~     (p=0.439 n=22+24)
RegexpMatchEasy1_32-4     445MB/s ± 1%   459MB/s ± 1%  +3.06%  (p=0.000 n=23+20)
RegexpMatchEasy1_1K-4    3.26GB/s ± 1%  3.32GB/s ± 1%  +1.97%  (p=0.000 n=22+23)
RegexpMatchMedium_32-4   9.40MB/s ± 1%  9.44MB/s ± 1%  +0.43%  (p=0.000 n=23+21)
RegexpMatchMedium_1K-4   29.8MB/s ± 1%  29.8MB/s ± 1%    ~     (p=0.826 n=23+22)
RegexpMatchHard_32-4     19.1MB/s ± 1%  19.1MB/s ± 7%    ~     (p=0.233 n=22+23)
RegexpMatchHard_1K-4     19.9MB/s ± 1%  20.3MB/s ± 1%  +2.03%  (p=0.000 n=22+23)
Revcomp-4                 664MB/s ± 1%   612MB/s ± 0%  -7.85%  (p=0.000 n=22+22)
Template-4               37.6MB/s ± 1%  37.7MB/s ± 1%    ~     (p=0.558 n=20+23)
[Geo mean]                134MB/s        133MB/s       -0.76%
Tue Mar 28 22:16:54 EDT 2017

Change-Id: I4a4f5c2b53d3fb85ef76c98522d3ed5cf8ae5b7e
Reviewed-on: https://go-review.googlesource.com/38732
Reviewed-by: Russ Cox <rsc@golang.org>
2017-03-29 14:18:24 +00:00
Austin Clements
df6025bc0d runtime: disallow malloc or panic in scavenge
Mallocs and panics in the scavenge path are particularly nasty because
they're likely to silently self-deadlock on the mheap.lock. Avoid
sinking lots of time into debugging these issues in the future by
turning these into immediate throws.

Change-Id: Ib36fdda33bc90b21c32432b03561630c1f3c69bc
Reviewed-on: https://go-review.googlesource.com/38293
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-19 22:42:28 +00:00
Austin Clements
2ef88f7fcf runtime: lock-free fast path for mark bits allocation
Currently we acquire a global lock for every newMarkBits call. This is
unfortunate since every span sweep operation calls newMarkBits.

However, most allocations are simply linear allocations from the
current arena. Take advantage of this to add a lock-free fast path for
allocating from the current arena. With this change, the global lock
only protects the lists of arenas, not the free offset in the current
arena.
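
The fast path reserves space with a single atomic add on the current
arena's free offset, roughly (sketch, using the gcBitsArena naming
from the later renames above):

    // tryAlloc allocates from b without taking the lock, or returns
    // nil (arena full or racing past the end), sending the caller to
    // the locked slow path.
    func (b *gcBitsArena) tryAlloc(bytes uintptr) *gcBits {
        if b == nil || atomic.Loaduintptr(&b.free)+bytes > uintptr(len(b.bits)) {
            return nil
        }
        end := atomic.Xadduintptr(&b.free, bytes)
        if end > uintptr(len(b.bits)) {
            return nil
        }
        return &b.bits[end-bytes]
    }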

Change-Id: I6cf6182af8492c8bfc21276114c77275fe3d7826
Reviewed-on: https://go-review.googlesource.com/34595
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-06 18:40:26 +00:00
Austin Clements
6c4a8d195b runtime: don't hold global gcBitsArenas lock over allocation
Currently, newArena holds the gcBitsArenas lock across allocating
memory from the OS for a new gcBits arena. This is a global lock and
allocating physical memory can be expensive, so this has the potential
to cause high lock contention, especially since every single span
sweep operation calls newArena (via newMarkBits).

Improve the situation by temporarily dropping the lock across
allocation. This means the caller now has to revalidate its
assumptions after the lock is dropped, so this also factors out that
code path and reinvokes it after the lock is acquired.

Change-Id: I1113200a954ab4aad16b5071512583cfac744bdc
Reviewed-on: https://go-review.googlesource.com/34594
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-06 18:40:23 +00:00
Eitan Adler
789c5255a4 all: remove the the duplicate words
Change-Id: I6343c162e27e2e492547c96f1fc504909b1c03c0
Reviewed-on: https://go-review.googlesource.com/37793
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-06 04:39:12 +00:00
Austin Clements
4a7cf960c3 runtime: make ReadMemStats STW for < 25µs
Currently ReadMemStats stops the world for ~1.7 ms/GB of heap because
it collects statistics from every single span. For large heaps, this
can be quite costly. This is particularly unfortunate because many
production infrastructures call this function regularly to collect and
report statistics.

Fix this by tracking the necessary cumulative statistics in the
mcaches. ReadMemStats still has to stop the world to stabilize these
statistics, but there are only O(GOMAXPROCS) mcaches to collect
statistics from, so this pause is only 25µs even at GOMAXPROCS=100.

Fixes #13613.

Change-Id: I3c0a4e14833f4760dab675efc1916e73b4c0032a
Reviewed-on: https://go-review.googlesource.com/34937
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-04 02:56:37 +00:00
Austin Clements
77f64c50db runtime: clarify access to mheap_.busy
There are two accesses to mheap_.busy that are guarded by checks
against len(mheap_.free). This works because both lists are (and must
be) the same length, but it makes the code less clear. Change these to
use len(mheap_.busy) so the access more clearly parallels the check.

Fixes #18944.

Change-Id: I9bacbd3663988df351ed4396ae9018bc71018311
Reviewed-on: https://go-review.googlesource.com/36354
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Rick Hudson <rlh@golang.org>
2017-03-03 17:02:18 +00:00