Stowage/go - Remotebranch.eu

Stowage/go

mirror of https://github.com/golang/go.git synced 2025-12-08 06:10:04 +00:00

Author	SHA1	Message	Date
Austin Clements	991a85c889	runtime: make mSpanList more go:notinheap-friendly Currently mspan links to its previous mspan using a *mspan field that points to the previous span's next field. This simplifies some of the list manipulation code, but is going to make it very hard to convince the compiler that mspan list manipulations don't need write barriers. Fix this by using a more traditional ("boring") linked list that uses a simple mspan pointer to the previous mspan. This complicates some of the list manipulation slightly, but it will let us eliminate all write barriers from the mspan list manipulation code by marking mspan go:notinheap. Change-Id: I0d0b212db5f20002435d2a0ed2efc8aa0364b905 Reviewed-on: https://go-review.googlesource.com/30940 Reviewed-by: Rick Hudson <rlh@golang.org>	2016-10-15 17:58:17 +00:00
Austin Clements	38f1df66ff	runtime: make gcDumpObject useful on stack frames gcDumpObject is often used on a stack pointer (for example, when checkmark finds an unmarked object on the stack), but since stack spans don't have an elemsize, it doesn't print any of the memory from the frame. Make it at least slightly more useful by printing everything between obj and obj+off (inclusive). While we're here, also print out the span state. Change-Id: I51be064ea8791b4a365865bfdc7afa7b5aaecfbd Reviewed-on: https://go-review.googlesource.com/30142 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2016-10-03 21:59:54 +00:00
Austin Clements	6879dbde4e	runtime: introduce a type for span states Currently span states are untyped constants and the field is just a uint8. Make this more type-safe by introducing a type for the span state. Change-Id: I369bf59fe6e8234475f4921611424fceb7d0a6de Reviewed-on: https://go-review.googlesource.com/30141 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2016-10-03 21:59:45 +00:00
Austin Clements	6dda7b2f5f	runtime: don't hard-code physical page size Now that the runtime fetches the true physical page size from the OS, make the physical page size used by heap growth a variable instead of a constant. This isn't used in any performance-critical paths, so it shouldn't be an issue. sys.PhysPageSize is also renamed to sys.DefaultPhysPageSize to make it clear that it's not necessarily the true page size. There are no uses of this constant any more, but we'll keep it around for now. Updates #12480 and #10180. Change-Id: I6c23b9df860db309c38c8287a703c53817754f03 Reviewed-on: https://go-review.googlesource.com/25022 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Rick Hudson <rlh@golang.org>	2016-09-06 21:05:53 +00:00
Austin Clements	3de7dbb191	runtime: fix check for vacuous page boundary rounding again The previous fix for this, commit `336dad2a`, had everything right in the commit message, but reversed the test in the code. Fix the test in the code. This reversal effectively disabled the scavenger on large page systems except in the rare cases where this code was originally wrong, which is why it didn't obviously show up in testing. Fixes #16644. Again. :( Change-Id: I27cce4aea13de217197db4b628f17860f27ce83e Reviewed-on: https://go-review.googlesource.com/27402 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2016-08-19 20:16:43 +00:00
Austin Clements	336dad2a07	runtime: fix check for vacuous page boundary rounding sysUnused (e.g., madvise MADV_FREE) is only sensible to call on physical page boundaries, so scavengelist rounds in the bounds of the region being released to the nearest physical page boundaries. However, if the region is smaller than a physical page and neither the start nor end fall on a boundary, then rounding the start up to a page boundary and the end down to a page boundary will result in end < start. Currently, we only give up on the region if start == end, so if we encounter end < start, we'll call madvise with a negative length and the madvise will fail. Issue #16644 gives a concrete example of this: start = 0x1285ac000 end = 0x1285ae000 (1 8K page) This leads to the rounded values start = 0x1285b0000 end = 0x1285a0000 which leads to len = -65536. Fix this by giving up on the region if end <= start, not just if end == start. Fixes #16644. Change-Id: I8300db492dbadc82ac1ad878318b36bcb7c39524 Reviewed-on: https://go-review.googlesource.com/27230 Reviewed-by: Keith Randall <khr@golang.org>	2016-08-17 14:04:16 +00:00
Austin Clements	f407ca9288	runtime: support smaller physical pages than PhysPageSize Most operations need an upper bound on the physical page size, which is what sys.PhysPageSize is for (this is checked at runtime init on Linux). However, a few operations need a lower bound on the physical page size. Introduce a "minPhysPageSize" constant to act as this lower bound and use it where it makes sense: 1) In addrspace_free, we have to query each page in the given range. Currently we increment by the upper bound on the physical page size, which means we may skip over pages if the true size is smaller. Worse, we currently pass a result buffer that only has enough room for one page. If there are actually multiple pages in the range passed to mincore, the kernel will overflow this buffer. Fix these problems by incrementing by the lower-bound on the physical page size and by passing "1" for the length, which the kernel will round up to the true physical page size. 2) In the write barrier, the bad pointer check tests for pointers to the first physical page, which are presumably small integers masquerading as pointers. However, if physical pages are smaller than we think, we may have legitimate pointers below sys.PhysPageSize. Hence, use minPhysPageSize for this test since pointers should never fall below that. In particular, this applies to ARM64 and MIPS. The runtime is configured to use 64kB pages on ARM64, but by default Linux uses 4kB pages. Similarly, the runtime assumes 16kB pages on MIPS, but both 4kB and 16kB kernel configurations are common. This also applies to ARM on systems where the runtime is recompiled to deal with a larger page size. It is also a step toward making the runtime use only a dynamically-queried page size. Change-Id: I1fdfd18f6e7cbca170cc100354b9faa22fde8a69 Reviewed-on: https://go-review.googlesource.com/25020 Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Run-TryBot: Austin Clements <austin@google.com>	2016-07-20 18:28:43 +00:00
Austin Clements	64770f642f	runtime: use conventional shift style for gcBitsChunkBytes The convention for writing something like "64 kB" is 64<<10, since this is easier to read than 1<<16. Update gcBitsChunkBytes to follow this convention. Change-Id: I5b5a3f726dcf482051ba5b1814db247ff3b8bb2f Reviewed-on: https://go-review.googlesource.com/23132 Reviewed-by: Rick Hudson <rlh@golang.org>	2016-05-16 18:28:38 +00:00
Elias Naur	e6ec82067a	runtime: use entire address space on 32 bit In issue #13992, Russ mentioned that the heap bitmap footprint was halved but that the bitmap size calculation hadn't been updated. This presents the opportunity to either halve the bitmap size or double the addressable virtual space. This CL doubles the addressable virtual space. On 32 bit this can be tweaked further to allow the bitmap to cover the entire 4GB virtual address space, removing a failure mode if the kernel hands out memory with a too low address. First, fix the calculation and double _MaxArena32 to cover 4GB virtual memory space with the same bitmap size (256 MB). Then, allow the fallback mode for the initial memory reservation on 32 bit (or 64 bit with too little available virtual memory) to not include space for the arena. mheap.sysAlloc will automatically reserve additional space when the existing arena is full. Finally, set arena_start to 0 in 32 bit mode, so that any address is acceptable for subsequent (additional) reservations. Before, the bitmap was always located just before arena_start, so fix the two places relying on that assumption: Point the otherwise unused mheap.bitmap to one byte after the end of the bitmap, and use it for bitmap addressing instead of arena_start. With arena_start set to 0 on 32 bit, the cgoInRange check is no longer a sufficient check for Go pointers. Introduce and call inHeapOrStack to check whether a pointer is to the Go heap or stack. While we're here, remove sysReserveHigh which seems to be unused. Fixes #13992 Change-Id: I592b513148a50b9d3967b5c5d94b86b3ec39acc2 Reviewed-on: https://go-review.googlesource.com/20471 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-05-07 03:04:39 +00:00
Austin Clements	3e2462387f	[dev.garbage] runtime: eliminate mspan.start This converts all remaining uses of mspan.start to instead use mspan.base(). In many cases, this actually reduces the complexity of the code. Change-Id: If113840e00d3345a6cf979637f6a152e6344aee7 Reviewed-on: https://go-review.googlesource.com/22590 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com>	2016-04-29 03:53:17 +00:00
Austin Clements	b7adc41fba	[dev.garbage] runtime: use s.base() everywhere it makes sense Currently we have lots of (s.start << _PageShift) and variants. We now have an s.base() function that returns this. It's faster and more readable, so use it. Change-Id: I888060a9dae15ea75ca8cc1c2b31c905e71b452b Reviewed-on: https://go-review.googlesource.com/22559 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com>	2016-04-29 03:53:14 +00:00
Rick Hudson	2fb75ea6c6	[dev.garbage] runtime: use sys.Ctz64 intrinsic Our compilers now provides instrinsics including sys.Ctz64 that support CTZ (count trailing zero) instructions. This CL replaces the Go versions of CTZ with the compiler intrinsic. Count trailing zeros CTZ finds the least significant 1 in a word and returns the number of less significant 0s in the word. Allocation uses the bitmap created by the garbage collector to locate an unmarked object. The logic takes a word of the bitmap, complements, and then caches it. It then uses CTZ to locate an available unmarked object. It then shifts marked bits out of the bitmap word preparing it for the next search. Once all the unmarked objects are used in the cached work the bitmap gets another word and repeats the process. Change-Id: Id2fc42d1d4b9893efaa2e1bd01896985b7e42f82 Reviewed-on: https://go-review.googlesource.com/21366 Reviewed-by: Austin Clements <austin@google.com>	2016-04-29 00:00:50 +00:00
Rick Hudson	2063d5d903	[dev.garbage] runtime: restructure alloc and mark bits Two changes are included here that are dependent on the other. The first is that allocBits and gcamrkBits are changed to a *uint8 which points to the first byte of that span's mark and alloc bits. Several places were altered to perform pointer arithmetic to locate the byte corresponding to an object in the span. The actual bit corresponding to an object is indexed in the byte by using the lower three bits of the objects index. The second change avoids the redundant calculation of an object's index. The index is returned from heapBitsForObject and then used by the functions indexing allocBits and gcmarkBits. Finally we no longer allocate the gc bits in the span structures. Instead we use an arena based allocation scheme that allows for a more compact bit map as well as recycling and bulk clearing of the mark bits. Change-Id: If4d04b2021c092ec39a4caef5937a8182c64dfef Reviewed-on: https://go-review.googlesource.com/20705 Reviewed-by: Austin Clements <austin@google.com>	2016-04-29 00:00:47 +00:00
Rick Hudson	23aeb34df1	[dev.garbage] Merge remote-tracking branch 'origin/master' into HEAD Change-Id: I282fd9ce9db435dfd35e882a9502ab1abc185297	2016-04-27 18:46:52 -04:00
Rick Hudson	f8d0d4fd59	[dev.garbage] runtime: cleanup and optimize span.base() Prior to this CL the base of a span was calculated in various places using shifts or calls to base(). This CL now always calls base() which has been optimized to calculate the base of the span when the span is initialized and store that value in the span structure. Change-Id: I661f2bfa21e3748a249cdf049ef9062db6e78100 Reviewed-on: https://go-review.googlesource.com/20703 Reviewed-by: Austin Clements <austin@google.com>	2016-04-27 21:54:59 +00:00
Rick Hudson	8dda1c4c08	[dev.garbage] runtime: remove heapBitsSweepSpan Prior to this CL the sweep phase was responsible for locating all objects that were about to be freed and calling a function to process the object. This was done by the function heapBitsSweepSpan. Part of processing included calls to tracefree and msanfree as well as counting how many objects were freed. The calls to tracefree and msanfree have been moved into the gcmalloc routine and called when the object is about to be reallocated. The counting of free objects has been optimized using an array based popcnt algorithm and if all the objects in a span are free then span is freed. Similarly the code to locate the next free object has been optimized to use an array based ctz (count trailing zero). Various hot paths in the allocation logic have been optimized. At this point the garbage benchmark is within 3% of the 1.6 release. Change-Id: I00643c442e2ada1685c010c3447e4ea8537d2dfa Reviewed-on: https://go-review.googlesource.com/20201 Reviewed-by: Austin Clements <austin@google.com>	2016-04-27 21:54:57 +00:00
Rick Hudson	4093481523	[dev.garbage] runtime: add bit and cache ctz64 (count trailing zero) Add to each span a 64 bit cache (allocCache) of the allocBits at freeindex. allocCache is shifted such that the lowest bit corresponds to the bit freeindex. allocBits uses a 0 to indicate an object is free, on the other hand allocCache uses a 1 to indicate an object is free. This facilitates ctz64 (count trailing zero) which counts the number of 0s trailing the least significant 1. This is also the index of the least significant 1. Each span maintains a freeindex indicating the boundary between allocated objects and unallocated objects. allocCache is shifted as freeindex is incremented such that the low bit in allocCache corresponds to the bit a freeindex in the allocBits array. Currently ctz64 is written in Go using a for loop so it is not very efficient. Use of the hardware instruction will follow. With this in mind comparisons of the garbage benchmark are as follows. 1.6 release 2.8 seconds dev:garbage branch 3.1 seconds. Profiling shows the go implementation of ctz64 takes up 1% of the total time. Change-Id: If084ed9c3b1eda9f3c6ab2e794625cb870b8167f Reviewed-on: https://go-review.googlesource.com/20200 Reviewed-by: Austin Clements <austin@google.com>	2016-04-27 21:54:54 +00:00
Rick Hudson	e4ac2d4acc	[dev.garbage] runtime: replace ref with allocCount This is a renaming of the field ref to the more appropriate allocCount. The field holds the number of objects in the span that are currently allocated. Some throws strings were adjusted to more accurately convey the meaning of allocCount. Change-Id: I10daf44e3e9cc24a10912638c7de3c1984ef8efe Reviewed-on: https://go-review.googlesource.com/19518 Reviewed-by: Austin Clements <austin@google.com>	2016-04-27 21:54:49 +00:00
Rick Hudson	3479b065d4	[dev.garbage] runtime: allocate directly from GC mark bits Instead of building a freelist from the mark bits generated by the GC this CL allocates directly from the mark bits. The approach moves the mark bits from the pointer/no pointer heap structures into their own per span data structures. The mark/allocation vectors consist of a single mark bit per object. Two vectors are maintained, one for allocation and one for the GC's mark phase. During the GC cycle's sweep phase the interpretation of the vectors is swapped. The mark vector becomes the allocation vector and the old allocation vector is cleared and becomes the mark vector that the next GC cycle will use. Marked entries in the allocation vector indicate that the object is not free. Each allocation vector maintains a boundary between areas of the span already allocated from and areas not yet allocated from. As objects are allocated this boundary is moved until it reaches the end of the span. At this point further allocations will be done from another span. Since we no longer sweep a span inspecting each freed object the responsibility for maintaining pointer/scalar bits in the heapBitMap containing is now the responsibility of the the routines doing the actual allocation. This CL is functionally complete and ready for performance tuning. Change-Id: I336e0fc21eef1066e0b68c7067cc71b9f3d50e04 Reviewed-on: https://go-review.googlesource.com/19470 Reviewed-by: Austin Clements <austin@google.com>	2016-04-27 21:54:47 +00:00
Rick Hudson	aed861038f	[dev.garbage] runtime: add stackfreelist The freelist for normal objects and the freelist for stacks share the same mspan field for holding the list head but are operated on by different code sequences. This overloading complicates the use of bit vectors for allocation of normal objects. This change refactors the use of the stackfreelist out from the use of freelist. Change-Id: I5b155b5b8a1fcd8e24c12ee1eb0800ad9b6b4fa0 Reviewed-on: https://go-review.googlesource.com/19315 Reviewed-by: Austin Clements <austin@google.com>	2016-04-27 21:54:39 +00:00
Rick Hudson	2ac8bdc52a	[dev.garbage] runtime: bitmap allocation data structs The bitmap allocation data structure prototypes. Before this is released these underlying data structures need to be more performant but the signatures of helper functions utilizing these structures will remain stable. Change-Id: I5ace12f2fb512a7038a52bbde2bfb7e98783bcbe Reviewed-on: https://go-review.googlesource.com/19221 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-04-27 21:54:35 +00:00
Austin Clements	2cdcb6f829	runtime: scavenge memory on physical page-aligned boundaries Currently the scavenger marks memory unused in multiples of the allocator page size (8K). This is safe as long as the true physical page size is 4K (or 8K), as it is on many platforms. However, on ARM64, PPC64x, and MIPS64, the physical page size is larger than 8K, so if we attempt to mark memory unused, the kernel will round the boundaries of the region out to all pages covered by the requested region, and we'll release a larger region of memory than intended. As a result, the scavenger is currently disabled on these platforms. Fix this by first rounding the region to be marked unused in to multiples of the physical page size, so that when we ask the kernel to mark it unused, it releases exactly the requested region. Fixes #9993. Change-Id: I96d5fdc2f77f9d69abadcea29bcfe55e68288cb1 Reviewed-on: https://go-review.googlesource.com/22066 Reviewed-by: Rick Hudson <rlh@golang.org>	2016-04-16 21:42:43 +00:00
Austin Clements	61f56e925e	runtime: fix pagesInUse accounting When we grow the heap, we create a temporary "in use" span for the memory acquired from the OS and then free that span to link it into the heap. Hence, we (1) increase pagesInUse when we make the temporary span so that (2) freeing the span will correctly decrease it. However, currently step (1) increases pagesInUse by the number of pages requested from the heap, while step (2) decreases it by the number of pages requested from the OS (the size of the temporary span). These aren't necessarily the same, since we round up the number of pages we request from the OS, so steps 1 and 2 don't necessarily cancel out like they're supposed to. Over time, this can add up and cause pagesInUse to underflow and wrap around to 2^64. The garbage collector computes the sweep ratio from this, so if this happens, the sweep ratio becomes effectively infinite, causing the first allocation on each P in a sweep cycle to sweep the entire heap. This makes sweeping effectively STW. Fix this by increasing pagesInUse in step 1 by the number of pages requested from the OS, so that the two steps correctly cancel out. We add a test that checks that the running total matches the actual state of the heap. Fixes #15022. For 1.6.x. Change-Id: Iefd9d6abe37d0d447cbdbdf9941662e4f18eeffc Reviewed-on: https://go-review.googlesource.com/21280 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2016-04-04 15:33:26 +00:00
Matthew Dempsky	a03bdc3e6b	runtime: eliminate unnecessary type conversions Automated refactoring produced using github.com/mdempsky/unconvert. Change-Id: Iacf871a4f221ef17f48999a464ab2858b2bbaa90 Reviewed-on: https://go-review.googlesource.com/20071 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-03-07 20:53:27 +00:00
Brad Fitzpatrick	5fea2ccc77	all: single space after period. The tree's pretty inconsistent about single space vs double space after a period in documentation. Make it consistently a single space, per earlier decisions. This means contributors won't be confused by misleading precedence. This CL doesn't use go/doc to parse. It only addresses // comments. It was generated with: $ perl -i -npe 's,^(\s// .+[a-z]\.) +([A-Z]),$1 $2,' $(git grep -l -E '^\s//(.+\.) +([A-Z])') $ go test go/doc -update Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7 Reviewed-on: https://go-review.googlesource.com/20022 Reviewed-by: Rob Pike <r@golang.org> Reviewed-by: Dave Day <djd@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>	2016-03-02 00:13:47 +00:00
Austin Clements	87d939dee8	runtime: fix (sometimes major) underestimation of heap_live Currently, we update memstats.heap_live from mcache.local_cachealloc whenever we lock the heap (e.g., to obtain a fresh span or to release an unused span). However, under the right circumstances, local_cachealloc can accumulate allocations up to the size of the entire heap without flushing them to heap_live. Specifically, since span allocations from an mcentral don't lock the heap, if a large number of pages are held in an mcentral and the application continues to use and free objects of that size class (e.g., the BinaryTree17 benchmark), local_cachealloc won't be flushed until the mcentral runs out of spans. This is a problem because, unlike many of the memory statistics that are purely informative, heap_live is used to determine when the garbage collector should start and how hard it should work. This commit eliminates local_cachealloc, instead atomically updating heap_live directly. To control contention, we do this only when obtaining a span from an mcentral. Furthermore, we make heap_live conservative: allocating a span assumes that all free slots in that span will be used and accounts for these when the span is allocated, before the objects themselves are. This is important because 1) this triggers the GC earlier than necessary rather than potentially too late and 2) this leads to a conservative GC rate rather than a GC rate that is potentially too low. Alternatively, we could have flushed local_cachealloc when it passed some threshold, but this would require determining a threshold and would cause heap_live to underestimate the true value rather than overestimate. Fixes #12199. name old time/op new time/op delta BinaryTree17-12 2.88s ± 4% 2.88s ± 1% ~ (p=0.470 n=19+19) Fannkuch11-12 2.48s ± 1% 2.48s ± 1% ~ (p=0.243 n=16+19) FmtFprintfEmpty-12 50.9ns ± 2% 50.7ns ± 1% ~ (p=0.238 n=15+14) FmtFprintfString-12 175ns ± 1% 171ns ± 1% -2.48% (p=0.000 n=18+18) FmtFprintfInt-12 159ns ± 1% 158ns ± 1% -0.78% (p=0.000 n=19+18) FmtFprintfIntInt-12 270ns ± 1% 265ns ± 2% -1.67% (p=0.000 n=18+18) FmtFprintfPrefixedInt-12 235ns ± 1% 234ns ± 0% ~ (p=0.362 n=18+19) FmtFprintfFloat-12 309ns ± 1% 308ns ± 1% -0.41% (p=0.001 n=18+19) FmtManyArgs-12 1.10µs ± 1% 1.08µs ± 0% -1.96% (p=0.000 n=19+18) GobDecode-12 7.81ms ± 1% 7.80ms ± 1% ~ (p=0.425 n=18+19) GobEncode-12 6.53ms ± 1% 6.53ms ± 1% ~ (p=0.817 n=19+19) Gzip-12 312ms ± 1% 312ms ± 2% ~ (p=0.967 n=19+20) Gunzip-12 42.0ms ± 1% 41.9ms ± 1% ~ (p=0.172 n=19+19) HTTPClientServer-12 63.7µs ± 1% 63.8µs ± 1% ~ (p=0.639 n=19+19) JSONEncode-12 16.4ms ± 1% 16.4ms ± 1% ~ (p=0.954 n=19+19) JSONDecode-12 58.5ms ± 1% 57.8ms ± 1% -1.27% (p=0.000 n=18+19) Mandelbrot200-12 3.86ms ± 1% 3.88ms ± 0% +0.44% (p=0.000 n=18+18) GoParse-12 3.67ms ± 2% 3.66ms ± 1% -0.52% (p=0.001 n=18+19) RegexpMatchEasy0_32-12 100ns ± 1% 100ns ± 0% ~ (p=0.257 n=19+18) RegexpMatchEasy0_1K-12 347ns ± 1% 347ns ± 1% ~ (p=0.527 n=18+18) RegexpMatchEasy1_32-12 83.7ns ± 2% 83.1ns ± 2% ~ (p=0.096 n=18+19) RegexpMatchEasy1_1K-12 509ns ± 1% 505ns ± 1% -0.75% (p=0.000 n=18+19) RegexpMatchMedium_32-12 130ns ± 2% 129ns ± 1% ~ (p=0.962 n=20+20) RegexpMatchMedium_1K-12 39.5µs ± 2% 39.4µs ± 1% ~ (p=0.376 n=20+19) RegexpMatchHard_32-12 2.04µs ± 0% 2.04µs ± 1% ~ (p=0.195 n=18+17) RegexpMatchHard_1K-12 61.4µs ± 1% 61.4µs ± 1% ~ (p=0.885 n=19+19) Revcomp-12 540ms ± 2% 542ms ± 4% ~ (p=0.552 n=19+17) Template-12 69.6ms ± 1% 71.2ms ± 1% +2.39% (p=0.000 n=20+20) TimeParse-12 357ns ± 1% 357ns ± 1% ~ (p=0.883 n=18+20) TimeFormat-12 379ns ± 1% 362ns ± 1% -4.53% (p=0.000 n=18+19) [Geo mean] 62.0µs 61.8µs -0.44% name old time/op new time/op delta XBenchGarbage-12 5.89ms ± 2% 5.81ms ± 2% -1.41% (p=0.000 n=19+18) Change-Id: I96b31cca6ae77c30693a891cff3fe663fa2447a0 Reviewed-on: https://go-review.googlesource.com/17748 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-12-15 16:16:08 +00:00
Michael Matloob	432cb66f16	runtime: break out system-specific constants into package sys runtime/internal/sys will hold system-, architecture- and config- specific constants. Updates #11647 Change-Id: I6db29c312556087a42e8d2bdd9af40d157c56b54 Reviewed-on: https://go-review.googlesource.com/16817 Reviewed-by: Russ Cox <rsc@golang.org>	2015-11-12 17:04:45 +00:00
Matthew Dempsky	c17c42e8a5	runtime: rewrite lots of foo_Bar(f, ...) into f.bar(...) Applies to types fixAlloc, mCache, mCentral, mHeap, mSpan, and mSpanList. Two special cases: 1. mHeap_Scavenge() previously didn't take an mheap parameter, so it was specially handled in this CL. 2. mHeap_Free() would have collided with mheap's "free" field, so it's been renamed to (mheap).freeSpan to parallel its underlying (*mheap).freeSpanLocked method. Change-Id: I325938554cca432c166fe9d9d689af2bbd68de4b Reviewed-on: https://go-review.googlesource.com/16221 Reviewed-by: Ian Lance Taylor <iant@golang.org> Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-11-12 00:34:58 +00:00
Michael Matloob	67faca7d9c	runtime: break atomics out into package runtime/internal/atomic This change breaks out most of the atomics functions in the runtime into package runtime/internal/atomic. It adds some basic support in the toolchain for runtime packages, and also modifies linux/arm atomics to remove the dependency on the runtime's mutex. The mutexes have been replaced with spinlocks. all trybots are happy! In addition to the trybots, I've tested on the darwin/arm64 builder, on the darwin/arm builder, and on a ppc64le machine. Change-Id: I6698c8e3cf3834f55ce5824059f44d00dc8e3c2f Reviewed-on: https://go-review.googlesource.com/14204 Run-TryBot: Michael Matloob <matloob@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-11-10 17:38:04 +00:00
Austin Clements	f54bcedce1	runtime: beginning of decentralized off->mark transition This begins the conversion of the centralized GC coordinator to a decentralized state machine by introducing the internal API that triggers the first state transition from _GCoff to _GCmark (or _GCmarktermination). This change introduces the transition lock, the off->mark transition condition (which is very similar to shouldtriggergc()), and the general structure of a state transition. Since we're doing this conversion in stages, it then falls back to the GC coordinator to actually execute the cycle. We'll start moving logic out of the GC coordinator and in to transition functions next. This fixes a minor bug in gcstoptheworld debug mode where passing the heap trigger once could trigger multiple STW GCs. Updates #11970. Change-Id: I964087dd190a639eb5766398f8e1bbf8b352902f Reviewed-on: https://go-review.googlesource.com/16355 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com>	2015-11-05 21:23:17 +00:00
Dmitry Vyukov	ee0305e036	runtime: remove dead code runtime.free has long gone. Change-Id: I058f69e6481b8fa008e1951c29724731a8a3d081 Reviewed-on: https://go-review.googlesource.com/16593 Reviewed-by: Austin Clements <austin@google.com> Run-TryBot: Austin Clements <austin@google.com>	2015-11-03 19:20:21 +00:00
Dmitry Vyukov	bf606094ee	runtime: fix finalization and profiling of tiny allocations Handling of special records for tiny allocations has two problems: 1. Once we queue a finalizer we mark the object. As the result any subsequent finalizers for the same object will not be queued during this GC cycle. If we have 16 finalizers setup (the worst case), finalization will take 16 GC cycles. This is what caused misbehave of tinyfin.go. The actual flakiness was caused by the fact that fing is asynchronous and don't always run before the check. 2. If a tiny block has both finalizer and profile specials, it is possible that we both queue finalizer, preserve the object live and free the profile record. As the result heap profile can be skewed. Fix both issues by analyzing all special records for a single object at once. Also, make tinyfin test stricter and remove reliance on real time. Also, add a test for the problem 2. Currently heap profile missed about a half of live memory. Fixes #13100 Change-Id: I9ae4dc1c44893724138a4565ca5cae29f2e97544 Reviewed-on: https://go-review.googlesource.com/16591 Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Keith Randall <khr@golang.org> Run-TryBot: Dmitry Vyukov <dvyukov@google.com>	2015-11-03 18:57:18 +00:00
Matthew Dempsky	4ff231bca1	runtime: eliminate some unnecessary uintptr conversions arena_{start,used,end} are already uintptr, so no need to convert them to uintptr, much less to convert them to unsafe.Pointer and then to uintptr. No binary change to pkg/linux_amd64/runtime.a. Change-Id: Ia4232ed2a724c44fde7eba403c5fe8e6dccaa879 Reviewed-on: https://go-review.googlesource.com/16339 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>	2015-10-27 02:53:04 +00:00
Matthew Dempsky	1652a2c316	runtime: add mSpanList type to represent lists of mspans This CL introduces a new mSpanList type to replace the empty mspan variables that were previously used as list heads. To be type safe, the previous circular linked list data structure is now a tail queue instead. One complication of this is mSpanList_Remove needs to know the list a span is being removed from, but this appears to be computable in all circumstances. As a temporary sanity check, mSpanList_Insert and mSpanList_InsertBack record the list that an mspan has been inserted into so that mSpanList_Remove can verify that the correct list was specified. Whereas mspan is 112 bytes on amd64, mSpanList is only 16 bytes. This shrinks the size of mheap from 50216 bytes to 12584 bytes. Change-Id: I8146364753dbc3b4ab120afbb9c7b8740653c216 Reviewed-on: https://go-review.googlesource.com/15906 Run-TryBot: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Austin Clements <austin@google.com>	2015-10-22 17:12:06 +00:00
Matthew Dempsky	4c2465d47d	runtime: use unsafe.Pointer(x) instead of (unsafe.Pointer)(x) This isn't C anymore. No binary change to pkg/linux_amd64/runtime.a. Change-Id: I24d66b0f5ac888f432b874aac684b1395e7c8345 Reviewed-on: https://go-review.googlesource.com/15903 Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-10-15 21:48:37 +00:00
Austin Clements	89c341c5e9	runtime: directly track GC assist balance Currently we track the per-G GC assist balance as two monotonically increasing values: the bytes allocated by the G this cycle (gcalloc) and the scan work performed by the G this cycle (gcscanwork). The assist balance is hence assistRatiogcalloc - gcscanwork. This works, but has two important downsides: 1) It requires floating-point math to figure out if a G is in debt or not. This makes it inappropriate to check for assist debt in the hot path of mallocgc, so we only do this when a G allocates a new span. As a result, Gs can operate "in the red", leading to under-assist and extended GC cycle length. 2) Revising the assist ratio during a GC cycle can lead to an "assist burst". If you think of plotting the scan work performed versus heaps size, the assist ratio controls the slope of this line. However, in the current system, the target line always passes through 0 at the heap size that triggered GC, so if the runtime increases the assist ratio, there has to be a potentially large assist to jump from the current amount of scan work up to the new target scan work for the current heap size. This commit replaces this approach with directly tracking the GC assist balance in terms of allocation credit bytes. Allocating N bytes simply decreases this by N and assisting raises it by the amount of scan work performed divided by the assist ratio (to get back to bytes). This will make it cheap to figure out if a G is in debt, which will let us efficiently check if an assist is necessary before* performing an allocation and hence keep Gs "in the black". This also fixes assist bursts because the assist ratio is now in terms of remaining work, rather than work from the beginning of the GC cycle. Hence, the plot of scan work versus heap size becomes continuous: we can revise the slope, but this slope always starts from where we are right now, rather than where we were at the beginning of the cycle. Change-Id: Ia821c5f07f8a433e8da7f195b52adfedd58bdf2c Reviewed-on: https://go-review.googlesource.com/15408 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-10-09 19:38:52 +00:00
Austin Clements	9e77c89868	runtime: ensure minimum heap distance via heap goal Currently we ensure a minimum heap distance of 1MB when computing the assist ratio. Rather than enforcing this minimum on the heap distance, it makes more sense to enforce that the heap goal itself is at least 1MB over the live heap size at the beginning of GC. Currently the two approaches are semantically equivalent, but this will let us switch to basing the assist ratio on current heap distance rather than the initial heap distance, since we can't enforce this minimum on the current heap distance (the GC may never finish because the goal posts will always be 1MB away). Change-Id: I0027b1c26a41a0152b01e5b67bdb1140d43ee903 Reviewed-on: https://go-review.googlesource.com/15604 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-10-09 19:38:39 +00:00
Austin Clements	dac220b0a9	runtime: remove in-use page count loop from STW In order to compute the sweep ratio, the runtime needs to know how many pages belong to spans in state _MSpanInUse. Currently it finds this out by looping over all spans during mark termination. However, this takes ~1ms/heap GB, so multi-gigabyte heaps can quickly push our STW time past 10ms. Replace the loop with an actively maintained count of in-use pages. For multi-gigabyte heaps, this reduces max mark termination pause time by 75%–90% relative to tip and by 85%–95% relative to Go 1.5.1. This shifts the longest pause time for large heaps to the sweep termination phase, so it only slightly decreases max pause time, though it roughly halves mean pause time. Here are the results for the garbage benchmark: ---- max mark termination pause ---- Heap Procs after change before change 1.5.1 24GB 12 1.9ms 18ms 37ms 24GB 4 3.7ms 18ms 37ms 4GB 4 920µs 3.8ms 6.9ms Fixes #11484. Change-Id: Ia2d28bb8a1e4f1c3b8ebf79fb203f12b9bf114ac Reviewed-on: https://go-review.googlesource.com/15070 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-10-02 19:55:55 +00:00
Austin Clements	608c1b0d56	runtime: scan objects with finalizers concurrently This reduces pause time by ~25% relative to tip and by ~50% relative to Go 1.5.1. Currently one of the steps of STW mark termination is to loop (in parallel) over all spans to find objects with finalizers in order to mark all objects reachable from these objects and to treat the finalizer special as a root. Unfortunately, even if there are no finalizers at all, this loop takes roughly 1 ms/heap GB/core, so multi-gigabyte heaps can quickly push our STW time past 10ms. Fix this by moving this scan from mark termination to concurrent scan, where it can run in parallel with mutators. The loop itself could also be optimized, but this cost is small compared to concurrent marking. Making this scan concurrent introduces two complications: 1) The scan currently walks the specials list of each span without locking it, which is safe only with the world stopped. We fix this by speculatively checking if a span has any specials (the vast majority won't) and then locking the specials list only if there are specials to check. 2) An object can have a finalizer set after concurrent scan, in which case it won't have been marked appropriately by concurrent scan. If the finalizer is a closure and is only reachable from the special, it could be swept before it is run. Likewise, if the object is not marked yet when the finalizer is set and then becomes unreachable before it is marked, other objects reachable only from it may be swept before the finalizer function is run. We fix this issue by making addfinalizer ensure the same marking invariants as markroot does. For multi-gigabyte heaps, this reduces max pause time by 20%–30% relative to tip (depending on GOMAXPROCS) and by ~50% relative to Go 1.5.1 (where this loop was neither concurrent nor parallel). Here are the results for the garbage benchmark: ---------------- max pause ---------------- Heap Procs Concurrent scan STW parallel scan 1.5.1 24GB 12 18ms 23ms 37ms 24GB 4 18ms 25ms 37ms 4GB 4 3.8ms 4.9ms 6.9ms In all cases, 95%ile pause time is similar to the max pause time. This also improves mean STW time by 10%–30%. Fixes #11485. Change-Id: I9359d8c3d120a51d23d924b52bf853a1299b1dfd Reviewed-on: https://go-review.googlesource.com/14982 Reviewed-by: Rick Hudson <rlh@golang.org> Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>	2015-10-02 19:55:48 +00:00
Austin Clements	d57f037302	runtime: don't recheck heap trigger for periodic GC `88e945f` introduced a non-speculative double check of the heap trigger before actually starting a concurrent GC. This was necessary to fix a race for heap-triggered GC, but broke sysmon-triggered periodic GC, since the heap check will of course fail for periodically triggered GC. Fix this by telling startGC whether or not this GC was triggered by heap size or a timer and only doing the heap size double check for GCs triggered by heap size. Fixes #12026. Change-Id: I7c3f6ec364545c36d619f2b4b3bf3b758e3bcbd6 Reviewed-on: https://go-review.googlesource.com/13168 Reviewed-by: Russ Cox <rsc@golang.org>	2015-08-05 17:28:56 +00:00
Austin Clements	1fb01a88f9	runtime: revise assist ratio aggressively At the start of a GC cycle, the garbage collector computes the assist ratio based on the total scannable heap size. This was intended to be conservative; after all, this assumes the entire heap may be reachable and hence needs to be scanned. But it only assumes that the current entire heap may be reachable. It fails to account for heap allocated during the GC cycle. If the trigger ratio is very low (near zero), and most of the heap is reachable when GC starts (which is likely if the trigger ratio is near zero), then it's possible for the mutator to create new, reachable heap fast enough that the assists won't keep up based on the assist ratio computed at the beginning of the cycle. As a result, the heap can grow beyond the heap goal (by hundreds of megs in stress tests like in issue #11911). We already have some vestigial logic for dealing with situations like this; it just doesn't run often enough. Currently, every 10 ms during the GC cycle, the GC revises the assist ratio. This was put in before we switched to a conservative assist ratio (when we really were using estimates of scannable heap), and it turns out to be exactly what we need now. However, every 10 ms is far too infrequent for a rapidly allocating mutator. This commit reuses this logic, but replaces the 10 ms timer with revising the assist ratio every time the heap is locked, which coincides precisely with when the statistics used to compute the assist ratio are updated. Fixes #11911. Change-Id: I377b231ab064946228378fa10422a46d1b50f4c5 Reviewed-on: https://go-review.googlesource.com/13047 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-08-04 18:54:48 +00:00
Austin Clements	fc9ca85f4c	runtime: make sweep proportional to spans bytes allocated Proportional concurrent sweep is currently based on a ratio of spans to be swept per bytes of object allocation. However, proportional sweeping is performed during span allocation, not object allocation, in order to minimize contention and overhead. Since objects are allocated from spans after those spans are allocated, the system tends to operate in debt, which means when the next GC cycle starts, there is often sweep debt remaining, so GC has to finish the sweep, which delays the start of the cycle and delays enabling mutator assists. For example, it's quite likely that many Ps will simultaneously refill their span caches immediately after a GC cycle (because GC flushes the span caches), but at this point, there has been very little object allocation since the end of GC, so very little sweeping is done. The Ps then allocate objects from these cached spans, which drives up the bytes of object allocation, but since these allocations are coming from cached spans, nothing considers whether more sweeping has to happen. If the sweep ratio is high enough (which can happen if the next GC trigger is very close to the retained heap size), this can easily represent a sweep debt of thousands of pages. Fix this by making proportional sweep proportional to the number of bytes of spans allocated, rather than the number of bytes of objects allocated. Prior to allocating a span, both the small object path and the large object path ensure credit for allocating that span, so the system operates in the black, rather than in the red. Combined with the previous commit, this should eliminate all sweeping from GC start up. On the stress test in issue #11911, this reduces the time spent sweeping during GC (and delaying start up) by several orders of magnitude: mean 99%ile max pre fix 1 ms 11 ms 144 ms post fix 270 ns 735 ns 916 ns Updates #11911. Change-Id: I89223712883954c9d6ec2a7a51ecb97172097df3 Reviewed-on: https://go-review.googlesource.com/13044 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Russ Cox <rsc@golang.org>	2015-08-04 18:54:44 +00:00
Austin Clements	d57056ba26	runtime: don't free stack spans during GC Memory for stacks is manually managed by the runtime and, currently (with one exception) we free stack spans immediately when the last stack on a span is freed. However, the garbage collector assumes that spans can never transition from non-free to free during scan or mark. This disagreement makes it possible for the garbage collector to mark uninitialized objects and is blocking us from re-enabling the bad pointer test in the garbage collector (issue #9880). For example, the following sequence will result in marking an uninitialized object: 1. scanobject loads a pointer slot out of the object it's scanning. This happens to be one of the special pointers from the heap into a stack. Call the pointer p and suppose it points into X's stack. 2. X, running on another thread, grows its stack and frees its old stack. 3. The old stack happens to be large or was the last stack in its span, so X frees this span, setting it to state _MSpanFree. 4. The span gets reused as a heap span. 5. scanobject calls heapBitsForObject, which loads the span containing p, which is now in state _MSpanInUse, but doesn't necessarily have an object at p. The not-object at p gets marked, and at this point all sorts of things can go wrong. We already have a partial solution to this. When shrinking a stack, we put the old stack on a queue to be freed at the end of garbage collection. This was done to address exactly this problem, but wasn't a complete solution. This commit generalizes this solution to both shrinking and growing stacks. For stacks that fit in the stack pool, we simply don't free the span, even if its reference count reaches zero. It's fine to reuse the span for other stacks, and this enables that. At the end of GC, we sweep for cached stack spans with a zero reference count and free them. For larger stacks, we simply queue the stack span to be freed at the end of GC. Ideally, we would reuse these large stack spans the way we can small stack spans, but that's a more invasive change that will have to wait until after the freeze. Fixes #11267. Change-Id: Ib7f2c5da4845cc0268e8dc098b08465116972a71 Reviewed-on: https://go-review.googlesource.com/11502 Reviewed-by: Russ Cox <rsc@golang.org>	2015-06-29 15:33:40 +00:00
Austin Clements	eabdd05892	runtime: document memory ordering for h_spans h_spans can be accessed concurrently without synchronization from other threads, which means it needs the appropriate memory barriers on weakly ordered machines. It happens to already have the necessary memory barriers because all accesses to h_spans are currently protected by the heap lock and the unlocks happen in exactly the places where release barriers are needed, but it's easy to imagine that this could change in the future. Document the fact that we're depending on the barrier implied by the unlock. Related to issue #9984. Change-Id: I1bc3c95cd73361b041c8c95cd4bb92daf8c1f94a Reviewed-on: https://go-review.googlesource.com/11361 Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2015-06-23 18:28:46 +00:00
Austin Clements	9a3112bcae	runtime: one more Map{Bits,Spans} before arena_used update In order to avoid a race with a concurrent write barrier or garbage collector thread, any update to arena_used must be preceded by mapping the corresponding heap bitmap and spans array memory. Otherwise, the concurrent access may observe that a pointer falls within the heap arena, but then attempt to access unmapped memory to look up its span or heap bits. Commit `d57c889` fixed all of the places where we updated arena_used immediately before mapping the heap bitmap and spans, but it missed the one place where we update arena_used and depend on later code to update it again and map the bitmap and spans. This creates a window where the original race can still happen. This commit fixes this by mapping the heap bitmap and spans before this arena_used update as well. This code path is only taken when expanding the heap reservation on 32-bit over a hole in the address space, so these extra mmap calls should have negligible impact. Fixes #10212, #11324. Change-Id: Id67795e6c7563eb551873bc401e5cc997aaa2bd8 Reviewed-on: https://go-review.googlesource.com/11340 Run-TryBot: Austin Clements <austin@google.com> Reviewed-by: Rick Hudson <rlh@golang.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com>	2015-06-22 18:54:38 +00:00
Russ Cox	d57c889ae8	runtime: wait to update arena_used until after mapping bitmap This avoids a race with gcmarkwb_m that was leading to faults. Fixes #10212. Change-Id: I6fcf8d09f2692227063ce29152cb57366ea22487 Reviewed-on: https://go-review.googlesource.com/10816 Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Austin Clements <austin@google.com>	2015-06-11 18:15:21 +00:00
Ainar Garipov	7f9f70e5b6	all: fix misprints in comments These were found by grepping the comments from the go code and feeding the output to aspell. Change-Id: Id734d6c8d1938ec3c36bd94a4dbbad577e3ad395 Reviewed-on: https://go-review.googlesource.com/10941 Reviewed-by: Aamir Khan <syst3m.w0rm@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>	2015-06-11 14:18:57 +00:00
Russ Cox	001438bdfe	runtime: fix callwritebarrier Given a call frame F of size N where the return values start at offset R, callwritebarrier was instructing heapBitsBulkBarrier to scan the block of memory [F+R, F+R+N). It should only scan [F+R, F+N). The extra N-R bytes scanned might lead into the next allocated block in memory. Because the scan was consulting the heap bitmap for type information, scanning into the next block normally "just worked" in the sense of not crashing. Scanning the extra N-R bytes of memory is a problem mainly because it causes the GC to consider pointers that might otherwise not be considered, leading it to retain objects that should actually be freed. This is very difficult to detect. Luckily, juju turned up a case where the heap bitmap and the memory were out of sync for the block immediately after the call frame, so that heapBitsBulkBarrier saw an obvious non-pointer where it expected a pointer, causing a loud crash. Why is there a non-pointer in memory that the heap bitmap records as a pointer? That is more difficult to answer. At least one way that it could happen is that allocations containing no pointers at all do not update the heap bitmap. So if heapBitsBulkBarrier walked out of the current object and into a no-pointer object and consulted those bitmap bits, it would be misled. This doesn't happen in general because all the paths to heapBitsBulkBarrier first check for the no-pointer case. This may or may not be what happened, but it's the only scenario I've been able to construct. I tried for quite a while to write a simple test for this and could not. It does fix the juju crash, and it is clearly an improvement over the old code. Fixes #10844. Change-Id: I53982c93ef23ef93155c4086bbd95a4c4fdaac9a Reviewed-on: https://go-review.googlesource.com/10317 Reviewed-by: Austin Clements <austin@google.com>	2015-05-21 19:14:03 +00:00
Russ Cox	363fd1dd1b	runtime: move a few atomic fields up Moving them up makes them properly aligned on 32-bit systems. There are some odd fields above them right now (like fixalloc and mutex maybe). Change-Id: I57851a5bbb2e7cc339712f004f99bb6c0cce0ca5 Reviewed-on: https://go-review.googlesource.com/9889 Reviewed-by: Austin Clements <austin@google.com>	2015-05-11 16:08:57 +00:00
Russ Cox	1635ab7dfe	runtime: remove wbshadow mode The write barrier shadow heap was very useful for developing the write barriers initially, but it's no longer used, clunky, and dragging the rest of the implementation down. The gccheckmark mode will find bugs due to missed barriers when they result in missed marks; wbshadow mode found the missed barriers more aggressively, but it required an entire separate copy of the heap. The gccheckmark mode requires no extra memory, making it more useful in practice. Compared to previous CL: name old mean new mean delta BinaryTree17 5.91s × (0.96,1.06) 5.72s × (0.97,1.03) -3.12% (p=0.000) Fannkuch11 4.32s × (1.00,1.00) 4.36s × (1.00,1.00) +0.91% (p=0.000) FmtFprintfEmpty 89.0ns × (0.93,1.10) 86.6ns × (0.96,1.11) ~ (p=0.077) FmtFprintfString 298ns × (0.98,1.06) 283ns × (0.99,1.04) -4.90% (p=0.000) FmtFprintfInt 286ns × (0.98,1.03) 283ns × (0.98,1.04) -1.09% (p=0.032) FmtFprintfIntInt 498ns × (0.97,1.06) 480ns × (0.99,1.02) -3.65% (p=0.000) FmtFprintfPrefixedInt 408ns × (0.98,1.02) 396ns × (0.99,1.01) -3.00% (p=0.000) FmtFprintfFloat 587ns × (0.98,1.01) 562ns × (0.99,1.01) -4.34% (p=0.000) FmtManyArgs 1.94µs × (0.99,1.02) 1.89µs × (0.99,1.01) -2.85% (p=0.000) GobDecode 15.8ms × (0.98,1.03) 15.7ms × (0.99,1.02) ~ (p=0.251) GobEncode 12.0ms × (0.96,1.09) 11.8ms × (0.98,1.03) -1.87% (p=0.024) Gzip 648ms × (0.99,1.01) 647ms × (0.99,1.01) ~ (p=0.688) Gunzip 143ms × (1.00,1.01) 143ms × (1.00,1.01) ~ (p=0.203) HTTPClientServer 90.3µs × (0.98,1.01) 89.1µs × (0.99,1.02) -1.30% (p=0.000) JSONEncode 31.6ms × (0.99,1.01) 31.7ms × (0.98,1.02) ~ (p=0.219) JSONDecode 107ms × (1.00,1.01) 111ms × (0.99,1.01) +3.58% (p=0.000) Mandelbrot200 6.03ms × (1.00,1.01) 6.01ms × (1.00,1.00) ~ (p=0.077) GoParse 6.53ms × (0.99,1.03) 6.54ms × (0.99,1.02) ~ (p=0.585) RegexpMatchEasy0_32 161ns × (1.00,1.01) 161ns × (0.98,1.05) ~ (p=0.948) RegexpMatchEasy0_1K 541ns × (0.99,1.01) 559ns × (0.98,1.01) +3.32% (p=0.000) RegexpMatchEasy1_32 138ns × (1.00,1.00) 137ns × (0.99,1.01) -0.55% (p=0.001) RegexpMatchEasy1_1K 887ns × (0.99,1.01) 878ns × (0.99,1.01) -0.98% (p=0.000) RegexpMatchMedium_32 253ns × (0.99,1.01) 252ns × (0.99,1.01) -0.39% (p=0.001) RegexpMatchMedium_1K 72.8µs × (1.00,1.00) 72.7µs × (1.00,1.00) ~ (p=0.485) RegexpMatchHard_32 3.85µs × (1.00,1.01) 3.85µs × (1.00,1.01) ~ (p=0.283) RegexpMatchHard_1K 117µs × (1.00,1.01) 117µs × (1.00,1.00) ~ (p=0.175) Revcomp 922ms × (0.97,1.08) 903ms × (0.98,1.05) -2.15% (p=0.021) Template 126ms × (0.99,1.01) 126ms × (0.99,1.01) ~ (p=0.943) TimeParse 628ns × (0.99,1.01) 634ns × (0.99,1.01) +0.92% (p=0.000) TimeFormat 668ns × (0.99,1.01) 698ns × (0.98,1.03) +4.53% (p=0.000) It's nice that the microbenchmarks are the ones helped the most, because those were the ones hurt the most by the conversion from 4-bit to 2-bit heap bitmaps. This CL brings the overall effect of that process to (compared to CL 9706 patch set 1): name old mean new mean delta BinaryTree17 5.87s × (0.94,1.09) 5.72s × (0.97,1.03) -2.57% (p=0.011) Fannkuch11 4.32s × (1.00,1.00) 4.36s × (1.00,1.00) +0.87% (p=0.000) FmtFprintfEmpty 89.1ns × (0.95,1.16) 86.6ns × (0.96,1.11) ~ (p=0.090) FmtFprintfString 283ns × (0.98,1.02) 283ns × (0.99,1.04) ~ (p=0.681) FmtFprintfInt 284ns × (0.98,1.04) 283ns × (0.98,1.04) ~ (p=0.620) FmtFprintfIntInt 486ns × (0.98,1.03) 480ns × (0.99,1.02) -1.27% (p=0.002) FmtFprintfPrefixedInt 400ns × (0.99,1.02) 396ns × (0.99,1.01) -0.84% (p=0.001) FmtFprintfFloat 566ns × (0.99,1.01) 562ns × (0.99,1.01) -0.80% (p=0.000) FmtManyArgs 1.91µs × (0.99,1.02) 1.89µs × (0.99,1.01) -1.10% (p=0.000) GobDecode 15.5ms × (0.98,1.05) 15.7ms × (0.99,1.02) +1.55% (p=0.005) GobEncode 11.9ms × (0.97,1.03) 11.8ms × (0.98,1.03) -0.97% (p=0.048) Gzip 648ms × (0.99,1.01) 647ms × (0.99,1.01) ~ (p=0.627) Gunzip 143ms × (1.00,1.00) 143ms × (1.00,1.01) ~ (p=0.482) HTTPClientServer 89.2µs × (0.99,1.02) 89.1µs × (0.99,1.02) ~ (p=0.740) JSONEncode 32.3ms × (0.97,1.06) 31.7ms × (0.98,1.02) -1.95% (p=0.002) JSONDecode 106ms × (0.99,1.01) 111ms × (0.99,1.01) +4.22% (p=0.000) Mandelbrot200 6.02ms × (1.00,1.00) 6.01ms × (1.00,1.00) ~ (p=0.417) GoParse 6.57ms × (0.97,1.06) 6.54ms × (0.99,1.02) ~ (p=0.404) RegexpMatchEasy0_32 162ns × (1.00,1.00) 161ns × (0.98,1.05) ~ (p=0.088) RegexpMatchEasy0_1K 561ns × (0.99,1.02) 559ns × (0.98,1.01) -0.47% (p=0.034) RegexpMatchEasy1_32 145ns × (0.95,1.04) 137ns × (0.99,1.01) -5.56% (p=0.000) RegexpMatchEasy1_1K 864ns × (0.99,1.04) 878ns × (0.99,1.01) +1.57% (p=0.000) RegexpMatchMedium_32 255ns × (0.99,1.04) 252ns × (0.99,1.01) -1.43% (p=0.001) RegexpMatchMedium_1K 73.9µs × (0.98,1.04) 72.7µs × (1.00,1.00) -1.55% (p=0.004) RegexpMatchHard_32 3.92µs × (0.98,1.04) 3.85µs × (1.00,1.01) -1.80% (p=0.003) RegexpMatchHard_1K 120µs × (0.98,1.04) 117µs × (1.00,1.00) -2.13% (p=0.001) Revcomp 936ms × (0.95,1.08) 903ms × (0.98,1.05) -3.58% (p=0.002) Template 130ms × (0.98,1.04) 126ms × (0.99,1.01) -2.98% (p=0.000) TimeParse 638ns × (0.98,1.05) 634ns × (0.99,1.01) ~ (p=0.198) TimeFormat 674ns × (0.99,1.01) 698ns × (0.98,1.03) +3.69% (p=0.000) Change-Id: Ia0e9b50b1d75a3c0c7556184cd966305574fe07c Reviewed-on: https://go-review.googlesource.com/9706 Reviewed-by: Rick Hudson <rlh@golang.org>	2015-05-11 14:55:11 +00:00

1 2

70 commits