go/src/runtime/malloc_test.go

// Copyright 2013 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package runtime_test
import (
"flag"
"fmt"
"internal/race"
"internal/testenv"
"os"
"os/exec"
"reflect"
. "runtime"
"strings"
"testing"
"time"
"unsafe"
)
var testMemStatsCount int
func TestMemStats(t *testing.T) {
testMemStatsCount++
// Make sure there's at least one forced GC.
GC()
// Test that MemStats has sane values.
st := new(MemStats)
ReadMemStats(st)
nz := func(x interface{}) error {
if x != reflect.Zero(reflect.TypeOf(x)).Interface() {
return nil
}
return fmt.Errorf("zero value")
}
le := func(thresh float64) func(interface{}) error {
return func(x interface{}) error {
// These sanity tests aren't necessarily valid
// with high -test.count values, so only run
// them once.
if testMemStatsCount > 1 {
return nil
}
if reflect.ValueOf(x).Convert(reflect.TypeOf(thresh)).Float() < thresh {
return nil
}
return fmt.Errorf("insanely high value (overflow?); want <= %v", thresh)
}
}
eq := func(x interface{}) func(interface{}) error {
return func(y interface{}) error {
if x == y {
return nil
}
return fmt.Errorf("want %v", x)
}
}
// Of the uint fields, HeapReleased and HeapIdle can be 0.
// PauseTotalNs can be 0 if timer resolution is poor.
fields := map[string][]func(interface{}) error{
"Alloc": {nz, le(1e10)}, "TotalAlloc": {nz, le(1e11)}, "Sys": {nz, le(1e10)},
"Lookups": {eq(uint64(0))}, "Mallocs": {nz, le(1e10)}, "Frees": {nz, le(1e10)},
"HeapAlloc": {nz, le(1e10)}, "HeapSys": {nz, le(1e10)}, "HeapIdle": {le(1e10)},
"HeapInuse": {nz, le(1e10)}, "HeapReleased": {le(1e10)}, "HeapObjects": {nz, le(1e10)},
"StackInuse": {nz, le(1e10)}, "StackSys": {nz, le(1e10)},
"MSpanInuse": {nz, le(1e10)}, "MSpanSys": {nz, le(1e10)},
"MCacheInuse": {nz, le(1e10)}, "MCacheSys": {nz, le(1e10)},
"BuckHashSys": {nz, le(1e10)}, "GCSys": {nz, le(1e10)}, "OtherSys": {nz, le(1e10)},
"NextGC": {nz, le(1e10)}, "LastGC": {nz},
"PauseTotalNs": {le(1e11)}, "PauseNs": nil, "PauseEnd": nil,
"NumGC": {nz, le(1e9)}, "NumForcedGC": {nz, le(1e9)},
"GCCPUFraction": {le(0.99)}, "EnableGC": {eq(true)}, "DebugGC": {eq(false)},
"BySize": nil,
}
rst := reflect.ValueOf(st).Elem()
for i := 0; i < rst.Type().NumField(); i++ {
name, val := rst.Type().Field(i).Name, rst.Field(i).Interface()
checks, ok := fields[name]
if !ok {
t.Errorf("unknown MemStats field %s", name)
continue
}
for _, check := range checks {
if err := check(val); err != nil {
t.Errorf("%s = %v: %s", name, val, err)
}
}
}
if st.Sys != st.HeapSys+st.StackSys+st.MSpanSys+st.MCacheSys+
st.BuckHashSys+st.GCSys+st.OtherSys {
t.Fatalf("Bad sys value: %+v", *st)
}
if st.HeapIdle+st.HeapInuse != st.HeapSys {
t.Fatalf("HeapIdle(%d) + HeapInuse(%d) should be equal to HeapSys(%d), but isn't.", st.HeapIdle, st.HeapInuse, st.HeapSys)
}
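	// PauseEnd is a 256-entry circular buffer of GC end times; the most
	// recent entry is at index (NumGC-1) mod 256, computed below as
	// (NumGC+255) % len(PauseEnd) to avoid unsigned underflow.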
if lpe := st.PauseEnd[int(st.NumGC+255)%len(st.PauseEnd)]; st.LastGC != lpe {
t.Fatalf("LastGC(%d) != last PauseEnd(%d)", st.LastGC, lpe)
}
var pauseTotal uint64
for _, pause := range st.PauseNs {
pauseTotal += pause
}
if int(st.NumGC) < len(st.PauseNs) {
// We have all pauses, so this should be exact.
if st.PauseTotalNs != pauseTotal {
t.Fatalf("PauseTotalNs(%d) != sum PauseNs(%d)", st.PauseTotalNs, pauseTotal)
}
for i := int(st.NumGC); i < len(st.PauseNs); i++ {
if st.PauseNs[i] != 0 {
t.Fatalf("Non-zero PauseNs[%d]: %+v", i, st)
}
if st.PauseEnd[i] != 0 {
t.Fatalf("Non-zero PauseEnd[%d]: %+v", i, st)
}
}
} else {
if st.PauseTotalNs < pauseTotal {
t.Fatalf("PauseTotalNs(%d) < sum PauseNs(%d)", st.PauseTotalNs, pauseTotal)
}
}
if st.NumForcedGC > st.NumGC {
t.Fatalf("NumForcedGC(%d) > NumGC(%d)", st.NumForcedGC, st.NumGC)
}
}
func TestStringConcatenationAllocs(t *testing.T) {
n := testing.AllocsPerRun(1e3, func() {
b := make([]byte, 10)
for i := 0; i < 10; i++ {
b[i] = byte(i) + '0'
}
s := "foo" + string(b)
if want := "foo0123456789"; s != want {
t.Fatalf("want %v, got %v", want, s)
}
})
// Only string concatenation allocates.
if n != 1 {
t.Fatalf("want 1 allocation, got %v", n)
}
}
func TestTinyAlloc(t *testing.T) {
const N = 16
var v [N]unsafe.Pointer
for i := range v {
v[i] = unsafe.Pointer(new(byte))
}
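	// Group the pointers by 8-byte chunk: &^ 7 clears the low three bits,
	// rounding each address down to an 8-byte boundary. The tiny allocator
	// should pack several of these 1-byte objects into shared chunks.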
chunks := make(map[uintptr]bool, N)
for _, p := range v {
chunks[uintptr(p)&^7] = true
}
if len(chunks) == N {
t.Fatal("no bytes allocated within the same 8-byte chunk")
}
}
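// BenchmarkTinyAllocByte is a hypothetical companion benchmark, not part
// of the original file: a sketch that hammers the same tiny-allocator
// path TestTinyAlloc inspects, using the XOR-into-sink pattern of the
// Malloc benchmarks below to keep the allocations live.
func BenchmarkTinyAllocByte(b *testing.B) {
	var x uintptr
	for i := 0; i < b.N; i++ {
		// 1-byte, pointer-free allocations are served by the tiny allocator.
		x ^= uintptr(unsafe.Pointer(new(byte)))
	}
	mallocSink = x
}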
type acLink struct {
x [1 << 20]byte
}
var arenaCollisionSink []*acLink
// Test that mheap.sysAlloc handles collisions with other
// memory mappings.
func TestArenaCollision(t *testing.T) {
	testenv.MustHaveExec(t)
if os.Getenv("TEST_ARENA_COLLISION") != "1" {
cmd := testenv.CleanCmdEnv(exec.Command(os.Args[0], "-test.run=TestArenaCollision", "-test.v"))
cmd.Env = append(cmd.Env, "TEST_ARENA_COLLISION=1")
out, err := cmd.CombinedOutput()
if race.Enabled {
// This test runs the runtime out of hint
// addresses, so it will start mapping the
// heap wherever it can. The race detector
// doesn't support this, so look for the
// expected failure.
if want := "too many address space collisions"; !strings.Contains(string(out), want) {
t.Fatalf("want %q, got:\n%s", want, string(out))
}
} else if !strings.Contains(string(out), "PASS\n") || err != nil {
t.Fatalf("%s\n(exit status %v)", string(out), err)
}
return
}
disallowed := [][2]uintptr{}
// Drop all but the next 3 hints. 64-bit has a lot of hints,
// so it would take a lot of memory to go through all of them.
KeepNArenaHints(3)
// Consume these 3 hints and force the runtime to find some
// fallback hints.
for i := 0; i < 5; i++ {
// Reserve memory at the next hint so it can't be used
// for the heap.
start, end := MapNextArenaHint()
disallowed = append(disallowed, [2]uintptr{start, end})
// Allocate until the runtime tries to use the hint we
// just mapped over.
hint := GetNextArenaHint()
for GetNextArenaHint() == hint {
ac := new(acLink)
arenaCollisionSink = append(arenaCollisionSink, ac)
// The allocation must not have fallen into
// one of the reserved regions.
p := uintptr(unsafe.Pointer(ac))
for _, d := range disallowed {
if d[0] <= p && p < d[1] {
t.Fatalf("allocation %#x in reserved region [%#x, %#x)", p, d[0], d[1])
}
}
}
}
}
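// mallocSink defeats dead-code elimination: the allocation benchmarks XOR
// their allocation addresses into a local and publish the result here, so
// the compiler cannot prove the allocations unused and elide them.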
var mallocSink uintptr
func BenchmarkMalloc8(b *testing.B) {
var x uintptr
for i := 0; i < b.N; i++ {
p := new(int64)
x ^= uintptr(unsafe.Pointer(p))
}
mallocSink = x
}
func BenchmarkMalloc16(b *testing.B) {
var x uintptr
for i := 0; i < b.N; i++ {
p := new([2]int64)
x ^= uintptr(unsafe.Pointer(p))
}
mallocSink = x
}
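// BenchmarkMalloc32 is a hypothetical extension of the size series above,
// not in the original file: the same XOR-sink sketch one power of two up,
// for comparing adjacent small size classes.
func BenchmarkMalloc32(b *testing.B) {
	var x uintptr
	for i := 0; i < b.N; i++ {
		p := new([4]int64) // 32-byte, pointer-free allocation
		x ^= uintptr(unsafe.Pointer(p))
	}
	mallocSink = x
}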
func BenchmarkMallocTypeInfo8(b *testing.B) {
var x uintptr
for i := 0; i < b.N; i++ {
p := new(struct {
p [8 / unsafe.Sizeof(uintptr(0))]*int
})
x ^= uintptr(unsafe.Pointer(p))
}
mallocSink = x
}
func BenchmarkMallocTypeInfo16(b *testing.B) {
var x uintptr
for i := 0; i < b.N; i++ {
p := new(struct {
p [16 / unsafe.Sizeof(uintptr(0))]*int
})
x ^= uintptr(unsafe.Pointer(p))
}
mallocSink = x
}
type LargeStruct struct {
x [16][]byte
}
func BenchmarkMallocLargeStruct(b *testing.B) {
var x uintptr
for i := 0; i < b.N; i++ {
p := make([]LargeStruct, 2)
x ^= uintptr(unsafe.Pointer(&p[0]))
}
mallocSink = x
}
var n = flag.Int("n", 1000, "number of goroutines")
func BenchmarkGoroutineSelect(b *testing.B) {
quit := make(chan struct{})
read := func(ch chan struct{}) {
for {
select {
case _, ok := <-ch:
if !ok {
return
}
case <-quit:
return
}
}
}
benchHelper(b, *n, read)
}
func BenchmarkGoroutineBlocking(b *testing.B) {
read := func(ch chan struct{}) {
for {
if _, ok := <-ch; !ok {
return
}
}
}
benchHelper(b, *n, read)
}
func BenchmarkGoroutineForRange(b *testing.B) {
read := func(ch chan struct{}) {
for range ch {
}
}
benchHelper(b, *n, read)
}
func benchHelper(b *testing.B, n int, read func(chan struct{})) {
m := make([]chan struct{}, n)
for i := range m {
m[i] = make(chan struct{}, 1)
go read(m[i])
}
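	// Time only the forced GC calls: the timer is stopped while the
	// goroutines are poked awake and restarted just around each GC.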
b.StopTimer()
b.ResetTimer()
GC()
for i := 0; i < b.N; i++ {
for _, ch := range m {
if ch != nil {
ch <- struct{}{}
}
}
time.Sleep(10 * time.Millisecond)
b.StartTimer()
GC()
b.StopTimer()
}
for _, ch := range m {
close(ch)
}
time.Sleep(10 * time.Millisecond)
}
func BenchmarkGoroutineIdle(b *testing.B) {
quit := make(chan struct{})
fn := func() {
<-quit
}
for i := 0; i < *n; i++ {
go fn()
}
GC()
b.ResetTimer()
for i := 0; i < b.N; i++ {
GC()
}
b.StopTimer()
close(quit)
time.Sleep(10 * time.Millisecond)
}