go/src/cmd/compile/internal/gc/main.go

1547 lines
44 KiB
Go
Raw Normal View History

// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
//go:generate go run mkbuiltin.go
package gc
import (
"bufio"
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
"bytes"
cmd/compile: add framework for logging optimizer (non)actions to LSP This is intended to allow IDEs to note where the optimizer was not able to improve users' code. There may be other applications for this, for example in studying effectiveness of optimizer changes more quickly than running benchmarks, or in verifying that code changes did not accidentally disable optimizations in performance-critical code. Logging of nilcheck (bad) for amd64 is implemented as proof-of-concept. In general, the intent is that optimizations that didn't happen are what will be logged, because that is believed to be what IDE users want. Added flag -json=version,dest Check that version=0. (Future compilers will support a few recent versions, I hope that version is always <=3.) Dest is expected to be one of: /path (or \path in Windows) will create directory /path and fill it w/ json files file://path will create directory path, intended either for I:\dont\know\enough\about\windows\paths trustme_I_know_what_I_am_doing_probably_testing Not passing an absolute path name usually leads to json splattered all over source directories, or failure when those directories are not writeable. If you want a foot-gun, you have to ask for it. The JSON output is directed to subdirectories of dest, where each subdirectory is net/url.PathEscape of the package name, and each for each foo.go in the package, net/url.PathEscape(foo).json is created. The first line of foo.json contains version and context information, and subsequent lines contains LSP-conforming JSON describing the missing optimizations. Change-Id: Ib83176a53a8c177ee9081aefc5ae05604ccad8a0 Reviewed-on: https://go-review.googlesource.com/c/go/+/204338 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-24 13:48:17 -04:00
"cmd/compile/internal/logopt"
"cmd/compile/internal/ssa"
"cmd/compile/internal/types"
"cmd/internal/bio"
"cmd/internal/dwarf"
"cmd/internal/obj"
"cmd/internal/objabi"
"cmd/internal/src"
"cmd/internal/sys"
"flag"
"fmt"
"internal/goversion"
"io"
"io/ioutil"
"log"
"os"
"path"
"regexp"
"runtime"
"sort"
"strconv"
"strings"
)
var imported_unsafe bool
var (
buildid string
)
var (
cmd/compile: don't update outer variables after capturevars is complete When compiling concurrently, we walk all functions before compiling any of them. Walking functions can cause variables to switch from being non-addrtaken to addrtaken, e.g. to prepare for a runtime call. Typechecking propagates addrtaken-ness of closure variables to their outer variables, so that capturevars can decide whether to pass the variable's value or a pointer to it. When all functions are compiled immediately, as long as the containing function is compiled prior to the closure, this propagation has no effect. When compilation is deferred, though, in rare cases, this results in a change in the addrtaken-ness of a variable in the outer function, which in turn changes the compiler's output. (This is rare because in a great many cases, a temporary has been introduced, insulating the outer variable from modification.) But concurrent compilation must generate identical results. To fix this, track whether capturevars has run. If it has, there is no need to update outer variables when closure variables change. Capturevars always runs before any functions are walked or compiled. The remainder of the changes in this CL are to support the test. In particular, -d=compilelater forces the compiler to walk all functions before compiling any of them, despite being non-concurrent. This is useful because -live is fundamentally incompatible with concurrent compilation, but we want -c=1 to have no behavior changes. Fixes #20250 Change-Id: I89bcb54268a41e8588af1ac8cc37fbef856a90c2 Reviewed-on: https://go-review.googlesource.com/42853 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2017-05-05 15:22:59 -07:00
Debug_append int
Debug_checkptr int
cmd/compile: don't update outer variables after capturevars is complete When compiling concurrently, we walk all functions before compiling any of them. Walking functions can cause variables to switch from being non-addrtaken to addrtaken, e.g. to prepare for a runtime call. Typechecking propagates addrtaken-ness of closure variables to their outer variables, so that capturevars can decide whether to pass the variable's value or a pointer to it. When all functions are compiled immediately, as long as the containing function is compiled prior to the closure, this propagation has no effect. When compilation is deferred, though, in rare cases, this results in a change in the addrtaken-ness of a variable in the outer function, which in turn changes the compiler's output. (This is rare because in a great many cases, a temporary has been introduced, insulating the outer variable from modification.) But concurrent compilation must generate identical results. To fix this, track whether capturevars has run. If it has, there is no need to update outer variables when closure variables change. Capturevars always runs before any functions are walked or compiled. The remainder of the changes in this CL are to support the test. In particular, -d=compilelater forces the compiler to walk all functions before compiling any of them, despite being non-concurrent. This is useful because -live is fundamentally incompatible with concurrent compilation, but we want -c=1 to have no behavior changes. Fixes #20250 Change-Id: I89bcb54268a41e8588af1ac8cc37fbef856a90c2 Reviewed-on: https://go-review.googlesource.com/42853 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2017-05-05 15:22:59 -07:00
Debug_closure int
Debug_compilelater int
debug_dclstack int
Debug_libfuzzer int
cmd/compile: don't update outer variables after capturevars is complete When compiling concurrently, we walk all functions before compiling any of them. Walking functions can cause variables to switch from being non-addrtaken to addrtaken, e.g. to prepare for a runtime call. Typechecking propagates addrtaken-ness of closure variables to their outer variables, so that capturevars can decide whether to pass the variable's value or a pointer to it. When all functions are compiled immediately, as long as the containing function is compiled prior to the closure, this propagation has no effect. When compilation is deferred, though, in rare cases, this results in a change in the addrtaken-ness of a variable in the outer function, which in turn changes the compiler's output. (This is rare because in a great many cases, a temporary has been introduced, insulating the outer variable from modification.) But concurrent compilation must generate identical results. To fix this, track whether capturevars has run. If it has, there is no need to update outer variables when closure variables change. Capturevars always runs before any functions are walked or compiled. The remainder of the changes in this CL are to support the test. In particular, -d=compilelater forces the compiler to walk all functions before compiling any of them, despite being non-concurrent. This is useful because -live is fundamentally incompatible with concurrent compilation, but we want -c=1 to have no behavior changes. Fixes #20250 Change-Id: I89bcb54268a41e8588af1ac8cc37fbef856a90c2 Reviewed-on: https://go-review.googlesource.com/42853 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2017-05-05 15:22:59 -07:00
Debug_panic int
Debug_slice int
Debug_vlog bool
Debug_wb int
Debug_pctab string
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
Debug_locationlist int
Debug_typecheckinl int
Debug_gendwarfinl int
Debug_softfloat int
cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata Generate inline code at defer time to save the args of defer calls to unique (autotmp) stack slots, and generate inline code at exit time to check which defer calls were made and make the associated function/method/interface calls. We remember that a particular defer statement was reached by storing in the deferBits variable (always stored on the stack). At exit time, we check the bits of the deferBits variable to determine which defer function calls to make (in reverse order). These low-cost defers are only used for functions where no defers appear in loops. In addition, we don't do these low-cost defers if there are too many defer statements or too many exits in a function (to limit code increase). When a function uses open-coded defers, we produce extra FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and for each defer, the stack slots where the closure and associated args have been stored. The funcdata also includes the location of the deferBits variable. Therefore, for panics, we can use this funcdata to determine exactly which defers are active, and call the appropriate functions/methods/closures with the correct arguments for each active defer. In order to unwind the stack correctly after a recover(), we need to add an extra code segment to functions with open-coded defers that simply calls deferreturn() and returns. This segment is not reachable by the normal function, but is returned to by the runtime during recovery. We set the liveness information of this deferreturn() to be the same as the liveness at the first function call during the last defer exit code (so all return values and all stack slots needed by the defer calls will be live). I needed to increase the stackguard constant from 880 to 896, because of a small amount of new code in deferreturn(). The -N flag disables open-coded defers. '-d defer' prints out the kind of defer being used at each defer statement (heap-allocated, stack-allocated, or open-coded). Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ] With normal (stack-allocated) defers only: 35.4 ns/op With open-coded defers: 5.6 ns/op Cost of function call alone (remove defer keyword): 4.4 ns/op Text size increase (including funcdata) for go binary without/with open-coded defers: 0.09% The average size increase (including funcdata) for only the functions that use open-coded defers is 1.1%. The cost of a panic followed by a recover got noticeably slower, since panic processing now requires a scan of the stack for open-coded defer frames. This scan is required, even if no frames are using open-coded defers: Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ] Without open-coded defers: 62.0 ns/op With open-coded defers: 255 ns/op A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers: CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ] Without open-coded defers: 443 ns/op With open-coded defers: 347 ns/op Updates #14939 (defer performance) Updates #34481 (design doc) Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff Reviewed-on: https://go-review.googlesource.com/c/go/+/202340 Reviewed-by: Austin Clements <austin@google.com>
2019-06-24 12:59:22 -07:00
Debug_defer int
)
// Debug arguments.
// These can be specified with the -d flag, as in "-d nil"
// to set the debug_checknil variable.
// Multiple options can be comma-separated.
// Each option accepts an optional argument, as in "gcprog=2"
var debugtab = []struct {
name string
cmd/compile: provide a way to auto-discover -d debug keys Currently one needs to refer to the sources to have a list of accepted debug keys. We can copy what 'ssa/help' does and introspect the list of debug keys to print a more detailed help: $ go tool compile -d help usage: -d arg[,arg]* and arg is <key>[=<value>] <key> is one of: append print information about append compilation closure print information about closure compilation disablenil disable nil checks dclstack run internal dclstack check gcprog print dump of GC programs nil print information about nil checks panic do not hide any compiler panic slice print information about slice compilation typeassert print information about type assertion inlining wb print information about write barriers export print export data pctab print named pc-value table ssa/help print help about SSA debugging <value> is key-specific. Key "pctab" supports values: "pctospadj", "pctofile", "pctoline", "pctoinline", "pctopcdata" For '-d help' to be discoverable, a hint is given in the -d flag description. A last thing, today at least one go file needs to be provided to get to the code printing ssa/help. $ go tool compile -d ssa/help foo.go Add a check so one can just do '-d help' or '-d ssa/help' Caught by trybot: I needed to update fmt_test.go as I'm introducing the usage of %-*s in a format string. Fixes #20041 Change-Id: Ib2858b038c1bcbe644aa3b1a371009710c6d957d Reviewed-on: https://go-review.googlesource.com/41091 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-04-19 19:24:27 +01:00
help string
val interface{} // must be *int or *string
}{
cmd/compile: provide a way to auto-discover -d debug keys Currently one needs to refer to the sources to have a list of accepted debug keys. We can copy what 'ssa/help' does and introspect the list of debug keys to print a more detailed help: $ go tool compile -d help usage: -d arg[,arg]* and arg is <key>[=<value>] <key> is one of: append print information about append compilation closure print information about closure compilation disablenil disable nil checks dclstack run internal dclstack check gcprog print dump of GC programs nil print information about nil checks panic do not hide any compiler panic slice print information about slice compilation typeassert print information about type assertion inlining wb print information about write barriers export print export data pctab print named pc-value table ssa/help print help about SSA debugging <value> is key-specific. Key "pctab" supports values: "pctospadj", "pctofile", "pctoline", "pctoinline", "pctopcdata" For '-d help' to be discoverable, a hint is given in the -d flag description. A last thing, today at least one go file needs to be provided to get to the code printing ssa/help. $ go tool compile -d ssa/help foo.go Add a check so one can just do '-d help' or '-d ssa/help' Caught by trybot: I needed to update fmt_test.go as I'm introducing the usage of %-*s in a format string. Fixes #20041 Change-Id: Ib2858b038c1bcbe644aa3b1a371009710c6d957d Reviewed-on: https://go-review.googlesource.com/41091 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-04-19 19:24:27 +01:00
{"append", "print information about append compilation", &Debug_append},
{"checkptr", "instrument unsafe pointer conversions", &Debug_checkptr},
cmd/compile: provide a way to auto-discover -d debug keys Currently one needs to refer to the sources to have a list of accepted debug keys. We can copy what 'ssa/help' does and introspect the list of debug keys to print a more detailed help: $ go tool compile -d help usage: -d arg[,arg]* and arg is <key>[=<value>] <key> is one of: append print information about append compilation closure print information about closure compilation disablenil disable nil checks dclstack run internal dclstack check gcprog print dump of GC programs nil print information about nil checks panic do not hide any compiler panic slice print information about slice compilation typeassert print information about type assertion inlining wb print information about write barriers export print export data pctab print named pc-value table ssa/help print help about SSA debugging <value> is key-specific. Key "pctab" supports values: "pctospadj", "pctofile", "pctoline", "pctoinline", "pctopcdata" For '-d help' to be discoverable, a hint is given in the -d flag description. A last thing, today at least one go file needs to be provided to get to the code printing ssa/help. $ go tool compile -d ssa/help foo.go Add a check so one can just do '-d help' or '-d ssa/help' Caught by trybot: I needed to update fmt_test.go as I'm introducing the usage of %-*s in a format string. Fixes #20041 Change-Id: Ib2858b038c1bcbe644aa3b1a371009710c6d957d Reviewed-on: https://go-review.googlesource.com/41091 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-04-19 19:24:27 +01:00
{"closure", "print information about closure compilation", &Debug_closure},
cmd/compile: don't update outer variables after capturevars is complete When compiling concurrently, we walk all functions before compiling any of them. Walking functions can cause variables to switch from being non-addrtaken to addrtaken, e.g. to prepare for a runtime call. Typechecking propagates addrtaken-ness of closure variables to their outer variables, so that capturevars can decide whether to pass the variable's value or a pointer to it. When all functions are compiled immediately, as long as the containing function is compiled prior to the closure, this propagation has no effect. When compilation is deferred, though, in rare cases, this results in a change in the addrtaken-ness of a variable in the outer function, which in turn changes the compiler's output. (This is rare because in a great many cases, a temporary has been introduced, insulating the outer variable from modification.) But concurrent compilation must generate identical results. To fix this, track whether capturevars has run. If it has, there is no need to update outer variables when closure variables change. Capturevars always runs before any functions are walked or compiled. The remainder of the changes in this CL are to support the test. In particular, -d=compilelater forces the compiler to walk all functions before compiling any of them, despite being non-concurrent. This is useful because -live is fundamentally incompatible with concurrent compilation, but we want -c=1 to have no behavior changes. Fixes #20250 Change-Id: I89bcb54268a41e8588af1ac8cc37fbef856a90c2 Reviewed-on: https://go-review.googlesource.com/42853 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2017-05-05 15:22:59 -07:00
{"compilelater", "compile functions as late as possible", &Debug_compilelater},
cmd/compile: provide a way to auto-discover -d debug keys Currently one needs to refer to the sources to have a list of accepted debug keys. We can copy what 'ssa/help' does and introspect the list of debug keys to print a more detailed help: $ go tool compile -d help usage: -d arg[,arg]* and arg is <key>[=<value>] <key> is one of: append print information about append compilation closure print information about closure compilation disablenil disable nil checks dclstack run internal dclstack check gcprog print dump of GC programs nil print information about nil checks panic do not hide any compiler panic slice print information about slice compilation typeassert print information about type assertion inlining wb print information about write barriers export print export data pctab print named pc-value table ssa/help print help about SSA debugging <value> is key-specific. Key "pctab" supports values: "pctospadj", "pctofile", "pctoline", "pctoinline", "pctopcdata" For '-d help' to be discoverable, a hint is given in the -d flag description. A last thing, today at least one go file needs to be provided to get to the code printing ssa/help. $ go tool compile -d ssa/help foo.go Add a check so one can just do '-d help' or '-d ssa/help' Caught by trybot: I needed to update fmt_test.go as I'm introducing the usage of %-*s in a format string. Fixes #20041 Change-Id: Ib2858b038c1bcbe644aa3b1a371009710c6d957d Reviewed-on: https://go-review.googlesource.com/41091 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-04-19 19:24:27 +01:00
{"disablenil", "disable nil checks", &disable_checknil},
{"dclstack", "run internal dclstack check", &debug_dclstack},
{"gcprog", "print dump of GC programs", &Debug_gcprog},
{"libfuzzer", "coverage instrumentation for libfuzzer", &Debug_libfuzzer},
cmd/compile: provide a way to auto-discover -d debug keys Currently one needs to refer to the sources to have a list of accepted debug keys. We can copy what 'ssa/help' does and introspect the list of debug keys to print a more detailed help: $ go tool compile -d help usage: -d arg[,arg]* and arg is <key>[=<value>] <key> is one of: append print information about append compilation closure print information about closure compilation disablenil disable nil checks dclstack run internal dclstack check gcprog print dump of GC programs nil print information about nil checks panic do not hide any compiler panic slice print information about slice compilation typeassert print information about type assertion inlining wb print information about write barriers export print export data pctab print named pc-value table ssa/help print help about SSA debugging <value> is key-specific. Key "pctab" supports values: "pctospadj", "pctofile", "pctoline", "pctoinline", "pctopcdata" For '-d help' to be discoverable, a hint is given in the -d flag description. A last thing, today at least one go file needs to be provided to get to the code printing ssa/help. $ go tool compile -d ssa/help foo.go Add a check so one can just do '-d help' or '-d ssa/help' Caught by trybot: I needed to update fmt_test.go as I'm introducing the usage of %-*s in a format string. Fixes #20041 Change-Id: Ib2858b038c1bcbe644aa3b1a371009710c6d957d Reviewed-on: https://go-review.googlesource.com/41091 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-04-19 19:24:27 +01:00
{"nil", "print information about nil checks", &Debug_checknil},
{"panic", "do not hide any compiler panic", &Debug_panic},
{"slice", "print information about slice compilation", &Debug_slice},
{"typeassert", "print information about type assertion inlining", &Debug_typeassert},
{"wb", "print information about write barriers", &Debug_wb},
{"export", "print export data", &Debug_export},
{"pctab", "print named pc-value table", &Debug_pctab},
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
{"locationlists", "print information about DWARF location list creation", &Debug_locationlist},
{"typecheckinl", "eager typechecking of inline function bodies", &Debug_typecheckinl},
{"dwarfinl", "print information about DWARF inlined function creation", &Debug_gendwarfinl},
{"softfloat", "force compiler to emit soft-float code", &Debug_softfloat},
cmd/compile, cmd/link, runtime: make defers low-cost through inline code and extra funcdata Generate inline code at defer time to save the args of defer calls to unique (autotmp) stack slots, and generate inline code at exit time to check which defer calls were made and make the associated function/method/interface calls. We remember that a particular defer statement was reached by storing in the deferBits variable (always stored on the stack). At exit time, we check the bits of the deferBits variable to determine which defer function calls to make (in reverse order). These low-cost defers are only used for functions where no defers appear in loops. In addition, we don't do these low-cost defers if there are too many defer statements or too many exits in a function (to limit code increase). When a function uses open-coded defers, we produce extra FUNCDATA_OpenCodedDeferInfo information that specifies the number of defers, and for each defer, the stack slots where the closure and associated args have been stored. The funcdata also includes the location of the deferBits variable. Therefore, for panics, we can use this funcdata to determine exactly which defers are active, and call the appropriate functions/methods/closures with the correct arguments for each active defer. In order to unwind the stack correctly after a recover(), we need to add an extra code segment to functions with open-coded defers that simply calls deferreturn() and returns. This segment is not reachable by the normal function, but is returned to by the runtime during recovery. We set the liveness information of this deferreturn() to be the same as the liveness at the first function call during the last defer exit code (so all return values and all stack slots needed by the defer calls will be live). I needed to increase the stackguard constant from 880 to 896, because of a small amount of new code in deferreturn(). The -N flag disables open-coded defers. '-d defer' prints out the kind of defer being used at each defer statement (heap-allocated, stack-allocated, or open-coded). Cost of defer statement [ go test -run NONE -bench BenchmarkDefer$ runtime ] With normal (stack-allocated) defers only: 35.4 ns/op With open-coded defers: 5.6 ns/op Cost of function call alone (remove defer keyword): 4.4 ns/op Text size increase (including funcdata) for go binary without/with open-coded defers: 0.09% The average size increase (including funcdata) for only the functions that use open-coded defers is 1.1%. The cost of a panic followed by a recover got noticeably slower, since panic processing now requires a scan of the stack for open-coded defer frames. This scan is required, even if no frames are using open-coded defers: Cost of panic and recover [ go test -run NONE -bench BenchmarkPanicRecover runtime ] Without open-coded defers: 62.0 ns/op With open-coded defers: 255 ns/op A CGO Go-to-C-to-Go benchmark got noticeably faster because of open-coded defers: CGO Go-to-C-to-Go benchmark [cd misc/cgo/test; go test -run NONE -bench BenchmarkCGoCallback ] Without open-coded defers: 443 ns/op With open-coded defers: 347 ns/op Updates #14939 (defer performance) Updates #34481 (design doc) Change-Id: I63b1a60d1ebf28126f55ee9fd7ecffe9cb23d1ff Reviewed-on: https://go-review.googlesource.com/c/go/+/202340 Reviewed-by: Austin Clements <austin@google.com>
2019-06-24 12:59:22 -07:00
{"defer", "print information about defer compilation", &Debug_defer},
}
cmd/compile: provide a way to auto-discover -d debug keys Currently one needs to refer to the sources to have a list of accepted debug keys. We can copy what 'ssa/help' does and introspect the list of debug keys to print a more detailed help: $ go tool compile -d help usage: -d arg[,arg]* and arg is <key>[=<value>] <key> is one of: append print information about append compilation closure print information about closure compilation disablenil disable nil checks dclstack run internal dclstack check gcprog print dump of GC programs nil print information about nil checks panic do not hide any compiler panic slice print information about slice compilation typeassert print information about type assertion inlining wb print information about write barriers export print export data pctab print named pc-value table ssa/help print help about SSA debugging <value> is key-specific. Key "pctab" supports values: "pctospadj", "pctofile", "pctoline", "pctoinline", "pctopcdata" For '-d help' to be discoverable, a hint is given in the -d flag description. A last thing, today at least one go file needs to be provided to get to the code printing ssa/help. $ go tool compile -d ssa/help foo.go Add a check so one can just do '-d help' or '-d ssa/help' Caught by trybot: I needed to update fmt_test.go as I'm introducing the usage of %-*s in a format string. Fixes #20041 Change-Id: Ib2858b038c1bcbe644aa3b1a371009710c6d957d Reviewed-on: https://go-review.googlesource.com/41091 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-04-19 19:24:27 +01:00
const debugHelpHeader = `usage: -d arg[,arg]* and arg is <key>[=<value>]
<key> is one of:
`
const debugHelpFooter = `
<value> is key-specific.
Key "checkptr" supports values:
"0": instrumentation disabled
"1": conversions involving unsafe.Pointer are instrumented
"2": conversions to unsafe.Pointer force heap allocation
cmd/compile: provide a way to auto-discover -d debug keys Currently one needs to refer to the sources to have a list of accepted debug keys. We can copy what 'ssa/help' does and introspect the list of debug keys to print a more detailed help: $ go tool compile -d help usage: -d arg[,arg]* and arg is <key>[=<value>] <key> is one of: append print information about append compilation closure print information about closure compilation disablenil disable nil checks dclstack run internal dclstack check gcprog print dump of GC programs nil print information about nil checks panic do not hide any compiler panic slice print information about slice compilation typeassert print information about type assertion inlining wb print information about write barriers export print export data pctab print named pc-value table ssa/help print help about SSA debugging <value> is key-specific. Key "pctab" supports values: "pctospadj", "pctofile", "pctoline", "pctoinline", "pctopcdata" For '-d help' to be discoverable, a hint is given in the -d flag description. A last thing, today at least one go file needs to be provided to get to the code printing ssa/help. $ go tool compile -d ssa/help foo.go Add a check so one can just do '-d help' or '-d ssa/help' Caught by trybot: I needed to update fmt_test.go as I'm introducing the usage of %-*s in a format string. Fixes #20041 Change-Id: Ib2858b038c1bcbe644aa3b1a371009710c6d957d Reviewed-on: https://go-review.googlesource.com/41091 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-04-19 19:24:27 +01:00
Key "pctab" supports values:
"pctospadj", "pctofile", "pctoline", "pctoinline", "pctopcdata"
`
func usage() {
fmt.Fprintf(os.Stderr, "usage: compile [options] file.go...\n")
objabi.Flagprint(os.Stderr)
Exit(2)
}
func hidePanic() {
if Debug_panic == 0 && nsavederrors+nerrors > 0 {
// If we've already complained about things
// in the program, don't bother complaining
// about a panic too; let the user clean up
// the code and try again.
if err := recover(); err != nil {
errorexit()
}
}
}
// supportsDynlink reports whether or not the code generator for the given
// architecture supports the -shared and -dynlink flags.
func supportsDynlink(arch *sys.Arch) bool {
return arch.InFamily(sys.AMD64, sys.ARM, sys.ARM64, sys.I386, sys.PPC64, sys.S390X)
}
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
// timing data for compiler phases
var timings Timings
var benchfile string
cmd/compile: improve coverage of nowritebarrierrec check The current go:nowritebarrierrec checker has two problems that limit its coverage: 1. It doesn't understand that systemstack calls its argument, which means there are several cases where we fail to detect prohibited write barriers. 2. It only observes calls in the AST, so calls constructed during lowering by SSA aren't followed. This CL completely rewrites this checker to address these issues. The current checker runs entirely after walk and uses visitBottomUp, which introduces several problems for checking across systemstack. First, visitBottomUp itself doesn't understand systemstack calls, so the callee may be ordered after the caller, causing the checker to fail to propagate constraints. Second, many systemstack calls are passed a closure, which is quite difficult to resolve back to the function definition after transformclosure and walk have run. Third, visitBottomUp works exclusively on the AST, so it can't observe calls created by SSA. To address these problems, this commit splits the check into two phases and rewrites it to use a call graph generated during SSA lowering. The first phase runs before transformclosure/walk and simply records systemstack arguments when they're easy to get. Then, it modifies genssa to record static call edges at the point where we're lowering to Progs (which is the latest point at which position information is conveniently available). Finally, the second phase runs after all functions have been lowered and uses a direct BFS walk of the call graph (combining systemstack calls with static calls) to find prohibited write barriers and construct nice error messages. Fixes #22384. For #22460. Change-Id: I39668f7f2366ab3c1ab1a71eaf25484d25349540 Reviewed-on: https://go-review.googlesource.com/72773 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-10-22 16:36:27 -04:00
var nowritebarrierrecCheck *nowritebarrierrecChecker
// Main parses flags and Go source files specified in the command-line
// arguments, type-checks the parsed Go package, compiles functions to machine
// code, and finally writes the compiled package definition to disk.
func Main(archInit func(*Arch)) {
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
timings.Start("fe", "init")
defer hidePanic()
archInit(&thearch)
Ctxt = obj.Linknew(thearch.LinkArch)
Ctxt.DiagFunc = yyerror
Ctxt.DiagFlush = flusherrors
Ctxt.Bso = bufio.NewWriter(os.Stdout)
// UseBASEntries is preferred because it shaves about 2% off build time, but LLDB, dsymutil, and dwarfdump
// on Darwin don't support it properly, especially since macOS 10.14 (Mojave). This is exposed as a flag
// to allow testing with LLVM tools on Linux, and to help with reporting this bug to the LLVM project.
// See bugs 31188 and 21945 (CLs 170638, 98075, 72371).
Ctxt.UseBASEntries = Ctxt.Headtype != objabi.Hdarwin
localpkg = types.NewPkg("", "")
localpkg.Prefix = "\"\""
// We won't know localpkg's height until after import
// processing. In the mean time, set to MaxPkgHeight to ensure
// height comparisons at least work until then.
localpkg.Height = types.MaxPkgHeight
// pseudo-package, for scoping
builtinpkg = types.NewPkg("go.builtin", "") // TODO(gri) name this package go.builtin?
builtinpkg.Prefix = "go.builtin" // not go%2ebuiltin
// pseudo-package, accessed by import "unsafe"
unsafepkg = types.NewPkg("unsafe", "unsafe")
// Pseudo-package that contains the compiler's builtin
// declarations for package runtime. These are declared in a
// separate package to avoid conflicts with package runtime's
// actual declarations, which may differ intentionally but
// insignificantly.
Runtimepkg = types.NewPkg("go.runtime", "runtime")
Runtimepkg.Prefix = "runtime"
// pseudo-packages used in symbol tables
itabpkg = types.NewPkg("go.itab", "go.itab")
itabpkg.Prefix = "go.itab" // not go%2eitab
itablinkpkg = types.NewPkg("go.itablink", "go.itablink")
itablinkpkg.Prefix = "go.itablink" // not go%2eitablink
trackpkg = types.NewPkg("go.track", "go.track")
trackpkg.Prefix = "go.track" // not go%2etrack
// pseudo-package used for map zero values
mappkg = types.NewPkg("go.map", "go.map")
mappkg.Prefix = "go.map"
// pseudo-package used for methods with anonymous receivers
gopkg = types.NewPkg("go", "")
Wasm := objabi.GOARCH == "wasm"
// Whether the limit for stack-allocated objects is much smaller than normal.
// This can be helpful for diagnosing certain causes of GC latency. See #27732.
smallFrames := false
cmd/compile: add framework for logging optimizer (non)actions to LSP This is intended to allow IDEs to note where the optimizer was not able to improve users' code. There may be other applications for this, for example in studying effectiveness of optimizer changes more quickly than running benchmarks, or in verifying that code changes did not accidentally disable optimizations in performance-critical code. Logging of nilcheck (bad) for amd64 is implemented as proof-of-concept. In general, the intent is that optimizations that didn't happen are what will be logged, because that is believed to be what IDE users want. Added flag -json=version,dest Check that version=0. (Future compilers will support a few recent versions, I hope that version is always <=3.) Dest is expected to be one of: /path (or \path in Windows) will create directory /path and fill it w/ json files file://path will create directory path, intended either for I:\dont\know\enough\about\windows\paths trustme_I_know_what_I_am_doing_probably_testing Not passing an absolute path name usually leads to json splattered all over source directories, or failure when those directories are not writeable. If you want a foot-gun, you have to ask for it. The JSON output is directed to subdirectories of dest, where each subdirectory is net/url.PathEscape of the package name, and each for each foo.go in the package, net/url.PathEscape(foo).json is created. The first line of foo.json contains version and context information, and subsequent lines contains LSP-conforming JSON describing the missing optimizations. Change-Id: Ib83176a53a8c177ee9081aefc5ae05604ccad8a0 Reviewed-on: https://go-review.googlesource.com/c/go/+/204338 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-24 13:48:17 -04:00
jsonLogOpt := ""
flag.BoolVar(&compiling_runtime, "+", false, "compiling runtime")
flag.BoolVar(&compiling_std, "std", false, "compiling standard library")
objabi.Flagcount("%", "debug non-static initializers", &Debug['%'])
objabi.Flagcount("B", "disable bounds checking", &Debug['B'])
objabi.Flagcount("C", "disable printing of columns in error messages", &Debug['C']) // TODO(gri) remove eventually
flag.StringVar(&localimport, "D", "", "set relative `path` for local imports")
objabi.Flagcount("E", "debug symbol export", &Debug['E'])
objabi.Flagfn1("I", "add `directory` to import search path", addidir)
objabi.Flagcount("K", "debug missing line numbers", &Debug['K'])
objabi.Flagcount("L", "show full file names in error messages", &Debug['L'])
objabi.Flagcount("N", "disable optimizations", &Debug['N'])
objabi.Flagcount("S", "print assembly listing", &Debug['S'])
objabi.AddVersionFlag() // -V
objabi.Flagcount("W", "debug parse tree after type checking", &Debug['W'])
flag.StringVar(&asmhdr, "asmhdr", "", "write assembly header to `file`")
flag.StringVar(&buildid, "buildid", "", "record `id` as the build id in the export metadata")
cmd/compile: add initial backend concurrency support This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 08:27:26 -07:00
flag.IntVar(&nBackendWorkers, "c", 1, "concurrency during compilation, 1 means no concurrency")
flag.BoolVar(&pure_go, "complete", false, "compiling complete package (no C or assembly)")
cmd/compile: provide a way to auto-discover -d debug keys Currently one needs to refer to the sources to have a list of accepted debug keys. We can copy what 'ssa/help' does and introspect the list of debug keys to print a more detailed help: $ go tool compile -d help usage: -d arg[,arg]* and arg is <key>[=<value>] <key> is one of: append print information about append compilation closure print information about closure compilation disablenil disable nil checks dclstack run internal dclstack check gcprog print dump of GC programs nil print information about nil checks panic do not hide any compiler panic slice print information about slice compilation typeassert print information about type assertion inlining wb print information about write barriers export print export data pctab print named pc-value table ssa/help print help about SSA debugging <value> is key-specific. Key "pctab" supports values: "pctospadj", "pctofile", "pctoline", "pctoinline", "pctopcdata" For '-d help' to be discoverable, a hint is given in the -d flag description. A last thing, today at least one go file needs to be provided to get to the code printing ssa/help. $ go tool compile -d ssa/help foo.go Add a check so one can just do '-d help' or '-d ssa/help' Caught by trybot: I needed to update fmt_test.go as I'm introducing the usage of %-*s in a format string. Fixes #20041 Change-Id: Ib2858b038c1bcbe644aa3b1a371009710c6d957d Reviewed-on: https://go-review.googlesource.com/41091 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-04-19 19:24:27 +01:00
flag.StringVar(&debugstr, "d", "", "print debug information about items in `list`; try -d help")
flag.BoolVar(&flagDWARF, "dwarf", !Wasm, "generate DWARF symbols")
cmd/compile: turn on DWARF locations lists for ssa vars This changes the default setting for -dwarflocationlists from false to true, removes the flag from ssa/debug_test.go, and updates runtime/runtime-gdb_test.go to match a change in debugging output for composite variables. Current benchmarks (perflock, -count 10) benchstat -geomean before.log after.log name old time/op new time/op delta Template 175ms ± 0% 182ms ± 1% +3.68% (p=0.000 n=8+9) Unicode 82.0ms ± 2% 82.8ms ± 1% +0.96% (p=0.019 n=9+9) GoTypes 590ms ± 1% 611ms ± 1% +3.42% (p=0.000 n=9+10) Compiler 2.85s ± 0% 2.95s ± 1% +3.60% (p=0.000 n=9+10) SSA 6.42s ± 1% 6.70s ± 1% +4.31% (p=0.000 n=10+9) Flate 113ms ± 2% 117ms ± 1% +3.11% (p=0.000 n=10+9) GoParser 140ms ± 1% 145ms ± 1% +3.47% (p=0.000 n=10+9) Reflect 384ms ± 0% 398ms ± 1% +3.56% (p=0.000 n=8+9) Tar 165ms ± 1% 171ms ± 1% +3.33% (p=0.000 n=9+9) XML 207ms ± 2% 214ms ± 1% +3.41% (p=0.000 n=9+9) StdCmd 11.8s ± 2% 12.4s ± 2% +4.41% (p=0.000 n=10+9) [Geo mean] 489ms 506ms +3.38% name old user-ns/op new user-ns/op delta Template 247M ± 4% 254M ± 4% +2.76% (p=0.040 n=10+10) Unicode 118M ±16% 121M ±11% ~ (p=0.364 n=10+10) GoTypes 805M ± 2% 824M ± 2% +2.37% (p=0.003 n=9+8) Compiler 3.92G ± 2% 4.01G ± 2% +2.20% (p=0.001 n=9+9) SSA 9.63G ± 4% 10.00G ± 2% +3.81% (p=0.000 n=10+9) Flate 155M ±10% 154M ± 7% ~ (p=0.718 n=9+10) GoParser 184M ±11% 190M ± 7% ~ (p=0.220 n=10+9) Reflect 506M ± 4% 528M ± 2% +4.27% (p=0.000 n=10+10) Tar 224M ± 4% 227M ± 5% ~ (p=0.207 n=10+9) XML 272M ± 7% 286M ± 4% +5.23% (p=0.010 n=10+9) [Geo mean] 489M 502M +2.76% name old text-bytes new text-bytes delta HelloSize 672k ± 0% 672k ± 0% ~ (all equal) CmdGoSize 7.21M ± 0% 7.21M ± 0% ~ (all equal) [Geo mean] 2.20M 2.20M +0.00% name old data-bytes new data-bytes delta HelloSize 9.88k ± 0% 9.88k ± 0% ~ (all equal) CmdGoSize 248k ± 0% 248k ± 0% ~ (all equal) [Geo mean] 49.5k 49.5k +0.00% name old bss-bytes new bss-bytes delta HelloSize 125k ± 0% 125k ± 0% ~ (all equal) CmdGoSize 144k ± 0% 144k ± 0% ~ (all equal) [Geo mean] 135k 135k +0.00% name old exe-bytes new exe-bytes delta HelloSize 1.10M ± 0% 1.30M ± 0% +17.82% (p=0.000 n=10+10) CmdGoSize 11.6M ± 0% 13.5M ± 0% +16.90% (p=0.000 n=10+10) [Geo mean] 3.57M 4.19M +17.36% Change-Id: I250055813cadd25cebee8da1f9a7f995a6eae432 Reviewed-on: https://go-review.googlesource.com/100738 Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-14 16:54:33 -04:00
flag.BoolVar(&Ctxt.Flag_locationlists, "dwarflocationlists", true, "add location lists to DWARF in optimized mode")
flag.IntVar(&genDwarfInline, "gendwarfinl", 2, "generate DWARF inline info records")
objabi.Flagcount("e", "no limit on number of errors reported", &Debug['e'])
objabi.Flagcount("h", "halt on error", &Debug['h'])
objabi.Flagfn1("importmap", "add `definition` of the form source=actual to import map", addImportMap)
objabi.Flagfn1("importcfg", "read import configuration from `file`", readImportCfg)
flag.StringVar(&flag_installsuffix, "installsuffix", "", "set pkg directory `suffix`")
objabi.Flagcount("j", "debug runtime-initialized variables", &Debug['j'])
objabi.Flagcount("l", "disable inlining", &Debug['l'])
flag.StringVar(&flag_lang, "lang", "", "release to compile for")
cmd/compile: add -linkobj flag to allow writing object file in two parts This flag is experimental and the semantics may change even after Go 1.7 is released. There are no changes to code not using the flag. The first part is for reading by future compiles. The second part is for reading by the final link step. Splitting the file this way allows distributed build systems to ship the compile-input part only to compile steps and the linker-input part only to linker steps. The first part is basically just the export data, and the second part is basically everything else. The overall files still have the same broad structure, so that existing tools will work with both halves. It's just that various pieces are empty in the two halves. This also copies the two bits of data the linker needed from export data into the object header proper, so that the linker doesn't need any export data at all. That eliminates a TODO that was left for switching to the binary export data. (Now the linker doesn't need to know about the switch.) The default is still to write out a combined output file. Nothing changes unless you pass -linkobj to the compiler. There is no support in the go command for -linkobj, since the go command doesn't copy objects around. The expectation is that other build systems (like bazel, say) might take advantage of this. The header adjustment and the option for the split output was intended as part of the zip archives, but the zip archives have been cut from Go 1.7. Doing this to the current archives both unblocks one step in the switch to binary export data and enables alternate build systems to experiment with the new flag using the Go 1.7 release. Change-Id: I8b6eab25b8a22b0a266ba0ac6d31e594f3d117f3 Reviewed-on: https://go-review.googlesource.com/22500 Run-TryBot: Russ Cox <rsc@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Ian Lance Taylor <iant@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2016-04-26 21:50:59 -04:00
flag.StringVar(&linkobj, "linkobj", "", "write linker-specific object to `file`")
objabi.Flagcount("live", "debug liveness analysis", &debuglive)
objabi.Flagcount("m", "print optimization decisions", &Debug['m'])
if sys.MSanSupported(objabi.GOOS, objabi.GOARCH) {
flag.BoolVar(&flag_msan, "msan", false, "build code compatible with C/C++ memory sanitizer")
}
flag.BoolVar(&nolocalimports, "nolocalimports", false, "reject local (relative) imports")
flag.StringVar(&outfile, "o", "", "write output to `file`")
flag.StringVar(&myimportpath, "p", "", "set expected package import `path`")
flag.BoolVar(&writearchive, "pack", false, "write to file.a instead of file.o")
objabi.Flagcount("r", "debug generated wrappers", &Debug['r'])
if sys.RaceDetectorSupported(objabi.GOOS, objabi.GOARCH) {
flag.BoolVar(&flag_race, "race", false, "enable race detector")
}
if enableTrace {
flag.BoolVar(&trace, "t", false, "trace type-checking")
}
[dev.inline] cmd/internal/src: replace src.Pos with syntax.Pos This replaces the src.Pos LineHist-based position tracking with the syntax.Pos implementation and updates all uses. The LineHist table is not used anymore - the respective code is still there but should be removed eventually. CL forthcoming. Passes toolstash -cmp when comparing to the master repo (with the exception of a couple of swapped assembly instructions, likely due to different instruction scheduling because the line-based sorting has changed; though this is won't affect correctness). The sizes of various important compiler data structures have increased significantly (see the various sizes_test.go files); this is probably the reason for an increase of compilation times (to be addressed). Here are the results of compilebench -count 5, run on a "quiet" machine (no apps running besides a terminal): name old time/op new time/op delta Template 256ms ± 1% 280ms ±15% +9.54% (p=0.008 n=5+5) Unicode 132ms ± 1% 132ms ± 1% ~ (p=0.690 n=5+5) GoTypes 891ms ± 1% 917ms ± 2% +2.88% (p=0.008 n=5+5) Compiler 3.84s ± 2% 3.99s ± 2% +3.95% (p=0.016 n=5+5) MakeBash 47.1s ± 1% 47.2s ± 2% ~ (p=0.841 n=5+5) name old user-ns/op new user-ns/op delta Template 309M ± 1% 326M ± 2% +5.18% (p=0.008 n=5+5) Unicode 165M ± 1% 168M ± 4% ~ (p=0.421 n=5+5) GoTypes 1.14G ± 2% 1.18G ± 1% +3.47% (p=0.008 n=5+5) Compiler 5.00G ± 1% 5.16G ± 1% +3.12% (p=0.008 n=5+5) Change-Id: I241c4246cdff627d7ecb95cac23060b38f9775ec Reviewed-on: https://go-review.googlesource.com/34273 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 17:15:05 -08:00
flag.StringVar(&pathPrefix, "trimpath", "", "remove `prefix` from recorded source file paths")
flag.BoolVar(&Debug_vlog, "v", false, "increase debug verbosity")
objabi.Flagcount("w", "debug type checking", &Debug['w'])
flag.BoolVar(&use_writebarrier, "wb", true, "enable write barrier")
var flag_shared bool
var flag_dynlink bool
if supportsDynlink(thearch.LinkArch.Arch) {
flag.BoolVar(&flag_shared, "shared", false, "generate code that can be linked into a shared library")
flag.BoolVar(&flag_dynlink, "dynlink", false, "support references to Go symbols defined in other shared libraries")
[dev.link] cmd: reference symbols by name when linking against Go shared library When building a program that links against Go shared libraries, it needs to reference symbols defined in the shared library. At compile time, we don't know where the shared library boundary is. If we reference a symbol in package p by index, and package p is actually part of a shared library, we cannot resolve the index at link time, as the linker doesn't see the object file of p. So when linking against Go shared libraries, always use named reference for now. To do this, the compiler needs to know whether we will be linking against Go shared libraries. The -dynlink flag kind of indicates that (as the document says), but currently it is actually overloaded: it is also used when building a plugin or a shared library, which is self-contained (if -linkshared is not otherwise specified) and could use index for symbol reference. So we introduce another compiler flag, -linkshared, specifically for linking against Go shared libraries. The go command will pass this flag if its -linkshared flag is specified ("go build -linkshared"). There may be better way to handle this. For example, we can put the symbol indices in a special section in the shared library that the linker can read. Or we can generate some per-package description file to include the indices. (Currently we generate a .shlibname file for each package that is included in a shared library, which contains the path of the library. We could consider extending this.) That said, this CL is a stop-gap solution. And it is no worse than the old object files. If we were to redesign the build system so that the shared library boundary is known at compile time, we could use indices for symbol references that do not cross shared library boundary, as well as doing other things better. Change-Id: I9c02aad36518051cc4785dbe25c4b4cef8f3faeb Reviewed-on: https://go-review.googlesource.com/c/go/+/201818 Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com>
2019-10-16 21:42:18 -04:00
flag.BoolVar(&Ctxt.Flag_linkshared, "linkshared", false, "generate code that will be linked against Go shared libraries")
}
flag.StringVar(&cpuprofile, "cpuprofile", "", "write cpu profile to `file`")
flag.StringVar(&memprofile, "memprofile", "", "write memory profile to `file`")
flag.Int64Var(&memprofilerate, "memprofilerate", 0, "set runtime.MemProfileRate to `rate`")
var goversion string
flag.StringVar(&goversion, "goversion", "", "required version of the runtime")
var symabisPath string
flag.StringVar(&symabisPath, "symabis", "", "read symbol ABIs from `file`")
flag.StringVar(&traceprofile, "traceprofile", "", "write an execution trace to `file`")
flag.StringVar(&blockprofile, "blockprofile", "", "write block profile to `file`")
flag.StringVar(&mutexprofile, "mutexprofile", "", "write mutex profile to `file`")
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
flag.StringVar(&benchfile, "bench", "", "append benchmark times to `file`")
flag.BoolVar(&smallFrames, "smallframes", false, "reduce the size limit for stack allocated objects")
flag.BoolVar(&Ctxt.UseBASEntries, "dwarfbasentries", Ctxt.UseBASEntries, "use base address selection entries in DWARF")
flag.BoolVar(&Ctxt.Flag_newobj, "newobj", false, "use new object file format")
cmd/compile: add framework for logging optimizer (non)actions to LSP This is intended to allow IDEs to note where the optimizer was not able to improve users' code. There may be other applications for this, for example in studying effectiveness of optimizer changes more quickly than running benchmarks, or in verifying that code changes did not accidentally disable optimizations in performance-critical code. Logging of nilcheck (bad) for amd64 is implemented as proof-of-concept. In general, the intent is that optimizations that didn't happen are what will be logged, because that is believed to be what IDE users want. Added flag -json=version,dest Check that version=0. (Future compilers will support a few recent versions, I hope that version is always <=3.) Dest is expected to be one of: /path (or \path in Windows) will create directory /path and fill it w/ json files file://path will create directory path, intended either for I:\dont\know\enough\about\windows\paths trustme_I_know_what_I_am_doing_probably_testing Not passing an absolute path name usually leads to json splattered all over source directories, or failure when those directories are not writeable. If you want a foot-gun, you have to ask for it. The JSON output is directed to subdirectories of dest, where each subdirectory is net/url.PathEscape of the package name, and each for each foo.go in the package, net/url.PathEscape(foo).json is created. The first line of foo.json contains version and context information, and subsequent lines contains LSP-conforming JSON describing the missing optimizations. Change-Id: Ib83176a53a8c177ee9081aefc5ae05604ccad8a0 Reviewed-on: https://go-review.googlesource.com/c/go/+/204338 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-24 13:48:17 -04:00
flag.StringVar(&jsonLogOpt, "json", "", "version,destination for JSON compiler/optimizer logging")
objabi.Flagparse(usage)
// Record flags that affect the build result. (And don't
// record flags that don't, since that would cause spurious
// changes in the binary.)
recordFlags("B", "N", "l", "msan", "race", "shared", "dynlink", "dwarflocationlists", "dwarfbasentries", "smallframes", "newobj")
if smallFrames {
maxStackVarSize = 128 * 1024
maxImplicitStackVarSize = 16 * 1024
}
Ctxt.Flag_shared = flag_dynlink || flag_shared
Ctxt.Flag_dynlink = flag_dynlink
Ctxt.Flag_optimize = Debug['N'] == 0
Ctxt.Debugasm = Debug['S']
Ctxt.Debugvlog = Debug_vlog
cmd/compile: add flag to disable DWARF generation DWARF generation has non-trivial cost, and in some cases is not necessary. Provide an option to opt out. Alloc impact of disabling DWARF generation: name old alloc/op new alloc/op delta Template 38.7MB ± 0% 37.6MB ± 0% -2.77% (p=0.016 n=5+4) Unicode 29.8MB ± 0% 29.8MB ± 0% -0.16% (p=0.032 n=5+5) GoTypes 113MB ± 0% 110MB ± 0% -2.38% (p=0.008 n=5+5) Compiler 463MB ± 0% 457MB ± 0% -1.34% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.23GB ± 0% -1.64% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.0MB ± 0% -1.05% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 30.9MB ± 0% -2.74% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 76.7MB ± 0% -1.90% (p=0.008 n=5+5) Tar 26.5MB ± 0% 26.0MB ± 0% -2.04% (p=0.008 n=5+5) XML 42.4MB ± 0% 41.1MB ± 0% -2.86% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 377k ± 0% 360k ± 1% -4.46% (p=0.008 n=5+5) Unicode 321k ± 0% 320k ± 0% ~ (p=0.151 n=5+5) GoTypes 1.14M ± 0% 1.10M ± 0% -4.13% (p=0.008 n=5+5) Compiler 4.26M ± 0% 4.13M ± 0% -3.14% (p=0.008 n=5+5) SSA 9.70M ± 0% 9.33M ± 0% -3.89% (p=0.008 n=5+5) Flate 233k ± 0% 228k ± 0% -2.40% (p=0.008 n=5+5) GoParser 316k ± 0% 302k ± 0% -4.48% (p=0.008 n=5+5) Reflect 980k ± 0% 945k ± 0% -3.62% (p=0.008 n=5+5) Tar 249k ± 0% 241k ± 0% -3.19% (p=0.008 n=5+5) XML 391k ± 0% 376k ± 0% -3.95% (p=0.008 n=5+5) Change-Id: I97dbfb6b40195d1e0b91be097a4bf0e7f65b26af Reviewed-on: https://go-review.googlesource.com/40857 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-16 05:53:06 -07:00
if flagDWARF {
Ctxt.DebugInfo = debuginfo
Ctxt.GenAbstractFunc = genAbstractFunc
Ctxt.DwFixups = obj.NewDwarfFixupTable(Ctxt)
} else {
// turn off inline generation if no dwarf at all
genDwarfInline = 0
Ctxt.Flag_locationlists = false
cmd/compile: add flag to disable DWARF generation DWARF generation has non-trivial cost, and in some cases is not necessary. Provide an option to opt out. Alloc impact of disabling DWARF generation: name old alloc/op new alloc/op delta Template 38.7MB ± 0% 37.6MB ± 0% -2.77% (p=0.016 n=5+4) Unicode 29.8MB ± 0% 29.8MB ± 0% -0.16% (p=0.032 n=5+5) GoTypes 113MB ± 0% 110MB ± 0% -2.38% (p=0.008 n=5+5) Compiler 463MB ± 0% 457MB ± 0% -1.34% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.23GB ± 0% -1.64% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.0MB ± 0% -1.05% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 30.9MB ± 0% -2.74% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 76.7MB ± 0% -1.90% (p=0.008 n=5+5) Tar 26.5MB ± 0% 26.0MB ± 0% -2.04% (p=0.008 n=5+5) XML 42.4MB ± 0% 41.1MB ± 0% -2.86% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 377k ± 0% 360k ± 1% -4.46% (p=0.008 n=5+5) Unicode 321k ± 0% 320k ± 0% ~ (p=0.151 n=5+5) GoTypes 1.14M ± 0% 1.10M ± 0% -4.13% (p=0.008 n=5+5) Compiler 4.26M ± 0% 4.13M ± 0% -3.14% (p=0.008 n=5+5) SSA 9.70M ± 0% 9.33M ± 0% -3.89% (p=0.008 n=5+5) Flate 233k ± 0% 228k ± 0% -2.40% (p=0.008 n=5+5) GoParser 316k ± 0% 302k ± 0% -4.48% (p=0.008 n=5+5) Reflect 980k ± 0% 945k ± 0% -3.62% (p=0.008 n=5+5) Tar 249k ± 0% 241k ± 0% -3.19% (p=0.008 n=5+5) XML 391k ± 0% 376k ± 0% -3.95% (p=0.008 n=5+5) Change-Id: I97dbfb6b40195d1e0b91be097a4bf0e7f65b26af Reviewed-on: https://go-review.googlesource.com/40857 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-16 05:53:06 -07:00
}
cmd/compile: provide a way to auto-discover -d debug keys Currently one needs to refer to the sources to have a list of accepted debug keys. We can copy what 'ssa/help' does and introspect the list of debug keys to print a more detailed help: $ go tool compile -d help usage: -d arg[,arg]* and arg is <key>[=<value>] <key> is one of: append print information about append compilation closure print information about closure compilation disablenil disable nil checks dclstack run internal dclstack check gcprog print dump of GC programs nil print information about nil checks panic do not hide any compiler panic slice print information about slice compilation typeassert print information about type assertion inlining wb print information about write barriers export print export data pctab print named pc-value table ssa/help print help about SSA debugging <value> is key-specific. Key "pctab" supports values: "pctospadj", "pctofile", "pctoline", "pctoinline", "pctopcdata" For '-d help' to be discoverable, a hint is given in the -d flag description. A last thing, today at least one go file needs to be provided to get to the code printing ssa/help. $ go tool compile -d ssa/help foo.go Add a check so one can just do '-d help' or '-d ssa/help' Caught by trybot: I needed to update fmt_test.go as I'm introducing the usage of %-*s in a format string. Fixes #20041 Change-Id: Ib2858b038c1bcbe644aa3b1a371009710c6d957d Reviewed-on: https://go-review.googlesource.com/41091 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-04-19 19:24:27 +01:00
if flag.NArg() < 1 && debugstr != "help" && debugstr != "ssa/help" {
usage()
}
if goversion != "" && goversion != runtime.Version() {
fmt.Printf("compile: version %q does not match go tool version %q\n", runtime.Version(), goversion)
Exit(2)
}
checkLang()
if symabisPath != "" {
readSymABIs(symabisPath, myimportpath)
}
thearch.LinkArch.Init(Ctxt)
if outfile == "" {
p := flag.Arg(0)
if i := strings.LastIndex(p, "/"); i >= 0 {
p = p[i+1:]
}
if runtime.GOOS == "windows" {
if i := strings.LastIndex(p, `\`); i >= 0 {
p = p[i+1:]
}
}
if i := strings.LastIndex(p, "."); i >= 0 {
p = p[:i]
}
suffix := ".o"
if writearchive {
suffix = ".a"
}
outfile = p + suffix
}
startProfile()
if flag_race && flag_msan {
log.Fatal("cannot use both -race and -msan")
}
if (flag_race || flag_msan) && objabi.GOOS != "windows" {
// -race and -msan imply -d=checkptr for now (except on windows).
// TODO(mdempsky): Re-evaluate before Go 1.14. See #34964.
Debug_checkptr = 1
}
if ispkgin(omit_pkgs) {
flag_race = false
flag_msan = false
}
if flag_race {
racepkg = types.NewPkg("runtime/race", "")
}
if flag_msan {
msanpkg = types.NewPkg("runtime/msan", "")
}
if flag_race || flag_msan {
instrumenting = true
}
if compiling_runtime && Debug['N'] != 0 {
log.Fatal("cannot disable optimizations while compiling runtime")
}
cmd/compile: add initial backend concurrency support This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 08:27:26 -07:00
if nBackendWorkers < 1 {
log.Fatalf("-c must be at least 1, got %d", nBackendWorkers)
}
if nBackendWorkers > 1 && !concurrentBackendAllowed() {
log.Fatalf("cannot use concurrent backend compilation with provided flags; invoked as %v", os.Args)
}
[dev.debug] cmd/compile: better DWARF with optimizations on Debuggers use DWARF information to find local variables on the stack and in registers. Prior to this CL, the DWARF information for functions claimed that all variables were on the stack at all times. That's incorrect when optimizations are enabled, and results in debuggers showing data that is out of date or complete gibberish. After this CL, the compiler is capable of representing variable locations more accurately, and attempts to do so. Due to limitations of the SSA backend, it's not possible to be completely correct. There are a number of problems in the current design. One of the easier to understand is that variable names currently must be attached to an SSA value, but not all assignments in the source code actually result in machine code. For example: type myint int var a int b := myint(int) and b := (*uint64)(unsafe.Pointer(a)) don't generate machine code because the underlying representation is the same, so the correct value of b will not be set when the user would expect. Generating the more precise debug information is behind a flag, dwarflocationlists. Because of the issues described above, setting the flag may not make the debugging experience much better, and may actually make it worse in cases where the variable actually is on the stack and the more complicated analysis doesn't realize it. A number of changes are included: - Add a new pseudo-instruction, RegKill, which indicates that the value in the register has been clobbered. - Adjust regalloc to emit RegKills in the right places. Significantly, this means that phis are mixed with StoreReg and RegKills after regalloc. - Track variable decomposition in ssa.LocalSlots. - After the SSA backend is done, analyze the result and build location lists for each LocalSlot. - After assembly is done, update the location lists with the assembled PC offsets, recompose variables, and build DWARF location lists. Emit the list as a new linker symbol, one per function. - In the linker, aggregate the location lists into a .debug_loc section. TODO: - currently disabled for non-X86/AMD64 because there are no data tables. go build -toolexec 'toolstash -cmp' -a std succeeds. With -dwarflocationlists false: before: f02812195637909ff675782c0b46836a8ff01976 after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec benchstat -geomean /tmp/220352263 /tmp/621364410 completed 15 of 15, estimated time remaining 0s (eta 3:52PM) name old time/op new time/op delta Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14) Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15) GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14) Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15) GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15) Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13) Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15) XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15) [Geo mean] 206ms 377ms +82.86% name old user-time/op new user-time/op delta Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15) Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14) GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15) Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15) GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15) Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13) Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15) XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15) [Geo mean] 317ms 583ms +83.72% name old alloc/op new alloc/op delta Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15) Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15) GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14) Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15) GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15) Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15) Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15) XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15) [Geo mean] 42.1MB 75.0MB +78.05% name old allocs/op new allocs/op delta Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15) Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14) GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14) Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15) GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15) Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15) Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15) XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15) [Geo mean] 439k 755k +72.01% name old text-bytes new text-bytes delta HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15) name old data-bytes new data-bytes delta HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal) name old bss-bytes new bss-bytes delta HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal) name old exe-bytes new exe-bytes delta HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal) Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8 Reviewed-on: https://go-review.googlesource.com/41770 Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
if Ctxt.Flag_locationlists && len(Ctxt.Arch.DWARFRegisters) == 0 {
log.Fatalf("location lists requested but register mapping not available on %v", Ctxt.Arch.Name)
}
// parse -d argument
if debugstr != "" {
Split:
for _, name := range strings.Split(debugstr, ",") {
if name == "" {
continue
}
cmd/compile: provide a way to auto-discover -d debug keys Currently one needs to refer to the sources to have a list of accepted debug keys. We can copy what 'ssa/help' does and introspect the list of debug keys to print a more detailed help: $ go tool compile -d help usage: -d arg[,arg]* and arg is <key>[=<value>] <key> is one of: append print information about append compilation closure print information about closure compilation disablenil disable nil checks dclstack run internal dclstack check gcprog print dump of GC programs nil print information about nil checks panic do not hide any compiler panic slice print information about slice compilation typeassert print information about type assertion inlining wb print information about write barriers export print export data pctab print named pc-value table ssa/help print help about SSA debugging <value> is key-specific. Key "pctab" supports values: "pctospadj", "pctofile", "pctoline", "pctoinline", "pctopcdata" For '-d help' to be discoverable, a hint is given in the -d flag description. A last thing, today at least one go file needs to be provided to get to the code printing ssa/help. $ go tool compile -d ssa/help foo.go Add a check so one can just do '-d help' or '-d ssa/help' Caught by trybot: I needed to update fmt_test.go as I'm introducing the usage of %-*s in a format string. Fixes #20041 Change-Id: Ib2858b038c1bcbe644aa3b1a371009710c6d957d Reviewed-on: https://go-review.googlesource.com/41091 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-04-19 19:24:27 +01:00
// display help about the -d option itself and quit
if name == "help" {
fmt.Print(debugHelpHeader)
cmd/compile: provide a way to auto-discover -d debug keys Currently one needs to refer to the sources to have a list of accepted debug keys. We can copy what 'ssa/help' does and introspect the list of debug keys to print a more detailed help: $ go tool compile -d help usage: -d arg[,arg]* and arg is <key>[=<value>] <key> is one of: append print information about append compilation closure print information about closure compilation disablenil disable nil checks dclstack run internal dclstack check gcprog print dump of GC programs nil print information about nil checks panic do not hide any compiler panic slice print information about slice compilation typeassert print information about type assertion inlining wb print information about write barriers export print export data pctab print named pc-value table ssa/help print help about SSA debugging <value> is key-specific. Key "pctab" supports values: "pctospadj", "pctofile", "pctoline", "pctoinline", "pctopcdata" For '-d help' to be discoverable, a hint is given in the -d flag description. A last thing, today at least one go file needs to be provided to get to the code printing ssa/help. $ go tool compile -d ssa/help foo.go Add a check so one can just do '-d help' or '-d ssa/help' Caught by trybot: I needed to update fmt_test.go as I'm introducing the usage of %-*s in a format string. Fixes #20041 Change-Id: Ib2858b038c1bcbe644aa3b1a371009710c6d957d Reviewed-on: https://go-review.googlesource.com/41091 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-04-19 19:24:27 +01:00
maxLen := len("ssa/help")
for _, t := range debugtab {
if len(t.name) > maxLen {
maxLen = len(t.name)
}
}
for _, t := range debugtab {
fmt.Printf("\t%-*s\t%s\n", maxLen, t.name, t.help)
}
// ssa options have their own help
fmt.Printf("\t%-*s\t%s\n", maxLen, "ssa/help", "print help about SSA debugging")
fmt.Print(debugHelpFooter)
cmd/compile: provide a way to auto-discover -d debug keys Currently one needs to refer to the sources to have a list of accepted debug keys. We can copy what 'ssa/help' does and introspect the list of debug keys to print a more detailed help: $ go tool compile -d help usage: -d arg[,arg]* and arg is <key>[=<value>] <key> is one of: append print information about append compilation closure print information about closure compilation disablenil disable nil checks dclstack run internal dclstack check gcprog print dump of GC programs nil print information about nil checks panic do not hide any compiler panic slice print information about slice compilation typeassert print information about type assertion inlining wb print information about write barriers export print export data pctab print named pc-value table ssa/help print help about SSA debugging <value> is key-specific. Key "pctab" supports values: "pctospadj", "pctofile", "pctoline", "pctoinline", "pctopcdata" For '-d help' to be discoverable, a hint is given in the -d flag description. A last thing, today at least one go file needs to be provided to get to the code printing ssa/help. $ go tool compile -d ssa/help foo.go Add a check so one can just do '-d help' or '-d ssa/help' Caught by trybot: I needed to update fmt_test.go as I'm introducing the usage of %-*s in a format string. Fixes #20041 Change-Id: Ib2858b038c1bcbe644aa3b1a371009710c6d957d Reviewed-on: https://go-review.googlesource.com/41091 Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com>
2017-04-19 19:24:27 +01:00
os.Exit(0)
}
val, valstring, haveInt := 1, "", true
if i := strings.IndexAny(name, "=:"); i >= 0 {
var err error
name, valstring = name[:i], name[i+1:]
val, err = strconv.Atoi(valstring)
if err != nil {
val, haveInt = 1, false
}
}
for _, t := range debugtab {
if t.name != name {
continue
}
switch vp := t.val.(type) {
case nil:
// Ignore
case *string:
*vp = valstring
case *int:
if !haveInt {
log.Fatalf("invalid debug value %v", name)
}
*vp = val
default:
panic("bad debugtab type")
}
continue Split
}
// special case for ssa for now
if strings.HasPrefix(name, "ssa/") {
// expect form ssa/phase/flag
// e.g. -d=ssa/generic_cse/time
// _ in phase name also matches space
phase := name[4:]
flag := "debug" // default flag is debug
if i := strings.Index(phase, "/"); i >= 0 {
flag = phase[i+1:]
phase = phase[:i]
}
err := ssa.PhaseOption(phase, flag, val, valstring)
if err != "" {
log.Fatalf(err)
}
continue Split
}
log.Fatalf("unknown debug key -d %s\n", name)
}
}
if compiling_runtime {
// Runtime can't use -d=checkptr, at least not yet.
Debug_checkptr = 0
// Fuzzing the runtime isn't interesting either.
Debug_libfuzzer = 0
}
// set via a -d flag
Ctxt.Debugpcln = Debug_pctab
if flagDWARF {
dwarf.EnableLogging(Debug_gendwarfinl != 0)
}
if Debug_softfloat != 0 {
thearch.SoftFloat = true
}
// enable inlining. for now:
// default: inlining on. (debug['l'] == 1)
// -l: inlining off (debug['l'] == 0)
// -l=2, -l=3: inlining on again, with extra debugging (debug['l'] > 1)
if Debug['l'] <= 1 {
Debug['l'] = 1 - Debug['l']
}
cmd/compile: add framework for logging optimizer (non)actions to LSP This is intended to allow IDEs to note where the optimizer was not able to improve users' code. There may be other applications for this, for example in studying effectiveness of optimizer changes more quickly than running benchmarks, or in verifying that code changes did not accidentally disable optimizations in performance-critical code. Logging of nilcheck (bad) for amd64 is implemented as proof-of-concept. In general, the intent is that optimizations that didn't happen are what will be logged, because that is believed to be what IDE users want. Added flag -json=version,dest Check that version=0. (Future compilers will support a few recent versions, I hope that version is always <=3.) Dest is expected to be one of: /path (or \path in Windows) will create directory /path and fill it w/ json files file://path will create directory path, intended either for I:\dont\know\enough\about\windows\paths trustme_I_know_what_I_am_doing_probably_testing Not passing an absolute path name usually leads to json splattered all over source directories, or failure when those directories are not writeable. If you want a foot-gun, you have to ask for it. The JSON output is directed to subdirectories of dest, where each subdirectory is net/url.PathEscape of the package name, and each for each foo.go in the package, net/url.PathEscape(foo).json is created. The first line of foo.json contains version and context information, and subsequent lines contains LSP-conforming JSON describing the missing optimizations. Change-Id: Ib83176a53a8c177ee9081aefc5ae05604ccad8a0 Reviewed-on: https://go-review.googlesource.com/c/go/+/204338 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-24 13:48:17 -04:00
if jsonLogOpt != "" { // parse version,destination from json logging optimization.
logopt.LogJsonOption(jsonLogOpt)
}
ssaDump = os.Getenv("GOSSAFUNC")
if ssaDump != "" {
if strings.HasSuffix(ssaDump, "+") {
ssaDump = ssaDump[:len(ssaDump)-1]
ssaDumpStdout = true
}
spl := strings.Split(ssaDump, ":")
if len(spl) > 1 {
ssaDump = spl[0]
ssaDumpCFG = spl[1]
}
}
cmd/compile: enable scopes unconditionally This revives Alessandro Arzilli's CL to enable scopes whenever any dwarf is emitted (with optimization or not), adds a test that detects this changes and shows that it creates more truthful debugging output. Reverted change to ssa/debug_test tests made when scopes were disabled during dwarflocationlist development. Also included are updates to the Delve test output (it had fallen out of sync; creating test output for one updates it for all) and minor naming changes in ssa/debug_test. Compile-time/space changes (relative to tip including dwarflocationlists): benchstat -geomean after.log scopes.log name old time/op new time/op delta Template 182ms ± 1% 182ms ± 1% ~ (p=0.666 n=9+9) Unicode 82.8ms ± 1% 86.6ms ±14% ~ (p=0.211 n=9+10) GoTypes 611ms ± 1% 616ms ± 2% +0.97% (p=0.001 n=10+9) Compiler 2.95s ± 1% 2.95s ± 0% ~ (p=0.573 n=10+8) SSA 6.70s ± 1% 6.81s ± 1% +1.68% (p=0.000 n=9+10) Flate 117ms ± 1% 118ms ± 1% +0.60% (p=0.036 n=9+8) GoParser 145ms ± 1% 145ms ± 1% ~ (p=1.000 n=9+9) Reflect 398ms ± 1% 396ms ± 1% ~ (p=0.053 n=9+10) Tar 171ms ± 1% 171ms ± 1% ~ (p=0.356 n=9+10) XML 214ms ± 1% 214ms ± 1% ~ (p=0.605 n=9+9) StdCmd 12.4s ± 2% 12.4s ± 1% ~ (p=1.000 n=9+9) [Geo mean] 506ms 509ms +0.71% name old user-ns/op new user-ns/op delta Template 254M ± 4% 249M ± 6% ~ (p=0.155 n=10+10) Unicode 121M ±11% 124M ± 6% ~ (p=0.516 n=10+10) GoTypes 824M ± 2% 869M ± 5% +5.49% (p=0.001 n=8+10) Compiler 4.01G ± 2% 4.02G ± 1% ~ (p=0.561 n=9+9) SSA 10.0G ± 2% 10.2G ± 2% +2.29% (p=0.000 n=9+10) Flate 154M ± 7% 154M ± 7% ~ (p=0.960 n=10+9) GoParser 190M ± 7% 196M ± 6% ~ (p=0.064 n=9+10) Reflect 528M ± 2% 517M ± 3% -1.97% (p=0.025 n=10+10) Tar 227M ± 5% 232M ± 3% ~ (p=0.061 n=9+10) XML 286M ± 4% 283M ± 4% ~ (p=0.343 n=9+9) [Geo mean] 502M 508M +1.09% name old text-bytes new text-bytes delta HelloSize 672k ± 0% 672k ± 0% +0.01% (p=0.000 n=10+10) CmdGoSize 7.21M ± 0% 7.21M ± 0% -0.00% (p=0.000 n=10+10) [Geo mean] 2.20M 2.20M +0.00% name old data-bytes new data-bytes delta HelloSize 9.88k ± 0% 9.88k ± 0% ~ (all equal) CmdGoSize 248k ± 0% 248k ± 0% ~ (all equal) [Geo mean] 49.5k 49.5k +0.00% name old bss-bytes new bss-bytes delta HelloSize 125k ± 0% 125k ± 0% ~ (all equal) CmdGoSize 144k ± 0% 144k ± 0% -0.04% (p=0.000 n=10+10) [Geo mean] 135k 135k -0.02% name old exe-bytes new exe-bytes delta HelloSize 1.30M ± 0% 1.34M ± 0% +3.15% (p=0.000 n=10+10) CmdGoSize 13.5M ± 0% 13.9M ± 0% +2.70% (p=0.000 n=10+10) [Geo mean] 4.19M 4.31M +2.92% Change-Id: Id53b8d57bd00440142ccbd39b95710e14e083fb5 Reviewed-on: https://go-review.googlesource.com/101217 Reviewed-by: Heschi Kreinick <heschi@google.com>
2018-03-16 15:15:50 -04:00
trackScopes = flagDWARF
Widthptr = thearch.LinkArch.PtrSize
Widthreg = thearch.LinkArch.RegSize
// initialize types package
// (we need to do this to break dependencies that otherwise
// would lead to import cycles)
types.Widthptr = Widthptr
types.Dowidth = dowidth
types.Fatalf = Fatalf
types.Sconv = func(s *types.Sym, flag, mode int) string {
return sconv(s, FmtFlag(flag), fmtMode(mode))
}
types.Tconv = func(t *types.Type, flag, mode, depth int) string {
return tconv(t, FmtFlag(flag), fmtMode(mode), depth)
}
types.FormatSym = func(sym *types.Sym, s fmt.State, verb rune, mode int) {
symFormat(sym, s, verb, fmtMode(mode))
}
types.FormatType = func(t *types.Type, s fmt.State, verb rune, mode int) {
typeFormat(t, s, verb, fmtMode(mode))
}
types.TypeLinkSym = func(t *types.Type) *obj.LSym {
return typenamesym(t).Linksym()
}
types.FmtLeft = int(FmtLeft)
types.FmtUnsigned = int(FmtUnsigned)
types.FErr = FErr
types.Ctxt = Ctxt
initUniverse()
dclcontext = PEXTERN
nerrors = 0
autogeneratedPos = makePos(src.NewFileBase("<autogenerated>", "<autogenerated>"), 1, 0)
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
timings.Start("fe", "loadsys")
loadsys()
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
timings.Start("fe", "parse")
lines := parseFiles(flag.Args())
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
timings.Stop()
[dev.inline] cmd/internal/src: replace src.Pos with syntax.Pos This replaces the src.Pos LineHist-based position tracking with the syntax.Pos implementation and updates all uses. The LineHist table is not used anymore - the respective code is still there but should be removed eventually. CL forthcoming. Passes toolstash -cmp when comparing to the master repo (with the exception of a couple of swapped assembly instructions, likely due to different instruction scheduling because the line-based sorting has changed; though this is won't affect correctness). The sizes of various important compiler data structures have increased significantly (see the various sizes_test.go files); this is probably the reason for an increase of compilation times (to be addressed). Here are the results of compilebench -count 5, run on a "quiet" machine (no apps running besides a terminal): name old time/op new time/op delta Template 256ms ± 1% 280ms ±15% +9.54% (p=0.008 n=5+5) Unicode 132ms ± 1% 132ms ± 1% ~ (p=0.690 n=5+5) GoTypes 891ms ± 1% 917ms ± 2% +2.88% (p=0.008 n=5+5) Compiler 3.84s ± 2% 3.99s ± 2% +3.95% (p=0.016 n=5+5) MakeBash 47.1s ± 1% 47.2s ± 2% ~ (p=0.841 n=5+5) name old user-ns/op new user-ns/op delta Template 309M ± 1% 326M ± 2% +5.18% (p=0.008 n=5+5) Unicode 165M ± 1% 168M ± 4% ~ (p=0.421 n=5+5) GoTypes 1.14G ± 2% 1.18G ± 1% +3.47% (p=0.008 n=5+5) Compiler 5.00G ± 1% 5.16G ± 1% +3.12% (p=0.008 n=5+5) Change-Id: I241c4246cdff627d7ecb95cac23060b38f9775ec Reviewed-on: https://go-review.googlesource.com/34273 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 17:15:05 -08:00
timings.AddEvent(int64(lines), "lines")
finishUniverse()
recordPackageName()
typecheckok = true
// Process top-level declarations in phases.
// Phase 1: const, type, and names and types of funcs.
// This will gather all the information about types
// and methods but doesn't depend on any of it.
cmd/compile: reintroduce work-around for cyclic alias declarations This change re-introduces (temporarily) a work-around for recursive alias type declarations, originally in https://golang.org/cl/35831/ (intended as fix for #18640). The work-around was removed later for a more comprehensive cycle detection check. That check contained a subtle error which made the code appear to work, while in fact creating incorrect types internally. See #25838 for details. By re-introducing the original work-around, we eliminate problems with many simple recursive type declarations involving aliases; specifically cases such as #27232 and #27267. However, the more general problem remains. This CL also fixes the subtle error (incorrect variable use when analyzing a type cycle) mentioned above and now issues a fatal error with a reference to the relevant issue (rather than crashing later during the compilation). While not great, this is better than the current status. The long-term solution will need to address these cycles (see #25838). As a consequence, several old test cases are not accepted anymore by the compiler since they happened to work accidentally only. This CL disables parts or all code of those test cases. The issues are: #18640, #23823, and #24939. One of the new test cases (fixedbugs/issue27232.go) exposed a go/types issue. The test case is excluded from the go/types test suite and an issue was filed (#28576). Updates #18640. Updates #23823. Updates #24939. Updates #25838. Updates #28576. Fixes #27232. Fixes #27267. Change-Id: I6c2d10da98bfc6f4f445c755fcaab17fc7b214c5 Reviewed-on: https://go-review.googlesource.com/c/147286 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-02 23:28:26 -07:00
//
// We also defer type alias declarations until phase 2
// to avoid cycles like #18640.
// TODO(gri) Remove this again once we have a fix for #25838.
// Don't use range--typecheck can add closures to xtop.
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
timings.Start("fe", "typecheck", "top1")
for i := 0; i < len(xtop); i++ {
n := xtop[i]
cmd/compile: reintroduce work-around for cyclic alias declarations This change re-introduces (temporarily) a work-around for recursive alias type declarations, originally in https://golang.org/cl/35831/ (intended as fix for #18640). The work-around was removed later for a more comprehensive cycle detection check. That check contained a subtle error which made the code appear to work, while in fact creating incorrect types internally. See #25838 for details. By re-introducing the original work-around, we eliminate problems with many simple recursive type declarations involving aliases; specifically cases such as #27232 and #27267. However, the more general problem remains. This CL also fixes the subtle error (incorrect variable use when analyzing a type cycle) mentioned above and now issues a fatal error with a reference to the relevant issue (rather than crashing later during the compilation). While not great, this is better than the current status. The long-term solution will need to address these cycles (see #25838). As a consequence, several old test cases are not accepted anymore by the compiler since they happened to work accidentally only. This CL disables parts or all code of those test cases. The issues are: #18640, #23823, and #24939. One of the new test cases (fixedbugs/issue27232.go) exposed a go/types issue. The test case is excluded from the go/types test suite and an issue was filed (#28576). Updates #18640. Updates #23823. Updates #24939. Updates #25838. Updates #28576. Fixes #27232. Fixes #27267. Change-Id: I6c2d10da98bfc6f4f445c755fcaab17fc7b214c5 Reviewed-on: https://go-review.googlesource.com/c/147286 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-02 23:28:26 -07:00
if op := n.Op; op != ODCL && op != OAS && op != OAS2 && (op != ODCLTYPE || !n.Left.Name.Param.Alias) {
cmd/compile: bulk rename This change does a bulk rename of several identifiers in the compiler. See #27167 and https://docs.google.com/document/d/19_ExiylD9MRfeAjKIfEsMU1_RGhuxB9sA0b5Zv7byVI/ for context and for discussion of these particular renames. Commands run to generate this change: gorename -from '"cmd/compile/internal/gc".OPROC' -to OGO gorename -from '"cmd/compile/internal/gc".OCOM' -to OBITNOT gorename -from '"cmd/compile/internal/gc".OMINUS' -to ONEG gorename -from '"cmd/compile/internal/gc".OIND' -to ODEREF gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTR' -to OBYTES2STR gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTRTMP' -to OBYTES2STRTMP gorename -from '"cmd/compile/internal/gc".OARRAYRUNESTR' -to ORUNES2STR gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTE' -to OSTR2BYTES gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTETMP' -to OSTR2BYTESTMP gorename -from '"cmd/compile/internal/gc".OSTRARRAYRUNE' -to OSTR2RUNES gorename -from '"cmd/compile/internal/gc".Etop' -to ctxStmt gorename -from '"cmd/compile/internal/gc".Erv' -to ctxExpr gorename -from '"cmd/compile/internal/gc".Ecall' -to ctxCallee gorename -from '"cmd/compile/internal/gc".Efnstruct' -to ctxMultiOK gorename -from '"cmd/compile/internal/gc".Easgn' -to ctxAssign gorename -from '"cmd/compile/internal/gc".Ecomplit' -to ctxCompLit Not altered: parameters and local variables (mostly in typecheck.go) named top, which should probably now be called ctx (and which should probably have a named type). Also not altered: Field called Top in gc.Func. gorename -from '"cmd/compile/internal/gc".Node.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/gc".Node.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/gc".nodeIsddd' -to nodeIsDDD gorename -from '"cmd/compile/internal/types".Field.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/types".Field.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/types".fieldIsddd' -to fieldIsDDD Not altered: function gc.hasddd, params and local variables called isddd Also not altered: fmt.go prints nodes using "isddd(%v)". cd cmd/compile/internal/gc; go generate I then manually found impacted comments using exact string match and fixed them up by hand. The comment changes were trivial. Passes toolstash-check. Fixes #27167. If this experiment is deemed a success, we will open a new tracking issue for renames to do at the end of the 1.13 cycles. Change-Id: I2dc541533d2ab0d06cb3d31d65df205ecfb151e8 Reviewed-on: https://go-review.googlesource.com/c/150140 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-18 08:34:38 -08:00
xtop[i] = typecheck(n, ctxStmt)
}
}
// Phase 2: Variable assignments.
// To check interface assignments, depends on phase 1.
// Don't use range--typecheck can add closures to xtop.
timings.Start("fe", "typecheck", "top2")
for i := 0; i < len(xtop); i++ {
n := xtop[i]
cmd/compile: reintroduce work-around for cyclic alias declarations This change re-introduces (temporarily) a work-around for recursive alias type declarations, originally in https://golang.org/cl/35831/ (intended as fix for #18640). The work-around was removed later for a more comprehensive cycle detection check. That check contained a subtle error which made the code appear to work, while in fact creating incorrect types internally. See #25838 for details. By re-introducing the original work-around, we eliminate problems with many simple recursive type declarations involving aliases; specifically cases such as #27232 and #27267. However, the more general problem remains. This CL also fixes the subtle error (incorrect variable use when analyzing a type cycle) mentioned above and now issues a fatal error with a reference to the relevant issue (rather than crashing later during the compilation). While not great, this is better than the current status. The long-term solution will need to address these cycles (see #25838). As a consequence, several old test cases are not accepted anymore by the compiler since they happened to work accidentally only. This CL disables parts or all code of those test cases. The issues are: #18640, #23823, and #24939. One of the new test cases (fixedbugs/issue27232.go) exposed a go/types issue. The test case is excluded from the go/types test suite and an issue was filed (#28576). Updates #18640. Updates #23823. Updates #24939. Updates #25838. Updates #28576. Fixes #27232. Fixes #27267. Change-Id: I6c2d10da98bfc6f4f445c755fcaab17fc7b214c5 Reviewed-on: https://go-review.googlesource.com/c/147286 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-02 23:28:26 -07:00
if op := n.Op; op == ODCL || op == OAS || op == OAS2 || op == ODCLTYPE && n.Left.Name.Param.Alias {
cmd/compile: bulk rename This change does a bulk rename of several identifiers in the compiler. See #27167 and https://docs.google.com/document/d/19_ExiylD9MRfeAjKIfEsMU1_RGhuxB9sA0b5Zv7byVI/ for context and for discussion of these particular renames. Commands run to generate this change: gorename -from '"cmd/compile/internal/gc".OPROC' -to OGO gorename -from '"cmd/compile/internal/gc".OCOM' -to OBITNOT gorename -from '"cmd/compile/internal/gc".OMINUS' -to ONEG gorename -from '"cmd/compile/internal/gc".OIND' -to ODEREF gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTR' -to OBYTES2STR gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTRTMP' -to OBYTES2STRTMP gorename -from '"cmd/compile/internal/gc".OARRAYRUNESTR' -to ORUNES2STR gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTE' -to OSTR2BYTES gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTETMP' -to OSTR2BYTESTMP gorename -from '"cmd/compile/internal/gc".OSTRARRAYRUNE' -to OSTR2RUNES gorename -from '"cmd/compile/internal/gc".Etop' -to ctxStmt gorename -from '"cmd/compile/internal/gc".Erv' -to ctxExpr gorename -from '"cmd/compile/internal/gc".Ecall' -to ctxCallee gorename -from '"cmd/compile/internal/gc".Efnstruct' -to ctxMultiOK gorename -from '"cmd/compile/internal/gc".Easgn' -to ctxAssign gorename -from '"cmd/compile/internal/gc".Ecomplit' -to ctxCompLit Not altered: parameters and local variables (mostly in typecheck.go) named top, which should probably now be called ctx (and which should probably have a named type). Also not altered: Field called Top in gc.Func. gorename -from '"cmd/compile/internal/gc".Node.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/gc".Node.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/gc".nodeIsddd' -to nodeIsDDD gorename -from '"cmd/compile/internal/types".Field.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/types".Field.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/types".fieldIsddd' -to fieldIsDDD Not altered: function gc.hasddd, params and local variables called isddd Also not altered: fmt.go prints nodes using "isddd(%v)". cd cmd/compile/internal/gc; go generate I then manually found impacted comments using exact string match and fixed them up by hand. The comment changes were trivial. Passes toolstash-check. Fixes #27167. If this experiment is deemed a success, we will open a new tracking issue for renames to do at the end of the 1.13 cycles. Change-Id: I2dc541533d2ab0d06cb3d31d65df205ecfb151e8 Reviewed-on: https://go-review.googlesource.com/c/150140 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-18 08:34:38 -08:00
xtop[i] = typecheck(n, ctxStmt)
}
}
// Phase 3: Type check function bodies.
// Don't use range--typecheck can add closures to xtop.
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
timings.Start("fe", "typecheck", "func")
var fcount int64
for i := 0; i < len(xtop); i++ {
n := xtop[i]
if op := n.Op; op == ODCLFUNC || op == OCLOSURE {
Curfn = n
decldepth = 1
saveerrors()
cmd/compile: bulk rename This change does a bulk rename of several identifiers in the compiler. See #27167 and https://docs.google.com/document/d/19_ExiylD9MRfeAjKIfEsMU1_RGhuxB9sA0b5Zv7byVI/ for context and for discussion of these particular renames. Commands run to generate this change: gorename -from '"cmd/compile/internal/gc".OPROC' -to OGO gorename -from '"cmd/compile/internal/gc".OCOM' -to OBITNOT gorename -from '"cmd/compile/internal/gc".OMINUS' -to ONEG gorename -from '"cmd/compile/internal/gc".OIND' -to ODEREF gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTR' -to OBYTES2STR gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTRTMP' -to OBYTES2STRTMP gorename -from '"cmd/compile/internal/gc".OARRAYRUNESTR' -to ORUNES2STR gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTE' -to OSTR2BYTES gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTETMP' -to OSTR2BYTESTMP gorename -from '"cmd/compile/internal/gc".OSTRARRAYRUNE' -to OSTR2RUNES gorename -from '"cmd/compile/internal/gc".Etop' -to ctxStmt gorename -from '"cmd/compile/internal/gc".Erv' -to ctxExpr gorename -from '"cmd/compile/internal/gc".Ecall' -to ctxCallee gorename -from '"cmd/compile/internal/gc".Efnstruct' -to ctxMultiOK gorename -from '"cmd/compile/internal/gc".Easgn' -to ctxAssign gorename -from '"cmd/compile/internal/gc".Ecomplit' -to ctxCompLit Not altered: parameters and local variables (mostly in typecheck.go) named top, which should probably now be called ctx (and which should probably have a named type). Also not altered: Field called Top in gc.Func. gorename -from '"cmd/compile/internal/gc".Node.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/gc".Node.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/gc".nodeIsddd' -to nodeIsDDD gorename -from '"cmd/compile/internal/types".Field.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/types".Field.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/types".fieldIsddd' -to fieldIsDDD Not altered: function gc.hasddd, params and local variables called isddd Also not altered: fmt.go prints nodes using "isddd(%v)". cd cmd/compile/internal/gc; go generate I then manually found impacted comments using exact string match and fixed them up by hand. The comment changes were trivial. Passes toolstash-check. Fixes #27167. If this experiment is deemed a success, we will open a new tracking issue for renames to do at the end of the 1.13 cycles. Change-Id: I2dc541533d2ab0d06cb3d31d65df205ecfb151e8 Reviewed-on: https://go-review.googlesource.com/c/150140 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-18 08:34:38 -08:00
typecheckslice(Curfn.Nbody.Slice(), ctxStmt)
checkreturn(Curfn)
if nerrors != 0 {
Curfn.Nbody.Set(nil) // type errors; do not compile
}
cmd/compile: eliminate dead code in if statements after typechecking This is a more thorough and cleaner fix than doing dead code elimination separately during inlining, escape analysis, and export. Unfortunately, it does add another full walk of the AST. The performance impact is very small, but not non-zero. If a label or goto is present in the dead code, it is not eliminated. This restriction can be removed once label/goto checking occurs much earlier in the compiler. In practice, it probably doesn't matter much. Updates #19699 Fixes #19705 name old alloc/op new alloc/op delta Template 39.2MB ± 0% 39.3MB ± 0% +0.28% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 29.8MB ± 0% ~ (p=1.000 n=5+5) GoTypes 113MB ± 0% 113MB ± 0% -0.55% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.25GB ± 0% +0.02% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.3MB ± 0% -0.24% (p=0.032 n=5+5) GoParser 31.7MB ± 0% 31.8MB ± 0% +0.31% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.3MB ± 0% ~ (p=0.421 n=5+5) Tar 26.6MB ± 0% 26.7MB ± 0% +0.21% (p=0.008 n=5+5) XML 42.2MB ± 0% 42.2MB ± 0% ~ (p=0.056 n=5+5) name old allocs/op new allocs/op delta Template 385k ± 0% 387k ± 0% +0.51% (p=0.016 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=1.000 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=1.000 n=5+5) SSA 9.71M ± 0% 9.72M ± 0% +0.10% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 315k ± 0% 317k ± 0% +0.71% (p=0.008 n=5+5) Reflect 980k ± 0% 983k ± 0% +0.30% (p=0.032 n=5+5) Tar 251k ± 0% 252k ± 0% +0.55% (p=0.016 n=5+5) XML 392k ± 0% 393k ± 0% +0.30% (p=0.008 n=5+5) Change-Id: Ia10ff4bbf5c6eae782582cc9cbc9785494d4fb83 Reviewed-on: https://go-review.googlesource.com/38773 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-27 11:38:20 -07:00
// Now that we've checked whether n terminates,
// we can eliminate some obviously dead code.
deadcode(Curfn)
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
fcount++
}
}
// With all types checked, it's now safe to verify map keys. One single
// check past phase 9 isn't sufficient, as we may exit with other errors
// before then, thus skipping map key errors.
checkMapKeys()
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
timings.AddEvent(fcount, "funcs")
if nsavederrors+nerrors != 0 {
errorexit()
}
// Phase 4: Decide how to capture closed variables.
// This needs to run before escape analysis,
// because variables captured by value do not escape.
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
timings.Start("fe", "capturevars")
for _, n := range xtop {
if n.Op == ODCLFUNC && n.Func.Closure != nil {
Curfn = n
capturevars(n)
}
}
cmd/compile: don't update outer variables after capturevars is complete When compiling concurrently, we walk all functions before compiling any of them. Walking functions can cause variables to switch from being non-addrtaken to addrtaken, e.g. to prepare for a runtime call. Typechecking propagates addrtaken-ness of closure variables to their outer variables, so that capturevars can decide whether to pass the variable's value or a pointer to it. When all functions are compiled immediately, as long as the containing function is compiled prior to the closure, this propagation has no effect. When compilation is deferred, though, in rare cases, this results in a change in the addrtaken-ness of a variable in the outer function, which in turn changes the compiler's output. (This is rare because in a great many cases, a temporary has been introduced, insulating the outer variable from modification.) But concurrent compilation must generate identical results. To fix this, track whether capturevars has run. If it has, there is no need to update outer variables when closure variables change. Capturevars always runs before any functions are walked or compiled. The remainder of the changes in this CL are to support the test. In particular, -d=compilelater forces the compiler to walk all functions before compiling any of them, despite being non-concurrent. This is useful because -live is fundamentally incompatible with concurrent compilation, but we want -c=1 to have no behavior changes. Fixes #20250 Change-Id: I89bcb54268a41e8588af1ac8cc37fbef856a90c2 Reviewed-on: https://go-review.googlesource.com/42853 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2017-05-05 15:22:59 -07:00
capturevarscomplete = true
Curfn = nil
if nsavederrors+nerrors != 0 {
errorexit()
}
// Phase 5: Inlining
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
timings.Start("fe", "inlining")
if Debug_typecheckinl != 0 {
// Typecheck imported function bodies if debug['l'] > 1,
// otherwise lazily when used or re-exported.
for _, n := range importlist {
if n.Func.Inl != nil {
saveerrors()
typecheckinl(n)
}
}
if nsavederrors+nerrors != 0 {
errorexit()
}
}
if Debug['l'] != 0 {
// Find functions that can be inlined and clone them before walk expands them.
visitBottomUp(xtop, func(list []*Node, recursive bool) {
for _, n := range list {
if !recursive {
caninl(n)
} else {
if Debug['m'] > 1 {
fmt.Printf("%v: cannot inline %v: recursive\n", n.Line(), n.Func.Nname)
}
}
inlcalls(n)
}
})
}
// Phase 6: Escape analysis.
// Required for moving heap allocations onto stack,
// which in turn is required by the closure implementation,
// which stores the addresses of stack variables into the closure.
// If the closure does not escape, it needs to be on the stack
// or else the stack copier will not update it.
// Large values are also moved off stack in escape analysis;
// because large values may contain pointers, it must happen early.
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
timings.Start("fe", "escapes")
escapes(xtop)
// Collect information for go:nowritebarrierrec
// checking. This must happen before transformclosure.
// We'll do the final check after write barriers are
// inserted.
if compiling_runtime {
nowritebarrierrecCheck = newNowritebarrierrecChecker()
}
cmd/compile: improve coverage of nowritebarrierrec check The current go:nowritebarrierrec checker has two problems that limit its coverage: 1. It doesn't understand that systemstack calls its argument, which means there are several cases where we fail to detect prohibited write barriers. 2. It only observes calls in the AST, so calls constructed during lowering by SSA aren't followed. This CL completely rewrites this checker to address these issues. The current checker runs entirely after walk and uses visitBottomUp, which introduces several problems for checking across systemstack. First, visitBottomUp itself doesn't understand systemstack calls, so the callee may be ordered after the caller, causing the checker to fail to propagate constraints. Second, many systemstack calls are passed a closure, which is quite difficult to resolve back to the function definition after transformclosure and walk have run. Third, visitBottomUp works exclusively on the AST, so it can't observe calls created by SSA. To address these problems, this commit splits the check into two phases and rewrites it to use a call graph generated during SSA lowering. The first phase runs before transformclosure/walk and simply records systemstack arguments when they're easy to get. Then, it modifies genssa to record static call edges at the point where we're lowering to Progs (which is the latest point at which position information is conveniently available). Finally, the second phase runs after all functions have been lowered and uses a direct BFS walk of the call graph (combining systemstack calls with static calls) to find prohibited write barriers and construct nice error messages. Fixes #22384. For #22460. Change-Id: I39668f7f2366ab3c1ab1a71eaf25484d25349540 Reviewed-on: https://go-review.googlesource.com/72773 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-10-22 16:36:27 -04:00
// Phase 7: Transform closure bodies to properly reference captured variables.
// This needs to happen before walk, because closures must be transformed
// before walk reaches a call of a closure.
timings.Start("fe", "xclosures")
for _, n := range xtop {
if n.Op == ODCLFUNC && n.Func.Closure != nil {
Curfn = n
transformclosure(n)
}
}
// Prepare for SSA compilation.
// This must be before peekitabs, because peekitabs
// can trigger function compilation.
initssaconfig()
// Just before compilation, compile itabs found on
// the right side of OCONVIFACE so that methods
// can be de-virtualized during compilation.
Curfn = nil
peekitabs()
// Phase 8: Compile top level functions.
// Don't use range--walk can add functions to xtop.
timings.Start("be", "compilefuncs")
fcount = 0
for i := 0; i < len(xtop); i++ {
n := xtop[i]
if n.Op == ODCLFUNC {
funccompile(n)
fcount++
}
}
timings.AddEvent(fcount, "funcs")
if nsavederrors+nerrors == 0 {
fninit(xtop)
}
cmd/compile: add initial backend concurrency support This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 08:27:26 -07:00
compileFunctions()
cmd/compile: add initial backend concurrency support This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 08:27:26 -07:00
if nowritebarrierrecCheck != nil {
// Write barriers are now known. Check the
// call graph.
nowritebarrierrecCheck.check()
nowritebarrierrecCheck = nil
}
// Finalize DWARF inline routine DIEs, then explicitly turn off
// DWARF inlining gen so as to avoid problems with generated
// method wrappers.
if Ctxt.DwFixups != nil {
Ctxt.DwFixups.Finalize(myimportpath, Debug_gendwarfinl != 0)
Ctxt.DwFixups = nil
genDwarfInline = 0
}
// Phase 9: Check external declarations.
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
timings.Start("be", "externaldcls")
for i, n := range externdcl {
if n.Op == ONAME {
cmd/compile: bulk rename This change does a bulk rename of several identifiers in the compiler. See #27167 and https://docs.google.com/document/d/19_ExiylD9MRfeAjKIfEsMU1_RGhuxB9sA0b5Zv7byVI/ for context and for discussion of these particular renames. Commands run to generate this change: gorename -from '"cmd/compile/internal/gc".OPROC' -to OGO gorename -from '"cmd/compile/internal/gc".OCOM' -to OBITNOT gorename -from '"cmd/compile/internal/gc".OMINUS' -to ONEG gorename -from '"cmd/compile/internal/gc".OIND' -to ODEREF gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTR' -to OBYTES2STR gorename -from '"cmd/compile/internal/gc".OARRAYBYTESTRTMP' -to OBYTES2STRTMP gorename -from '"cmd/compile/internal/gc".OARRAYRUNESTR' -to ORUNES2STR gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTE' -to OSTR2BYTES gorename -from '"cmd/compile/internal/gc".OSTRARRAYBYTETMP' -to OSTR2BYTESTMP gorename -from '"cmd/compile/internal/gc".OSTRARRAYRUNE' -to OSTR2RUNES gorename -from '"cmd/compile/internal/gc".Etop' -to ctxStmt gorename -from '"cmd/compile/internal/gc".Erv' -to ctxExpr gorename -from '"cmd/compile/internal/gc".Ecall' -to ctxCallee gorename -from '"cmd/compile/internal/gc".Efnstruct' -to ctxMultiOK gorename -from '"cmd/compile/internal/gc".Easgn' -to ctxAssign gorename -from '"cmd/compile/internal/gc".Ecomplit' -to ctxCompLit Not altered: parameters and local variables (mostly in typecheck.go) named top, which should probably now be called ctx (and which should probably have a named type). Also not altered: Field called Top in gc.Func. gorename -from '"cmd/compile/internal/gc".Node.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/gc".Node.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/gc".nodeIsddd' -to nodeIsDDD gorename -from '"cmd/compile/internal/types".Field.Isddd' -to IsDDD gorename -from '"cmd/compile/internal/types".Field.SetIsddd' -to SetIsDDD gorename -from '"cmd/compile/internal/types".fieldIsddd' -to fieldIsDDD Not altered: function gc.hasddd, params and local variables called isddd Also not altered: fmt.go prints nodes using "isddd(%v)". cd cmd/compile/internal/gc; go generate I then manually found impacted comments using exact string match and fixed them up by hand. The comment changes were trivial. Passes toolstash-check. Fixes #27167. If this experiment is deemed a success, we will open a new tracking issue for renames to do at the end of the 1.13 cycles. Change-Id: I2dc541533d2ab0d06cb3d31d65df205ecfb151e8 Reviewed-on: https://go-review.googlesource.com/c/150140 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2018-11-18 08:34:38 -08:00
externdcl[i] = typecheck(externdcl[i], ctxExpr)
}
}
// Check the map keys again, since we typechecked the external
// declarations.
checkMapKeys()
if nerrors+nsavederrors != 0 {
errorexit()
}
// Write object data to disk.
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
timings.Start("be", "dumpobj")
dumpdata()
Ctxt.NumberSyms(false)
dumpobj()
if asmhdr != "" {
dumpasmhdr()
}
// Check whether any of the functions we have compiled have gigantic stack frames.
sort.Slice(largeStackFrames, func(i, j int) bool {
return largeStackFrames[i].pos.Before(largeStackFrames[j].pos)
})
for _, large := range largeStackFrames {
if large.callee != 0 {
yyerrorl(large.pos, "stack frame too large (>1GB): %d MB locals + %d MB args + %d MB callee", large.locals>>20, large.args>>20, large.callee>>20)
} else {
yyerrorl(large.pos, "stack frame too large (>1GB): %d MB locals + %d MB args", large.locals>>20, large.args>>20)
}
}
cmd/compile: add initial backend concurrency support This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 08:27:26 -07:00
if len(compilequeue) != 0 {
Fatalf("%d uncompiled functions", len(compilequeue))
}
cmd/compile: add framework for logging optimizer (non)actions to LSP This is intended to allow IDEs to note where the optimizer was not able to improve users' code. There may be other applications for this, for example in studying effectiveness of optimizer changes more quickly than running benchmarks, or in verifying that code changes did not accidentally disable optimizations in performance-critical code. Logging of nilcheck (bad) for amd64 is implemented as proof-of-concept. In general, the intent is that optimizations that didn't happen are what will be logged, because that is believed to be what IDE users want. Added flag -json=version,dest Check that version=0. (Future compilers will support a few recent versions, I hope that version is always <=3.) Dest is expected to be one of: /path (or \path in Windows) will create directory /path and fill it w/ json files file://path will create directory path, intended either for I:\dont\know\enough\about\windows\paths trustme_I_know_what_I_am_doing_probably_testing Not passing an absolute path name usually leads to json splattered all over source directories, or failure when those directories are not writeable. If you want a foot-gun, you have to ask for it. The JSON output is directed to subdirectories of dest, where each subdirectory is net/url.PathEscape of the package name, and each for each foo.go in the package, net/url.PathEscape(foo).json is created. The first line of foo.json contains version and context information, and subsequent lines contains LSP-conforming JSON describing the missing optimizations. Change-Id: Ib83176a53a8c177ee9081aefc5ae05604ccad8a0 Reviewed-on: https://go-review.googlesource.com/c/go/+/204338 Run-TryBot: David Chase <drchase@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-24 13:48:17 -04:00
logopt.FlushLoggedOpts(Ctxt, myimportpath)
if nerrors+nsavederrors != 0 {
errorexit()
}
flusherrors()
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
timings.Stop()
if benchfile != "" {
if err := writebench(benchfile); err != nil {
log.Fatalf("cannot write benchmark data: %v", err)
}
}
}
func writebench(filename string) error {
f, err := os.OpenFile(filename, os.O_WRONLY|os.O_CREATE|os.O_APPEND, 0666)
if err != nil {
return err
}
var buf bytes.Buffer
fmt.Fprintln(&buf, "commit:", objabi.Version)
cmd/compile: add compiler phase timing Timings is a simple data structure that collects times of labeled Start/Stop events describing timed phases, which later can be written to a file. Adjacent phases with common label prefix are automatically collected in a group together with the accumulated phase time. Timing data can be appended to a file in benchmark data format using the new -bench flag: $ go build -gcflags="-bench=/dev/stdout" -o /dev/null go/types commit: devel +8847c6b Mon Aug 15 17:51:53 2016 -0700 goos: darwin goarch: amd64 BenchmarkCompile:go/types:fe:init 1 663292 ns/op 0.07 % BenchmarkCompile:go/types:fe:loadsys 1 1337371 ns/op 0.14 % BenchmarkCompile:go/types:fe:parse 1 47008869 ns/op 4.91 % 10824 lines 230254 lines/s BenchmarkCompile:go/types:fe:typecheck:top1 1 2843343 ns/op 0.30 % BenchmarkCompile:go/types:fe:typecheck:top2 1 447457 ns/op 0.05 % BenchmarkCompile:go/types:fe:typecheck:func 1 15119595 ns/op 1.58 % 427 funcs 28241 funcs/s BenchmarkCompile:go/types:fe:capturevars 1 56314 ns/op 0.01 % BenchmarkCompile:go/types:fe:inlining 1 9805767 ns/op 1.02 % BenchmarkCompile:go/types:fe:escapes 1 53598646 ns/op 5.60 % BenchmarkCompile:go/types:fe:xclosures 1 199302 ns/op 0.02 % BenchmarkCompile:go/types:fe:subtotal 1 131079956 ns/op 13.70 % BenchmarkCompile:go/types:be:compilefuncs 1 692009428 ns/op 72.33 % 427 funcs 617 funcs/s BenchmarkCompile:go/types:be:externaldcls 1 54591 ns/op 0.01 % BenchmarkCompile:go/types:be:dumpobj 1 133478173 ns/op 13.95 % BenchmarkCompile:go/types:be:subtotal 1 825542192 ns/op 86.29 % BenchmarkCompile:go/types:unaccounted 1 106101 ns/op 0.01 % BenchmarkCompile:go/types:total 1 956728249 ns/op 100.00 % For #16169. Change-Id: I93265fe0cb08e47cd413608d0824c5dd35ba7899 Reviewed-on: https://go-review.googlesource.com/24462 Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-06-24 15:03:04 -07:00
fmt.Fprintln(&buf, "goos:", runtime.GOOS)
fmt.Fprintln(&buf, "goarch:", runtime.GOARCH)
timings.Write(&buf, "BenchmarkCompile:"+myimportpath+":")
n, err := f.Write(buf.Bytes())
if err != nil {
return err
}
if n != buf.Len() {
panic("bad writer")
}
return f.Close()
}
var (
importMap = map[string]string{}
packageFile map[string]string // nil means not in use
)
func addImportMap(s string) {
if strings.Count(s, "=") != 1 {
log.Fatal("-importmap argument must be of the form source=actual")
}
i := strings.Index(s, "=")
source, actual := s[:i], s[i+1:]
if source == "" || actual == "" {
log.Fatal("-importmap argument must be of the form source=actual; source and actual must be non-empty")
}
importMap[source] = actual
}
func readImportCfg(file string) {
packageFile = map[string]string{}
data, err := ioutil.ReadFile(file)
if err != nil {
log.Fatalf("-importcfg: %v", err)
}
for lineNum, line := range strings.Split(string(data), "\n") {
lineNum++ // 1-based
line = strings.TrimSpace(line)
if line == "" || strings.HasPrefix(line, "#") {
continue
}
var verb, args string
if i := strings.Index(line, " "); i < 0 {
verb = line
} else {
verb, args = line[:i], strings.TrimSpace(line[i+1:])
}
var before, after string
if i := strings.Index(args, "="); i >= 0 {
before, after = args[:i], args[i+1:]
}
switch verb {
default:
log.Fatalf("%s:%d: unknown directive %q", file, lineNum, verb)
case "importmap":
if before == "" || after == "" {
log.Fatalf(`%s:%d: invalid importmap: syntax is "importmap old=new"`, file, lineNum)
}
importMap[before] = after
case "packagefile":
if before == "" || after == "" {
log.Fatalf(`%s:%d: invalid packagefile: syntax is "packagefile path=filename"`, file, lineNum)
}
packageFile[before] = after
}
}
}
// symabiDefs and symabiRefs record the defined and referenced ABIs of
// symbols required by non-Go code. These are keyed by link symbol
// name, where the local package prefix is always `"".`
var symabiDefs, symabiRefs map[string]obj.ABI
// readSymABIs reads a symabis file that specifies definitions and
// references of text symbols by ABI.
//
// The symabis format is a set of lines, where each line is a sequence
// of whitespace-separated fields. The first field is a verb and is
// either "def" for defining a symbol ABI or "ref" for referencing a
// symbol using an ABI. For both "def" and "ref", the second field is
// the symbol name and the third field is the ABI name, as one of the
// named cmd/internal/obj.ABI constants.
func readSymABIs(file, myimportpath string) {
data, err := ioutil.ReadFile(file)
if err != nil {
log.Fatalf("-symabis: %v", err)
}
symabiDefs = make(map[string]obj.ABI)
symabiRefs = make(map[string]obj.ABI)
localPrefix := ""
if myimportpath != "" {
// Symbols in this package may be written either as
// "".X or with the package's import path already in
// the symbol.
localPrefix = objabi.PathToPrefix(myimportpath) + "."
}
for lineNum, line := range strings.Split(string(data), "\n") {
lineNum++ // 1-based
line = strings.TrimSpace(line)
if line == "" || strings.HasPrefix(line, "#") {
continue
}
parts := strings.Fields(line)
switch parts[0] {
case "def", "ref":
// Parse line.
if len(parts) != 3 {
log.Fatalf(`%s:%d: invalid symabi: syntax is "%s sym abi"`, file, lineNum, parts[0])
}
sym, abi := parts[1], parts[2]
if abi != "ABI0" { // Only supported external ABI right now
log.Fatalf(`%s:%d: invalid symabi: unknown abi "%s"`, file, lineNum, abi)
}
// If the symbol is already prefixed with
// myimportpath, rewrite it to start with ""
// so it matches the compiler's internal
// symbol names.
if localPrefix != "" && strings.HasPrefix(sym, localPrefix) {
sym = `"".` + sym[len(localPrefix):]
}
// Record for later.
if parts[0] == "def" {
symabiDefs[sym] = obj.ABI0
} else {
symabiRefs[sym] = obj.ABI0
}
default:
log.Fatalf(`%s:%d: invalid symabi type "%s"`, file, lineNum, parts[0])
}
}
}
func saveerrors() {
nsavederrors += nerrors
nerrors = 0
}
func arsize(b *bufio.Reader, name string) int {
var buf [ArhdrSize]byte
if _, err := io.ReadFull(b, buf[:]); err != nil {
return -1
}
aname := strings.Trim(string(buf[0:16]), " ")
if !strings.HasPrefix(aname, name) {
return -1
}
asize := strings.Trim(string(buf[48:58]), " ")
i, _ := strconv.Atoi(asize)
return i
}
var idirs []string
func addidir(dir string) {
if dir != "" {
idirs = append(idirs, dir)
}
}
func isDriveLetter(b byte) bool {
return 'a' <= b && b <= 'z' || 'A' <= b && b <= 'Z'
}
// is this path a local name? begins with ./ or ../ or /
func islocalname(name string) bool {
return strings.HasPrefix(name, "/") ||
runtime.GOOS == "windows" && len(name) >= 3 && isDriveLetter(name[0]) && name[1] == ':' && name[2] == '/' ||
strings.HasPrefix(name, "./") || name == "." ||
strings.HasPrefix(name, "../") || name == ".."
}
func findpkg(name string) (file string, ok bool) {
if islocalname(name) {
if nolocalimports {
return "", false
}
if packageFile != nil {
file, ok = packageFile[name]
return file, ok
}
// try .a before .6. important for building libraries:
// if there is an array.6 in the array.a library,
// want to find all of array.a, not just array.6.
file = fmt.Sprintf("%s.a", name)
if _, err := os.Stat(file); err == nil {
return file, true
}
file = fmt.Sprintf("%s.o", name)
if _, err := os.Stat(file); err == nil {
return file, true
}
return "", false
}
// local imports should be canonicalized already.
// don't want to see "encoding/../encoding/base64"
// as different from "encoding/base64".
if q := path.Clean(name); q != name {
yyerror("non-canonical import path %q (should be %q)", name, q)
return "", false
}
if packageFile != nil {
file, ok = packageFile[name]
return file, ok
}
for _, dir := range idirs {
file = fmt.Sprintf("%s/%s.a", dir, name)
if _, err := os.Stat(file); err == nil {
return file, true
}
file = fmt.Sprintf("%s/%s.o", dir, name)
if _, err := os.Stat(file); err == nil {
return file, true
}
}
if objabi.GOROOT != "" {
suffix := ""
suffixsep := ""
if flag_installsuffix != "" {
suffixsep = "_"
suffix = flag_installsuffix
} else if flag_race {
suffixsep = "_"
suffix = "race"
} else if flag_msan {
suffixsep = "_"
suffix = "msan"
}
file = fmt.Sprintf("%s/pkg/%s_%s%s%s/%s.a", objabi.GOROOT, objabi.GOOS, objabi.GOARCH, suffixsep, suffix, name)
if _, err := os.Stat(file); err == nil {
return file, true
}
file = fmt.Sprintf("%s/pkg/%s_%s%s%s/%s.o", objabi.GOROOT, objabi.GOOS, objabi.GOARCH, suffixsep, suffix, name)
if _, err := os.Stat(file); err == nil {
return file, true
}
}
return "", false
}
// loadsys loads the definitions for the low-level runtime functions,
// so that the compiler can generate calls to them,
// but does not make them visible to user code.
func loadsys() {
types.Block = 1
inimport = true
typecheckok = true
typs := runtimeTypes()
for _, d := range runtimeDecls {
sym := Runtimepkg.Lookup(d.name)
typ := typs[d.typ]
switch d.tag {
case funcTag:
importfunc(Runtimepkg, src.NoXPos, sym, typ)
case varTag:
importvar(Runtimepkg, src.NoXPos, sym, typ)
default:
Fatalf("unhandled declaration tag %v", d.tag)
}
}
typecheckok = false
inimport = false
}
// myheight tracks the local package's height based on packages
// imported so far.
var myheight int
func importfile(f *Val) *types.Pkg {
path_, ok := f.U.(string)
if !ok {
yyerror("import path must be a string")
return nil
}
if len(path_) == 0 {
yyerror("import path is empty")
return nil
}
if isbadimport(path_, false) {
return nil
}
// The package name main is no longer reserved,
// but we reserve the import path "main" to identify
// the main package, just as we reserve the import
// path "math" to identify the standard math package.
if path_ == "main" {
yyerror("cannot import \"main\"")
errorexit()
}
if myimportpath != "" && path_ == myimportpath {
yyerror("import %q while compiling that package (import cycle)", path_)
errorexit()
}
if mapped, ok := importMap[path_]; ok {
path_ = mapped
}
if path_ == "unsafe" {
imported_unsafe = true
return unsafepkg
}
if islocalname(path_) {
if path_[0] == '/' {
yyerror("import path cannot be absolute path")
return nil
}
prefix := Ctxt.Pathname
if localimport != "" {
prefix = localimport
}
path_ = path.Join(prefix, path_)
if isbadimport(path_, true) {
return nil
}
}
file, found := findpkg(path_)
if !found {
yyerror("can't find import: %q", path_)
errorexit()
}
importpkg := types.NewPkg(path_, "")
if importpkg.Imported {
return importpkg
}
importpkg.Imported = true
imp, err := bio.Open(file)
if err != nil {
yyerror("can't open import: %q: %v", path_, err)
errorexit()
}
defer imp.Close()
// check object header
p, err := imp.ReadString('\n')
if err != nil {
yyerror("import %s: reading input: %v", file, err)
errorexit()
}
if p == "!<arch>\n" { // package archive
// package export block should be first
sz := arsize(imp.Reader, "__.PKGDEF")
if sz <= 0 {
yyerror("import %s: not a package file", file)
errorexit()
}
p, err = imp.ReadString('\n')
if err != nil {
yyerror("import %s: reading input: %v", file, err)
errorexit()
}
}
if !strings.HasPrefix(p, "go object ") {
yyerror("import %s: not a go object file: %s", file, p)
errorexit()
}
q := fmt.Sprintf("%s %s %s %s\n", objabi.GOOS, objabi.GOARCH, objabi.Version, objabi.Expstring())
if p[10:] != q {
yyerror("import %s: object is [%s] expected [%s]", file, p[10:], q)
errorexit()
}
// process header lines
for {
p, err = imp.ReadString('\n')
if err != nil {
yyerror("import %s: reading input: %v", file, err)
errorexit()
}
if p == "\n" {
break // header ends with blank line
}
}
// assume files move (get installed) so don't record the full path
if packageFile != nil {
// If using a packageFile map, assume path_ can be recorded directly.
Ctxt.AddImport(path_)
} else {
// For file "/Users/foo/go/pkg/darwin_amd64/math.a" record "math.a".
Ctxt.AddImport(file[len(file)-len(path_)-len(".a"):])
}
// In the importfile, if we find:
// $$\n (textual format): not supported anymore
// $$B\n (binary format) : import directly, then feed the lexer a dummy statement
// look for $$
var c byte
for {
c, err = imp.ReadByte()
if err != nil {
break
}
if c == '$' {
c, err = imp.ReadByte()
if c == '$' || err != nil {
break
}
}
}
// get character after $$
if err == nil {
c, _ = imp.ReadByte()
}
switch c {
case '\n':
yyerror("cannot import %s: old export format no longer supported (recompile library)", path_)
return nil
case 'B':
cmd/compile: export inlined function bodies Completed implementation for exporting inlined functions using the new binary export format. This change passes (export GO_GCFLAGS=-newexport; make all.bash) but for gc's builtin_test.go which we need to adjust before enabling this code by default. For a high-level description of the export format see the comment at the top of bexport.go. Major changes: 1) The export format for the platform independent export data changed: When we export inlined function bodies, additional objects (other functions, types, etc.) that are referred to by the function bodies will need to be exported. While this doesn't affect the platform-independent portion directly, it adds more objects to the exportlist while we are exporting. Instead of trying to sort the objects into groups, just export objects as they appear in the export list. This is slightly less compact (one extra byte per object), but it is simpler and much more flexible. 2) The export format contains now three sections: 1) The plat- form independent objects, 2) the objects pulled in for export via inlined function bodies, and 3) the inlined function bodies. 3) Completed the exporting and importing code for inlined function bodies. The format is completely compiler-specific and easily changeable w/o affecting other tools. There is still quite a bit of room for denser encoding. This can happen at any time in the future. This change contains also the adjustments for go/internal/gcimporter, necessary because of the export format change 1) mentioned above. For #13241. Change-Id: I86bca0bd984b12ccf13d0d30892e6e25f6d04ed5 Reviewed-on: https://go-review.googlesource.com/21172 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-18 17:21:32 -07:00
if Debug_export != 0 {
fmt.Printf("importing %s (%s)\n", path_, file)
}
imp.ReadByte() // skip \n after $$B
cmd/compile: add indexed export format This CL introduces a new indexed data format for package export data. This improves on the previous (sequential) binary format by allowing the compiler to selectively (and lazily) load only the data that's actually needed for compilation. In large Go projects, the package export data can become very large due to transitive type declaration dependencies and inline function/method bodies. By lazily loading these declarations and bodies as needed, we avoid wasting time and memory processing unnecessary and/or redundant data. In the benchmarks below, "old" is -iexport=false and "new" is -iexport=true. The suffixes indicate the compiler concurrency (-c) and inlining (-l) settings used for the build (using -gcflags=all=-foo). Benchmarks were run on an HP Z620. Juju is "go build -a github.com/juju/juju/cmd/...": name old real-time/op new real-time/op delta Juju/c=1/l=0 44.0s ± 1% 38.7s ± 9% -11.97% (p=0.001 n=7+7) Juju/c=1/l=4 53.7s ± 3% 45.3s ± 4% -15.53% (p=0.001 n=7+7) Juju/c=4/l=0 39.7s ± 8% 32.0s ± 4% -19.38% (p=0.001 n=7+7) Juju/c=4/l=4 46.3s ± 4% 38.0s ± 4% -18.06% (p=0.001 n=7+7) name old user-time/op new user-time/op delta Juju/c=1/l=0 371s ± 1% 300s ± 0% -19.07% (p=0.001 n=7+6) Juju/c=1/l=4 482s ± 0% 374s ± 1% -22.37% (p=0.001 n=7+7) Juju/c=4/l=0 410s ± 1% 340s ± 1% -17.19% (p=0.001 n=7+7) Juju/c=4/l=4 532s ± 1% 424s ± 1% -20.26% (p=0.001 n=7+7) name old sys-time/op new sys-time/op delta Juju/c=1/l=0 33.4s ± 1% 28.4s ± 2% -15.02% (p=0.001 n=7+7) Juju/c=1/l=4 40.7s ± 2% 32.8s ± 3% -19.51% (p=0.001 n=7+7) Juju/c=4/l=0 39.8s ± 2% 34.4s ± 2% -13.74% (p=0.001 n=7+7) Juju/c=4/l=4 48.4s ± 2% 40.4s ± 2% -16.50% (p=0.001 n=7+7) Kubelet is "go build -a k8s.io/kubernetes/cmd/kubelet": name old real-time/op new real-time/op delta Kubelet/c=1/l=0 42.0s ± 1% 34.8s ± 1% -17.27% (p=0.008 n=5+5) Kubelet/c=1/l=4 55.4s ± 3% 45.4s ± 3% -18.06% (p=0.002 n=6+6) Kubelet/c=4/l=0 37.4s ± 3% 29.9s ± 1% -20.25% (p=0.004 n=6+5) Kubelet/c=4/l=4 48.1s ± 2% 39.0s ± 5% -18.93% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Kubelet/c=1/l=0 291s ± 1% 233s ± 1% -19.96% (p=0.002 n=6+6) Kubelet/c=1/l=4 385s ± 1% 298s ± 1% -22.51% (p=0.002 n=6+6) Kubelet/c=4/l=0 325s ± 0% 268s ± 1% -17.48% (p=0.004 n=5+6) Kubelet/c=4/l=4 429s ± 1% 343s ± 1% -20.08% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Kubelet/c=1/l=0 25.1s ± 2% 20.9s ± 4% -16.69% (p=0.002 n=6+6) Kubelet/c=1/l=4 31.2s ± 3% 24.4s ± 0% -21.67% (p=0.010 n=6+4) Kubelet/c=4/l=0 30.2s ± 2% 25.6s ± 1% -15.34% (p=0.002 n=6+6) Kubelet/c=4/l=4 37.3s ± 1% 30.9s ± 2% -17.11% (p=0.002 n=6+6) Change-Id: Ie43eb3bbe1392cbb61c86792a17a57b33b9561f0 Reviewed-on: https://go-review.googlesource.com/106796 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2018-04-01 01:55:55 -07:00
c, err = imp.ReadByte()
if err != nil {
yyerror("import %s: reading input: %v", file, err)
errorexit()
}
// Indexed format is distinguished by an 'i' byte,
// whereas previous export formats started with 'c', 'd', or 'v'.
if c != 'i' {
yyerror("import %s: unexpected package format byte: %v", file, c)
errorexit()
cmd/compile: add indexed export format This CL introduces a new indexed data format for package export data. This improves on the previous (sequential) binary format by allowing the compiler to selectively (and lazily) load only the data that's actually needed for compilation. In large Go projects, the package export data can become very large due to transitive type declaration dependencies and inline function/method bodies. By lazily loading these declarations and bodies as needed, we avoid wasting time and memory processing unnecessary and/or redundant data. In the benchmarks below, "old" is -iexport=false and "new" is -iexport=true. The suffixes indicate the compiler concurrency (-c) and inlining (-l) settings used for the build (using -gcflags=all=-foo). Benchmarks were run on an HP Z620. Juju is "go build -a github.com/juju/juju/cmd/...": name old real-time/op new real-time/op delta Juju/c=1/l=0 44.0s ± 1% 38.7s ± 9% -11.97% (p=0.001 n=7+7) Juju/c=1/l=4 53.7s ± 3% 45.3s ± 4% -15.53% (p=0.001 n=7+7) Juju/c=4/l=0 39.7s ± 8% 32.0s ± 4% -19.38% (p=0.001 n=7+7) Juju/c=4/l=4 46.3s ± 4% 38.0s ± 4% -18.06% (p=0.001 n=7+7) name old user-time/op new user-time/op delta Juju/c=1/l=0 371s ± 1% 300s ± 0% -19.07% (p=0.001 n=7+6) Juju/c=1/l=4 482s ± 0% 374s ± 1% -22.37% (p=0.001 n=7+7) Juju/c=4/l=0 410s ± 1% 340s ± 1% -17.19% (p=0.001 n=7+7) Juju/c=4/l=4 532s ± 1% 424s ± 1% -20.26% (p=0.001 n=7+7) name old sys-time/op new sys-time/op delta Juju/c=1/l=0 33.4s ± 1% 28.4s ± 2% -15.02% (p=0.001 n=7+7) Juju/c=1/l=4 40.7s ± 2% 32.8s ± 3% -19.51% (p=0.001 n=7+7) Juju/c=4/l=0 39.8s ± 2% 34.4s ± 2% -13.74% (p=0.001 n=7+7) Juju/c=4/l=4 48.4s ± 2% 40.4s ± 2% -16.50% (p=0.001 n=7+7) Kubelet is "go build -a k8s.io/kubernetes/cmd/kubelet": name old real-time/op new real-time/op delta Kubelet/c=1/l=0 42.0s ± 1% 34.8s ± 1% -17.27% (p=0.008 n=5+5) Kubelet/c=1/l=4 55.4s ± 3% 45.4s ± 3% -18.06% (p=0.002 n=6+6) Kubelet/c=4/l=0 37.4s ± 3% 29.9s ± 1% -20.25% (p=0.004 n=6+5) Kubelet/c=4/l=4 48.1s ± 2% 39.0s ± 5% -18.93% (p=0.002 n=6+6) name old user-time/op new user-time/op delta Kubelet/c=1/l=0 291s ± 1% 233s ± 1% -19.96% (p=0.002 n=6+6) Kubelet/c=1/l=4 385s ± 1% 298s ± 1% -22.51% (p=0.002 n=6+6) Kubelet/c=4/l=0 325s ± 0% 268s ± 1% -17.48% (p=0.004 n=5+6) Kubelet/c=4/l=4 429s ± 1% 343s ± 1% -20.08% (p=0.002 n=6+6) name old sys-time/op new sys-time/op delta Kubelet/c=1/l=0 25.1s ± 2% 20.9s ± 4% -16.69% (p=0.002 n=6+6) Kubelet/c=1/l=4 31.2s ± 3% 24.4s ± 0% -21.67% (p=0.010 n=6+4) Kubelet/c=4/l=0 30.2s ± 2% 25.6s ± 1% -15.34% (p=0.002 n=6+6) Kubelet/c=4/l=4 37.3s ± 1% 30.9s ± 2% -17.11% (p=0.002 n=6+6) Change-Id: Ie43eb3bbe1392cbb61c86792a17a57b33b9561f0 Reviewed-on: https://go-review.googlesource.com/106796 Run-TryBot: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Robert Griesemer <gri@golang.org>
2018-04-01 01:55:55 -07:00
}
iimport(importpkg, imp)
default:
yyerror("no import in %q", path_)
errorexit()
}
if importpkg.Height >= myheight {
myheight = importpkg.Height + 1
}
return importpkg
}
[dev.inline] cmd/internal/src: introduce compact source position representation XPos is a compact (8 instead of 16 bytes on a 64bit machine) source position representation. There is a 1:1 correspondence between each XPos and each regular Pos, translated via a global table. In some sense this brings back the LineHist, though positions can track line and column information; there is a O(1) translation between the representations (no binary search), and the translation is factored out. The size increase with the prior change is brought down again and the compiler speed is in line with the master repo (measured on the same "quiet" machine as for prior change): name old time/op new time/op delta Template 256ms ± 1% 262ms ± 2% ~ (p=0.063 n=5+4) Unicode 132ms ± 1% 135ms ± 2% ~ (p=0.063 n=5+4) GoTypes 891ms ± 1% 871ms ± 1% -2.28% (p=0.016 n=5+4) Compiler 3.84s ± 2% 3.89s ± 2% ~ (p=0.413 n=5+4) MakeBash 47.1s ± 1% 46.2s ± 2% ~ (p=0.095 n=5+5) name old user-ns/op new user-ns/op delta Template 309M ± 1% 314M ± 2% ~ (p=0.111 n=5+4) Unicode 165M ± 1% 172M ± 9% ~ (p=0.151 n=5+5) GoTypes 1.14G ± 2% 1.12G ± 1% ~ (p=0.063 n=5+4) Compiler 5.00G ± 1% 4.96G ± 1% ~ (p=0.286 n=5+4) Change-Id: Icc570cc60ab014d8d9af6976f1f961ab8828cc47 Reviewed-on: https://go-review.googlesource.com/34506 Run-TryBot: Robert Griesemer <gri@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-12-15 17:17:01 -08:00
func pkgnotused(lineno src.XPos, path string, name string) {
// If the package was imported with a name other than the final
// import path element, show it explicitly in the error message.
// Note that this handles both renamed imports and imports of
// packages containing unconventional package declarations.
// Note that this uses / always, even on Windows, because Go import
// paths always use forward slashes.
elem := path
if i := strings.LastIndex(elem, "/"); i >= 0 {
elem = elem[i+1:]
}
if name == "" || elem == name {
yyerrorl(lineno, "imported and not used: %q", path)
} else {
yyerrorl(lineno, "imported and not used: %q as %s", path, name)
}
}
func mkpackage(pkgname string) {
if localpkg.Name == "" {
if pkgname == "_" {
yyerror("invalid package name _")
}
localpkg.Name = pkgname
} else {
if pkgname != localpkg.Name {
yyerror("package %s; expected %s", pkgname, localpkg.Name)
}
}
}
func clearImports() {
type importedPkg struct {
pos src.XPos
path string
name string
}
var unused []importedPkg
for _, s := range localpkg.Syms {
n := asNode(s.Def)
if n == nil {
continue
}
if n.Op == OPACK {
// throw away top-level package name left over
// from previous file.
// leave s->block set to cause redeclaration
// errors if a conflicting top-level name is
// introduced by a different file.
if !n.Name.Used() && nsyntaxerrors == 0 {
unused = append(unused, importedPkg{n.Pos, n.Name.Pkg.Path, s.Name})
}
s.Def = nil
continue
}
if IsAlias(s) {
// throw away top-level name left over
// from previous import . "x"
if n.Name != nil && n.Name.Pack != nil && !n.Name.Pack.Name.Used() && nsyntaxerrors == 0 {
unused = append(unused, importedPkg{n.Name.Pack.Pos, n.Name.Pack.Name.Pkg.Path, ""})
n.Name.Pack.Name.SetUsed(true)
}
s.Def = nil
continue
}
}
sort.Slice(unused, func(i, j int) bool { return unused[i].pos.Before(unused[j].pos) })
for _, pkg := range unused {
pkgnotused(pkg.pos, pkg.path, pkg.name)
}
}
func IsAlias(sym *types.Sym) bool {
return sym.Def != nil && asNode(sym.Def).Sym != sym
}
cmd/compile: add initial backend concurrency support This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 08:27:26 -07:00
// By default, assume any debug flags are incompatible with concurrent compilation.
// A few are safe and potentially in common use for normal compiles, though; mark them as such here.
var concurrentFlagOK = [256]bool{
'B': true, // disabled bounds checking
'C': true, // disable printing of columns in error messages
'e': true, // no limit on errors; errors all come from non-concurrent code
cmd/compile: add initial backend concurrency support This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 08:27:26 -07:00
'I': true, // add `directory` to import search path
'N': true, // disable optimizations
'l': true, // disable inlining
'w': true, // all printing happens before compilation
'W': true, // all printing happens before compilation
'S': true, // printing disassembly happens at the end (but see concurrentBackendAllowed below)
cmd/compile: add initial backend concurrency support This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 08:27:26 -07:00
}
func concurrentBackendAllowed() bool {
for i, x := range Debug {
if x != 0 && !concurrentFlagOK[i] {
return false
}
}
// Debug['S'] by itself is ok, because all printing occurs
cmd/compile: add initial backend concurrency support This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 08:27:26 -07:00
// while writing the object file, and that is non-concurrent.
// Adding Debug_vlog, however, causes Debug['S'] to also print
cmd/compile: add initial backend concurrency support This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 08:27:26 -07:00
// while flushing the plist, which happens concurrently.
if Debug_vlog || debugstr != "" || debuglive > 0 {
return false
}
// TODO: Test and delete this condition.
if objabi.Fieldtrack_enabled != 0 {
cmd/compile: add initial backend concurrency support This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and has routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. * Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 08:27:26 -07:00
return false
}
// TODO: fix races and enable the following flags
if Ctxt.Flag_shared || Ctxt.Flag_dynlink || flag_race {
return false
}
return true
}
// recordFlags records the specified command-line flags to be placed
// in the DWARF info.
func recordFlags(flags ...string) {
if myimportpath == "" {
// We can't record the flags if we don't know what the
// package name is.
return
}
type BoolFlag interface {
IsBoolFlag() bool
}
type CountFlag interface {
IsCountFlag() bool
}
var cmd bytes.Buffer
for _, name := range flags {
f := flag.Lookup(name)
if f == nil {
continue
}
getter := f.Value.(flag.Getter)
if getter.String() == f.DefValue {
// Flag has default value, so omit it.
continue
}
if bf, ok := f.Value.(BoolFlag); ok && bf.IsBoolFlag() {
val, ok := getter.Get().(bool)
if ok && val {
fmt.Fprintf(&cmd, " -%s", f.Name)
continue
}
}
if cf, ok := f.Value.(CountFlag); ok && cf.IsCountFlag() {
val, ok := getter.Get().(int)
if ok && val == 1 {
fmt.Fprintf(&cmd, " -%s", f.Name)
continue
}
}
fmt.Fprintf(&cmd, " -%s=%v", f.Name, getter.Get())
}
if cmd.Len() == 0 {
return
}
s := Ctxt.Lookup(dwarf.CUInfoPrefix + "producer." + myimportpath)
s.Type = objabi.SDWARFINFO
// Sometimes (for example when building tests) we can link
// together two package main archives. So allow dups.
s.Set(obj.AttrDuplicateOK, true)
Ctxt.Data = append(Ctxt.Data, s)
s.P = cmd.Bytes()[1:]
}
// recordPackageName records the name of the package being
// compiled, so that the linker can save it in the compile unit's DIE.
func recordPackageName() {
s := Ctxt.Lookup(dwarf.CUInfoPrefix + "packagename." + myimportpath)
s.Type = objabi.SDWARFINFO
// Sometimes (for example when building tests) we can link
// together two package main archives. So allow dups.
s.Set(obj.AttrDuplicateOK, true)
Ctxt.Data = append(Ctxt.Data, s)
s.P = []byte(localpkg.Name)
}
// flag_lang is the language version we are compiling for, set by the -lang flag.
var flag_lang string
// currentLang returns the current language version.
func currentLang() string {
return fmt.Sprintf("go1.%d", goversion.Version)
}
// goVersionRE is a regular expression that matches the valid
// arguments to the -lang flag.
var goVersionRE = regexp.MustCompile(`^go([1-9][0-9]*)\.(0|[1-9][0-9]*)$`)
// A lang is a language version broken into major and minor numbers.
type lang struct {
major, minor int
}
// langWant is the desired language version set by the -lang flag.
// If the -lang flag is not set, this is the zero value, meaning that
// any language version is supported.
var langWant lang
// langSupported reports whether language version major.minor is
// supported in a particular package.
func langSupported(major, minor int, pkg *types.Pkg) bool {
if pkg == nil {
// TODO(mdempsky): Set Pkg for local types earlier.
pkg = localpkg
}
if pkg != localpkg {
// Assume imported packages passed type-checking.
return true
}
if langWant.major == 0 && langWant.minor == 0 {
return true
}
return langWant.major > major || (langWant.major == major && langWant.minor >= minor)
}
// checkLang verifies that the -lang flag holds a valid value, and
// exits if not. It initializes data used by langSupported.
func checkLang() {
if flag_lang == "" {
return
}
var err error
langWant, err = parseLang(flag_lang)
if err != nil {
log.Fatalf("invalid value %q for -lang: %v", flag_lang, err)
}
if def := currentLang(); flag_lang != def {
defVers, err := parseLang(def)
if err != nil {
log.Fatalf("internal error parsing default lang %q: %v", def, err)
}
if langWant.major > defVers.major || (langWant.major == defVers.major && langWant.minor > defVers.minor) {
log.Fatalf("invalid value %q for -lang: max known version is %q", flag_lang, def)
}
}
}
// parseLang parses a -lang option into a langVer.
func parseLang(s string) (lang, error) {
matches := goVersionRE.FindStringSubmatch(s)
if matches == nil {
return lang{}, fmt.Errorf(`should be something like "go1.12"`)
}
major, err := strconv.Atoi(matches[1])
if err != nil {
return lang{}, err
}
minor, err := strconv.Atoi(matches[2])
if err != nil {
return lang{}, err
}
return lang{major: major, minor: minor}, nil
}