go/src/cmd/compile/internal/gc/main.go

// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

//go:generate go run mkbuiltin.go

package gc

import (
	"bufio"
	"bytes"
	"cmd/compile/internal/base"
	"cmd/compile/internal/ir"
	"cmd/compile/internal/logopt"
	"cmd/compile/internal/ssa"
	"cmd/compile/internal/types"
	"cmd/internal/bio"
	"cmd/internal/dwarf"
	"cmd/internal/goobj"
	"cmd/internal/obj"
	"cmd/internal/objabi"
	"cmd/internal/src"
	"flag"
	"fmt"
	"go/constant"
	"internal/goversion"
	"io"
	"io/ioutil"
	"log"
	"os"
	"path"
	"regexp"
	"runtime"
	"sort"
	"strconv"
	"strings"
)

func hidePanic() {
	if base.Debug.Panic == 0 && base.Errors() > 0 {
		// If we've already complained about things
		// in the program, don't bother complaining
		// about a panic too; let the user clean up
		// the code and try again.
		if err := recover(); err != nil {
			if err == "-h" {
				panic(err)
			}
			base.ErrorExit()
		}
	}
}

// Target is the package being compiled.
var Target *ir.Package

// Main parses flags and Go source files specified in the command-line
// arguments, type-checks the parsed Go package, compiles functions to machine
// code, and finally writes the compiled package definition to disk.
func Main(archInit func(*Arch)) {
	timings.Start("fe", "init")

	defer hidePanic()

	archInit(&thearch)

	base.Ctxt = obj.Linknew(thearch.LinkArch)
	base.Ctxt.DiagFunc = base.Errorf
	base.Ctxt.DiagFlush = base.FlushErrors
	base.Ctxt.Bso = bufio.NewWriter(os.Stdout)

	// UseBASEntries is preferred because it shaves about 2% off build time, but LLDB, dsymutil, and dwarfdump
	// on Darwin don't support it properly, especially since macOS 10.14 (Mojave). This is exposed as a flag
	// to allow testing with LLVM tools on Linux, and to help with reporting this bug to the LLVM project.
	// See bugs 31188 and 21945 (CLs 170638, 98075, 72371).
	base.Ctxt.UseBASEntries = base.Ctxt.Headtype != objabi.Hdarwin

	types.LocalPkg = types.NewPkg("", "")
	types.LocalPkg.Prefix = "\"\""

	// We won't know localpkg's height until after import
	// processing. In the mean time, set to MaxPkgHeight to ensure
	// height comparisons at least work until then.
	types.LocalPkg.Height = types.MaxPkgHeight

	// pseudo-package, for scoping
	types.BuiltinPkg = types.NewPkg("go.builtin", "") // TODO(gri) name this package go.builtin?
	types.BuiltinPkg.Prefix = "go.builtin"            // not go%2ebuiltin

	// pseudo-package, accessed by import "unsafe"
	unsafepkg = types.NewPkg("unsafe", "unsafe")

	// Pseudo-package that contains the compiler's builtin
	// declarations for package runtime. These are declared in a
	// separate package to avoid conflicts with package runtime's
	// actual declarations, which may differ intentionally but
	// insignificantly.
	Runtimepkg = types.NewPkg("go.runtime", "runtime")
	Runtimepkg.Prefix = "runtime"

	// pseudo-packages used in symbol tables
	itabpkg = types.NewPkg("go.itab", "go.itab")
	itabpkg.Prefix = "go.itab" // not go%2eitab

	itablinkpkg = types.NewPkg("go.itablink", "go.itablink")
	itablinkpkg.Prefix = "go.itablink" // not go%2eitablink

	trackpkg = types.NewPkg("go.track", "go.track")
	trackpkg.Prefix = "go.track" // not go%2etrack

	// pseudo-package used for map zero values
	mappkg = types.NewPkg("go.map", "go.map")
	mappkg.Prefix = "go.map"

	// pseudo-package used for methods with anonymous receivers
	gopkg = types.NewPkg("go", "")

	base.DebugSSA = ssa.PhaseOption
	base.ParseFlags()

	// Record flags that affect the build result. (And don't
	// record flags that don't, since that would cause spurious
	// changes in the binary.)
	recordFlags("B", "N", "l", "msan", "race", "shared", "dynlink", "dwarflocationlists", "dwarfbasentries", "smallframes", "spectre")

	if !enableTrace && base.Flag.LowerT {
		log.Fatalf("compiler not built with support for -t")
	}

	// Enable inlining (after recordFlags, to avoid recording the rewritten -l). For now:
	//	default: inlining on. (Flag.LowerL == 1)
	//	-l: inlining off (Flag.LowerL == 0)
	//	-l=2, -l=3: inlining on again, with extra debugging (Flag.LowerL > 1)
	if base.Flag.LowerL <= 1 {
		base.Flag.LowerL = 1 - base.Flag.LowerL
	}

	if base.Flag.SmallFrames {
		maxStackVarSize = 128 * 1024
		maxImplicitStackVarSize = 16 * 1024
	}

	if base.Flag.Dwarf {
		base.Ctxt.DebugInfo = debuginfo
		base.Ctxt.GenAbstractFunc = genAbstractFunc
		base.Ctxt.DwFixups = obj.NewDwarfFixupTable(base.Ctxt)
	} else {
		// turn off inline generation if no dwarf at all
		base.Flag.GenDwarfInl = 0
		base.Ctxt.Flag_locationlists = false
	}
	if base.Ctxt.Flag_locationlists && len(base.Ctxt.Arch.DWARFRegisters) == 0 {
		log.Fatalf("location lists requested but register mapping not available on %v", base.Ctxt.Arch.Name)
	}

	checkLang()

	if base.Flag.SymABIs != "" {
		readSymABIs(base.Flag.SymABIs, base.Ctxt.Pkgpath)
	}

	if ispkgin(omit_pkgs) {
		base.Flag.Race = false
		base.Flag.MSan = false
	}

	thearch.LinkArch.Init(base.Ctxt)
	startProfile()
	if base.Flag.Race {
		racepkg = types.NewPkg("runtime/race", "")
	}
	if base.Flag.MSan {
		msanpkg = types.NewPkg("runtime/msan", "")
	}
	if base.Flag.Race || base.Flag.MSan {
		instrumenting = true
	}
	if base.Flag.Dwarf {
		dwarf.EnableLogging(base.Debug.DwarfInl != 0)
	}
	if base.Debug.SoftFloat != 0 {
		thearch.SoftFloat = true
	}

	if base.Flag.JSON != "" { // parse version,destination from json logging optimization.
		logopt.LogJsonOption(base.Flag.JSON)
	}

	ir.EscFmt = escFmt
	IsIntrinsicCall = isIntrinsicCall
	SSADumpInline = ssaDumpInline
	initSSAEnv()
	initSSATables()

	Widthptr = thearch.LinkArch.PtrSize
	Widthreg = thearch.LinkArch.RegSize
	MaxWidth = thearch.MAXWIDTH
	types.TypeLinkSym = func(t *types.Type) *obj.LSym {
		return typenamesym(t).Linksym()
	}

	Target = new(ir.Package)

	NeedFuncSym = makefuncsym
	NeedITab = func(t, iface *types.Type) { itabname(t, iface) }
	NeedRuntimeType = addsignat // TODO(rsc): typenamesym for lock?

	autogeneratedPos = makePos(src.NewFileBase("<autogenerated>", "<autogenerated>"), 1, 0)

	types.TypeLinkSym = func(t *types.Type) *obj.LSym {
		return typenamesym(t).Linksym()
	}
	TypecheckInit()

	// Parse input.
	timings.Start("fe", "parse")
	lines := parseFiles(flag.Args())
	cgoSymABIs()
timings.Stop()
[dev.inline] cmd/internal/src: replace src.Pos with syntax.Pos This replaces the src.Pos LineHist-based position tracking with the syntax.Pos implementation and updates all uses. The LineHist table is not used anymore - the respective code is still there but should be removed eventually. CL forthcoming. Passes toolstash -cmp when comparing to the master repo (with the exception of a couple of swapped assembly instructions, likely due to different instruction scheduling because the line-based sorting has changed; though this won't affect correctness). The sizes of various important compiler data structures have increased significantly (see the various sizes_test.go files); this is probably the reason for an increase of compilation times (to be addressed). Here are the results of compilebench -count 5, run on a "quiet" machine (no apps running besides a terminal): name old time/op new time/op delta Template 256ms ± 1% 280ms ±15% +9.54% (p=0.008 n=5+5) Unicode 132ms ± 1% 132ms ± 1% ~ (p=0.690 n=5+5) GoTypes 891ms ± 1% 917ms ± 2% +2.88% (p=0.008 n=5+5) Compiler 3.84s ± 2% 3.99s ± 2% +3.95% (p=0.016 n=5+5) MakeBash 47.1s ± 1% 47.2s ± 2% ~ (p=0.841 n=5+5) name old user-ns/op new user-ns/op delta Template 309M ± 1% 326M ± 2% +5.18% (p=0.008 n=5+5) Unicode 165M ± 1% 168M ± 4% ~ (p=0.421 n=5+5) GoTypes 1.14G ± 2% 1.18G ± 1% +3.47% (p=0.008 n=5+5) Compiler 5.00G ± 1% 5.16G ± 1% +3.12% (p=0.008 n=5+5) Change-Id: I241c4246cdff627d7ecb95cac23060b38f9775ec Reviewed-on: https://go-review.googlesource.com/34273 Run-TryBot: Robert Griesemer <gri@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 17:15:05 -08:00
timings.AddEvent(int64(lines), "lines")
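The Start/Stop/AddEvent calls above follow a simple pattern: each Start opens a labeled phase (implicitly closing the previous one), Stop closes the current phase, and AddEvent attaches a count to the most recent phase. The following is only a rough sketch of that pattern; `phaseTimer` and `phase` are hypothetical names, not the compiler's actual Timings type, and the real implementation may differ in how it stores and groups phases.

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// phase is one timed section. Hypothetical type, for illustration only.
type phase struct {
	label  string
	start  time.Time
	dur    time.Duration
	events []string // notes such as "10824 lines"
}

// phaseTimer is a simplified, hypothetical stand-in for a phase timer.
type phaseTimer struct {
	phases  []phase
	running bool
}

// Start closes any running phase and opens a new one whose label is
// the given parts joined with ':' (for example "fe:parse").
func (pt *phaseTimer) Start(parts ...string) {
	pt.Stop()
	pt.phases = append(pt.phases, phase{label: strings.Join(parts, ":"), start: time.Now()})
	pt.running = true
}

// Stop closes the running phase, if any, and records its duration.
func (pt *phaseTimer) Stop() {
	if !pt.running {
		return
	}
	p := &pt.phases[len(pt.phases)-1]
	p.dur = time.Since(p.start)
	pt.running = false
}

// AddEvent attaches a count to the most recently started or stopped phase.
func (pt *phaseTimer) AddEvent(n int64, unit string) {
	if len(pt.phases) == 0 {
		return
	}
	p := &pt.phases[len(pt.phases)-1]
	p.events = append(p.events, fmt.Sprintf("%d %s", n, unit))
}

func main() {
	var pt phaseTimer
	pt.Start("fe", "parse")
	pt.Stop()
	pt.AddEvent(10824, "lines")
	pt.Start("fe", "inlining")
	pt.Start("fe", "escapes") // implicitly ends "fe:inlining"
	pt.Stop()
	for _, p := range pt.phases {
		fmt.Println(p.label, p.dur, p.events)
	}
}
```

Having Start close the previous phase is what lets later call sites begin a new phase without a matching Stop in between.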
if base.Flag.G != 0 && base.Flag.G < 3 {
	// can only parse generic code for now
	base.ExitIfErrors()
	return
}
recordPackageName()
// Typecheck.
TypecheckPackage()
// With all user code typechecked, it's now safe to verify unused dot imports.
2020-12-13 10:35:20 -08:00
checkDotImports()
base.ExitIfErrors()
// Build init task.
if initTask := fninit(); initTask != nil {
	exportsym(initTask)
}
// Inlining
timings.Start("fe", "inlining")
if base.Flag.LowerL != 0 {
	InlinePackage()
}
// Devirtualize.
for _, n := range Target.Decls {
[dev.regabi] cmd/compile: use Node getters and setters [generated] Now that we have all the getters and setters defined, use them and unexport all the actual Node fields. This is the next step toward replacing Node with an interface. [git-generate] cd src/cmd/compile/internal/gc rf ' ex . ../ir ../ssa { import "cmd/compile/internal/ir" import "cmd/compile/internal/types" import "cmd/internal/src" var n, x *ir.Node var op ir.Op var t *types.Type var f *ir.Func var m *ir.Name var s *types.Sym var p src.XPos var i int64 var e uint16 var nodes ir.Nodes n.Op = op -> n.SetOp(op) n.Left = x -> n.SetLeft(x) n.Right = x -> n.SetRight(x) n.Orig = x -> n.SetOrig(x) n.Type = t -> n.SetType(t) n.Func = f -> n.SetFunc(f) n.Name = m -> n.SetName(m) n.Sym = s -> n.SetSym(s) n.Pos = p -> n.SetPos(p) n.Xoffset = i -> n.SetXoffset(i) n.Esc = e -> n.SetEsc(e) n.Ninit.Append -> n.PtrNinit().Append n.Ninit.AppendNodes -> n.PtrNinit().AppendNodes n.Ninit.MoveNodes -> n.PtrNinit().MoveNodes n.Ninit.Prepend -> n.PtrNinit().Prepend n.Ninit.Set -> n.PtrNinit().Set n.Ninit.Set1 -> n.PtrNinit().Set1 n.Ninit.Set2 -> n.PtrNinit().Set2 n.Ninit.Set3 -> n.PtrNinit().Set3 &n.Ninit -> n.PtrNinit() n.Ninit = nodes -> n.SetNinit(nodes) n.Nbody.Append -> n.PtrNbody().Append n.Nbody.AppendNodes -> n.PtrNbody().AppendNodes n.Nbody.MoveNodes -> n.PtrNbody().MoveNodes n.Nbody.Prepend -> n.PtrNbody().Prepend n.Nbody.Set -> n.PtrNbody().Set n.Nbody.Set1 -> n.PtrNbody().Set1 n.Nbody.Set2 -> n.PtrNbody().Set2 n.Nbody.Set3 -> n.PtrNbody().Set3 &n.Nbody -> n.PtrNbody() n.Nbody = nodes -> n.SetNbody(nodes) n.List.Append -> n.PtrList().Append n.List.AppendNodes -> n.PtrList().AppendNodes n.List.MoveNodes -> n.PtrList().MoveNodes n.List.Prepend -> n.PtrList().Prepend n.List.Set -> n.PtrList().Set n.List.Set1 -> n.PtrList().Set1 n.List.Set2 -> n.PtrList().Set2 n.List.Set3 -> n.PtrList().Set3 &n.List -> n.PtrList() n.List = nodes -> n.SetList(nodes) n.Rlist.Append -> n.PtrRlist().Append n.Rlist.AppendNodes -> 
n.PtrRlist().AppendNodes n.Rlist.MoveNodes -> n.PtrRlist().MoveNodes n.Rlist.Prepend -> n.PtrRlist().Prepend n.Rlist.Set -> n.PtrRlist().Set n.Rlist.Set1 -> n.PtrRlist().Set1 n.Rlist.Set2 -> n.PtrRlist().Set2 n.Rlist.Set3 -> n.PtrRlist().Set3 &n.Rlist -> n.PtrRlist() n.Rlist = nodes -> n.SetRlist(nodes) } ex . ../ir ../ssa { import "cmd/compile/internal/ir" var n *ir.Node n.Op -> n.GetOp() n.Left -> n.GetLeft() n.Right -> n.GetRight() n.Orig -> n.GetOrig() n.Type -> n.GetType() n.Func -> n.GetFunc() n.Name -> n.GetName() n.Sym -> n.GetSym() n.Pos -> n.GetPos() n.Xoffset -> n.GetXoffset() n.Esc -> n.GetEsc() avoid (*ir.Node).PtrNinit avoid (*ir.Node).PtrNbody avoid (*ir.Node).PtrList avoid (*ir.Node).PtrRlist n.Ninit -> n.GetNinit() n.Nbody -> n.GetNbody() n.List -> n.GetList() n.Rlist -> n.GetRlist() } ' cd ../ir rf ' mv Node.Op Node.op mv Node.GetOp Node.Op mv Node.Left Node.left mv Node.GetLeft Node.Left mv Node.Right Node.right mv Node.GetRight Node.Right mv Node.Orig Node.orig mv Node.GetOrig Node.Orig mv Node.Type Node.typ mv Node.GetType Node.Type mv Node.Func Node.fn mv Node.GetFunc Node.Func mv Node.Name Node.name mv Node.GetName Node.Name # All uses are in other Node methods already. mv Node.E Node.e mv Node.Sym Node.sym mv Node.GetSym Node.Sym mv Node.Pos Node.pos mv Node.GetPos Node.Pos mv Node.Esc Node.esc mv Node.GetEsc Node.Esc # While we are here, rename Xoffset to more idiomatic Offset. mv Node.Xoffset Node.offset mv Node.GetXoffset Node.Offset mv Node.SetXoffset Node.SetOffset # While we are here, rename Ninit, Nbody to more idiomatic Init, Body. 
mv Node.Ninit Node.init mv Node.GetNinit Node.Init mv Node.PtrNinit Node.PtrInit mv Node.SetNinit Node.SetInit mv Node.Nbody Node.body mv Node.GetNbody Node.Body mv Node.PtrNbody Node.PtrBody mv Node.SetNbody Node.SetBody mv Node.List Node.list mv Node.GetList Node.List mv Node.Rlist Node.rlist mv Node.GetRlist Node.Rlist # Unexport these mv Node.SetHasOpt Node.setHasOpt mv Node.SetHasVal Node.setHasVal ' Change-Id: I9894f633375c5237a29b6d6d7b89ba181b56ca3a Reviewed-on: https://go-review.googlesource.com/c/go/+/273009 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2020-11-22 09:59:15 -05:00
	if n.Op() == ir.ODCLFUNC {
		devirtualize(n.(*ir.Func))
	}
}
Curfn = nil
// Escape analysis.
// Required for moving heap allocations onto stack,
// which in turn is required by the closure implementation,
// which stores the addresses of stack variables into the closure.
// If the closure does not escape, it needs to be on the stack
// or else the stack copier will not update it.
// Large values are also moved off stack in escape analysis;
// because large values may contain pointers, it must happen early.
timings.Start("fe", "escapes")
escapes(Target.Decls)
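To make the escape-analysis comment above more concrete, here is a small user-level sketch (not compiler code) contrasting a closure that stays local with one that is returned. The function names are hypothetical; the actual decisions the compiler makes for a given program can be inspected with `go build -gcflags=-m`.

```go
package main

import "fmt"

// sumLocal builds a closure that never leaves the function, so the
// captured variable total can typically remain on the stack.
func sumLocal(xs []int) int {
	total := 0
	add := func(x int) { total += x }
	for _, x := range xs {
		add(x)
	}
	return total
}

// newCounter returns its closure, so the captured variable n outlives
// the call and is expected to be moved to the heap.
func newCounter() func() int {
	n := 0
	return func() int {
		n++
		return n
	}
}

func main() {
	fmt.Println(sumLocal([]int{1, 2, 3}))
	next := newCounter()
	next()
	fmt.Println(next())
}
```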
// Collect information for go:nowritebarrierrec
// checking. This must happen before transformclosure.
// We'll do the final check after write barriers are
// inserted.
if base.Flag.CompilingRuntime {
	EnableNoWriteBarrierRecCheck()
}
cmd/compile: improve coverage of nowritebarrierrec check The current go:nowritebarrierrec checker has two problems that limit its coverage: 1. It doesn't understand that systemstack calls its argument, which means there are several cases where we fail to detect prohibited write barriers. 2. It only observes calls in the AST, so calls constructed during lowering by SSA aren't followed. This CL completely rewrites this checker to address these issues. The current checker runs entirely after walk and uses visitBottomUp, which introduces several problems for checking across systemstack. First, visitBottomUp itself doesn't understand systemstack calls, so the callee may be ordered after the caller, causing the checker to fail to propagate constraints. Second, many systemstack calls are passed a closure, which is quite difficult to resolve back to the function definition after transformclosure and walk have run. Third, visitBottomUp works exclusively on the AST, so it can't observe calls created by SSA. To address these problems, this commit splits the check into two phases and rewrites it to use a call graph generated during SSA lowering. The first phase runs before transformclosure/walk and simply records systemstack arguments when they're easy to get. Then, it modifies genssa to record static call edges at the point where we're lowering to Progs (which is the latest point at which position information is conveniently available). Finally, the second phase runs after all functions have been lowered and uses a direct BFS walk of the call graph (combining systemstack calls with static calls) to find prohibited write barriers and construct nice error messages. Fixes #22384. For #22460. Change-Id: I39668f7f2366ab3c1ab1a71eaf25484d25349540 Reviewed-on: https://go-review.googlesource.com/72773 Run-TryBot: Austin Clements <austin@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-10-22 16:36:27 -04:00
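The commit message above describes the second phase of the check as a breadth-first walk of the call graph that finds prohibited write barriers and reports the call chain. A minimal sketch of that idea follows, assuming a call graph keyed by function name; `callGraph`, `findBarrierPath`, and the function names in `main` are hypothetical and do not reflect the compiler's actual data structures.

```go
package main

import "fmt"

// callGraph maps a function name to the functions it calls statically.
// Hypothetical representation, for illustration only.
type callGraph map[string][]string

// findBarrierPath does a breadth-first search from root and returns the
// shortest call chain ending in a function listed in hasBarrier, or nil
// if no such function is reachable. Function names must be non-empty.
func findBarrierPath(g callGraph, root string, hasBarrier map[string]bool) []string {
	parent := map[string]string{root: ""} // "" marks the root's parent
	queue := []string{root}
	for len(queue) > 0 {
		fn := queue[0]
		queue = queue[1:]
		if hasBarrier[fn] {
			// Walk the parent links back to the root to build the chain.
			var path []string
			for f := fn; f != ""; f = parent[f] {
				path = append([]string{f}, path...)
			}
			return path
		}
		for _, callee := range g[fn] {
			if _, seen := parent[callee]; !seen {
				parent[callee] = fn
				queue = append(queue, callee)
			}
		}
	}
	return nil
}

func main() {
	g := callGraph{
		"gcStart": {"helper", "other"},
		"helper":  {"store"},
		"other":   {"store"},
	}
	fmt.Println(findBarrierPath(g, "gcStart", map[string]bool{"store": true}))
}
```

Recording the parent of each visited function is what allows the error message to show how the offending function is reached.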
// Transform closure bodies to properly reference captured variables.
// This needs to happen before walk, because closures must be transformed
// before walk reaches a call of a closure.
timings.Start("fe", "xclosures")
for _, n := range Target.Decls {
	if n.Op() == ir.ODCLFUNC {
		n := n.(*ir.Func)
		if n.Func().OClosure != nil {
			Curfn = n
			transformclosure(n)
		}
	}
}
// Prepare for SSA compilation.
// This must be before peekitabs, because peekitabs
// can trigger function compilation.
initssaconfig()
// Just before compilation, compile itabs found on
// the right side of OCONVIFACE so that methods
// can be de-virtualized during compilation.
Curfn = nil
peekitabs()
// Compile top level functions.
// Don't use range--walk can add functions to Target.Decls.
timings.Start("be", "compilefuncs")
fcount := int64(0)
for i := 0; i < len(Target.Decls); i++ {
	n := Target.Decls[i]
	if n.Op() == ir.ODCLFUNC {
		funccompile(n.(*ir.Func))
		fcount++
	}
}
timings.AddEvent(fcount, "funcs")
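The "Don't use range" comment on the loop above relies on a property of Go loops: a range expression over a slice is evaluated once, while an index loop re-checks len on every iteration and therefore sees elements appended during the loop. A standalone sketch, with hypothetical names (`countWithIndex`, `countWithRange`, `visit`):

```go
package main

import "fmt"

// countWithIndex visits every element, including elements that visit
// causes to be appended while the loop is running.
func countWithIndex(decls []string, visit func(string) []string) int {
	n := 0
	for i := 0; i < len(decls); i++ { // len is re-evaluated each iteration
		decls = append(decls, visit(decls[i])...)
		n++
	}
	return n
}

// countWithRange visits only the elements present when the loop began,
// because the range expression is evaluated once.
func countWithRange(decls []string, visit func(string) []string) int {
	n := 0
	for _, d := range decls {
		decls = append(decls, visit(d)...)
		n++
	}
	return n
}

func main() {
	// visit pretends that processing "f" generates one extra function.
	visit := func(name string) []string {
		if name == "f" {
			return []string{"f.wrapper"}
		}
		return nil
	}
	fmt.Println(countWithIndex([]string{"f", "g"}, visit))
	fmt.Println(countWithRange([]string{"f", "g"}, visit))
}
```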
compileFunctions()
cmd/compile: add initial backend concurrency support This CL adds initial support for concurrent backend compilation. BACKGROUND The compiler currently consists (very roughly) of the following phases: 1. Initialization. 2. Lexing and parsing into the cmd/compile/internal/syntax AST. 3. Translation into the cmd/compile/internal/gc AST. 4. Some gc AST passes: typechecking, escape analysis, inlining, closure handling, expression evaluation ordering (order.go), and some lowering and optimization (walk.go). 5. Translation into the cmd/compile/internal/ssa SSA form. 6. Optimization and lowering of SSA form. 7. Translation from SSA form to assembler instructions. 8. Translation from assembler instructions to machine code. 9. Writing lots of output: machine code, DWARF symbols, type and reflection info, export data. Phase 2 was already concurrent as of Go 1.8. Phase 3 is planned for eventual removal; we hope to go straight from syntax AST to SSA. Phases 5–8 are per-function; this CL adds support for processing multiple functions concurrently. The slowest phases in the compiler are 5 and 6, so this offers the opportunity for some good speed-ups. Unfortunately, it's not quite that straightforward. In the current compiler, the latter parts of phase 4 (order, walk) are done function-at-a-time as needed. Making order and walk concurrency-safe proved hard, and they're not particularly slow, so there wasn't much reward. To enable phases 5–8 to be done concurrently, when concurrent backend compilation is requested, we complete phase 4 for all functions before starting later phases for any functions. Also, in reality, we automatically generate new functions in phase 9, such as method wrappers and equality and hash routines. Those new functions then go through phases 4–8. This CL disables concurrent backend compilation after the first, big, user-provided batch of functions has been compiled. 
This is done to keep things simple, and because the autogenerated functions tend to be small, few, simple, and fast to compile. USAGE Concurrent backend compilation still defaults to off. To set the number of functions that may be backend-compiled concurrently, use the compiler flag -c. In future work, cmd/go will automatically set -c. Furthermore, this CL has been intentionally written so that the c=1 path has no backend concurrency whatsoever, not even spawning any goroutines. This helps ensure that, should problems arise late in the development cycle, we can simply have cmd/go set c=1 always, and revert to the original compiler behavior. MUTEXES Most of the work required to make concurrent backend compilation safe has occurred over the past month. This CL adds a handful of mutexes to get the rest of the way there; they are the mutexes that I didn't see a clean way to avoid. Some of them may still be eliminable in future work. In no particular order: * gc.funcsymsmu. The global funcsyms slice is populated lazily when we need function symbols for closures. This occurs during gc AST to SSA translation. The function funcsym also does a package lookup, which is a source of races on types.Pkg.Syms; funcsymsmu also covers that package lookup. This mutex is low priority: it adds a single global, it is in an infrequently used code path, and it is low contention. Since funcsyms may now be added in any order, we must sort them to preserve reproducible builds. * gc.largeStackFramesMu. We don't discover until after SSA compilation that a function's stack frame is gigantic. Recording that error happens basically never, but it does happen concurrently. Fix with a low priority mutex and sorting. * obj.Link.hashmu. ctxt.hash stores the mapping from types.Syms (compiler symbols) to obj.LSyms (linker symbols). It is accessed fairly heavily through all the phases. This is the only heavily contended mutex. * gc.signatlistmu. 
The global signatlist map is populated with types through several of the concurrent phases, including notably via ngotype during DWARF generation. It is low priority for removal. * gc.typepkgmu. Looking up symbols in the types package happens a fair amount during backend compilation and DWARF generation, particularly via ngotype. This mutex helps us to avoid a broader mutex on types.Pkg.Syms. It has low-to-moderate contention. * types.internedStringsmu. gc AST to SSA conversion and some SSA work introduce new autotmps. Those autotmps have their names interned to reduce allocations. That interning requires protecting types.internedStrings. The autotmp names are heavily re-used, and the mutex overhead and contention here are low, so it is probably a worthwhile performance optimization to keep this mutex. TESTING I have been testing this code locally by running 'go install -race cmd/compile' and then doing 'go build -a -gcflags=-c=128 std cmd' for all architectures and a variety of compiler flags. This obviously needs to be made part of the builders, but it is too expensive to make part of all.bash. I have filed #19962 for this. REPRODUCIBLE BUILDS This version of the compiler generates reproducible builds. Testing reproducible builds also needs automation, however, and is also too expensive for all.bash. This is #19961. Also of note is that some of the compiler flags used by 'toolstash -cmp' are currently incompatible with concurrent backend compilation. They still work fine with c=1. Time will tell whether this is a problem. NEXT STEPS * Continue to find and fix races and bugs, using a combination of code inspection, fuzzing, and hopefully some community experimentation. I do not know of any outstanding races, but there probably are some. * Improve testing. * Improve performance, for many values of c. * Integrate with cmd/go and fine tune. * Support concurrent compilation with the -race flag. It is a sad irony that it does not yet work. 
* Minor code cleanup that has been deferred during the last month due to uncertainty about the ultimate shape of this CL. PERFORMANCE Here's the buried lede, at last. :) All benchmarks are from my 8 core 2.9 GHz Intel Core i7 darwin/amd64 laptop. First, going from tip to this CL with c=1 has almost no impact. name old time/op new time/op delta Template 195ms ± 3% 194ms ± 5% ~ (p=0.370 n=30+29) Unicode 86.6ms ± 3% 87.0ms ± 7% ~ (p=0.958 n=29+30) GoTypes 548ms ± 3% 555ms ± 4% +1.35% (p=0.001 n=30+28) Compiler 2.51s ± 2% 2.54s ± 2% +1.17% (p=0.000 n=28+30) SSA 5.16s ± 3% 5.16s ± 2% ~ (p=0.910 n=30+29) Flate 124ms ± 5% 124ms ± 4% ~ (p=0.947 n=30+30) GoParser 146ms ± 3% 146ms ± 3% ~ (p=0.150 n=29+28) Reflect 354ms ± 3% 352ms ± 4% ~ (p=0.096 n=29+29) Tar 107ms ± 5% 106ms ± 3% ~ (p=0.370 n=30+29) XML 200ms ± 4% 201ms ± 4% ~ (p=0.313 n=29+28) [Geo mean] 332ms 333ms +0.10% name old user-time/op new user-time/op delta Template 227ms ± 5% 225ms ± 5% ~ (p=0.457 n=28+27) Unicode 109ms ± 4% 109ms ± 5% ~ (p=0.758 n=29+29) GoTypes 713ms ± 4% 721ms ± 5% ~ (p=0.051 n=30+29) Compiler 3.36s ± 2% 3.38s ± 3% ~ (p=0.146 n=30+30) SSA 7.46s ± 3% 7.47s ± 3% ~ (p=0.804 n=30+29) Flate 146ms ± 7% 147ms ± 3% ~ (p=0.833 n=29+27) GoParser 179ms ± 5% 179ms ± 5% ~ (p=0.866 n=30+30) Reflect 431ms ± 4% 429ms ± 4% ~ (p=0.593 n=29+30) Tar 124ms ± 5% 123ms ± 5% ~ (p=0.140 n=29+29) XML 243ms ± 4% 242ms ± 7% ~ (p=0.404 n=29+29) [Geo mean] 415ms 415ms +0.02% name old obj-bytes new obj-bytes delta Template 382k ± 0% 382k ± 0% ~ (all equal) Unicode 203k ± 0% 203k ± 0% ~ (all equal) GoTypes 1.18M ± 0% 1.18M ± 0% ~ (all equal) Compiler 3.98M ± 0% 3.98M ± 0% ~ (all equal) SSA 8.28M ± 0% 8.28M ± 0% ~ (all equal) Flate 230k ± 0% 230k ± 0% ~ (all equal) GoParser 287k ± 0% 287k ± 0% ~ (all equal) Reflect 1.00M ± 0% 1.00M ± 0% ~ (all equal) Tar 190k ± 0% 190k ± 0% ~ (all equal) XML 416k ± 0% 416k ± 0% ~ (all equal) [Geo mean] 660k 660k +0.00% Comparing this CL to itself, from c=1 to c=2 improves real times 20-30%, 
costs 5-10% more CPU time, and adds about 2% alloc. The allocation increase comes from allocating more ssa.Caches. name old time/op new time/op delta Template 202ms ± 3% 149ms ± 3% -26.15% (p=0.000 n=49+49) Unicode 87.4ms ± 4% 84.2ms ± 3% -3.68% (p=0.000 n=48+48) GoTypes 560ms ± 2% 398ms ± 2% -28.96% (p=0.000 n=49+49) Compiler 2.46s ± 3% 1.76s ± 2% -28.61% (p=0.000 n=48+46) SSA 6.17s ± 2% 4.04s ± 1% -34.52% (p=0.000 n=49+49) Flate 126ms ± 3% 92ms ± 2% -26.81% (p=0.000 n=49+48) GoParser 148ms ± 4% 107ms ± 2% -27.78% (p=0.000 n=49+48) Reflect 361ms ± 3% 281ms ± 3% -22.10% (p=0.000 n=49+49) Tar 109ms ± 4% 86ms ± 3% -20.81% (p=0.000 n=49+47) XML 204ms ± 3% 144ms ± 2% -29.53% (p=0.000 n=48+45) name old user-time/op new user-time/op delta Template 246ms ± 9% 246ms ± 4% ~ (p=0.401 n=50+48) Unicode 109ms ± 4% 111ms ± 4% +1.47% (p=0.000 n=44+50) GoTypes 728ms ± 3% 765ms ± 3% +5.04% (p=0.000 n=46+50) Compiler 3.33s ± 3% 3.41s ± 2% +2.31% (p=0.000 n=49+48) SSA 8.52s ± 2% 9.11s ± 2% +6.93% (p=0.000 n=49+47) Flate 149ms ± 4% 161ms ± 3% +8.13% (p=0.000 n=50+47) GoParser 181ms ± 5% 192ms ± 2% +6.40% (p=0.000 n=49+46) Reflect 452ms ± 9% 474ms ± 2% +4.99% (p=0.000 n=50+48) Tar 126ms ± 6% 136ms ± 4% +7.95% (p=0.000 n=50+49) XML 247ms ± 5% 264ms ± 3% +6.94% (p=0.000 n=48+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 39.3MB ± 0% +1.48% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.2MB ± 0% +1.19% (p=0.008 n=5+5) GoTypes 113MB ± 0% 114MB ± 0% +0.69% (p=0.008 n=5+5) Compiler 443MB ± 0% 447MB ± 0% +0.95% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.26GB ± 0% +0.89% (p=0.008 n=5+5) Flate 25.3MB ± 0% 25.9MB ± 1% +2.35% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 32.2MB ± 0% +1.59% (p=0.008 n=5+5) Reflect 78.2MB ± 0% 78.9MB ± 0% +0.91% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.0MB ± 0% +1.80% (p=0.008 n=5+5) XML 42.4MB ± 0% 43.4MB ± 0% +2.35% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 379k ± 0% 378k ± 0% ~ (p=0.421 n=5+5) Unicode 322k ± 0% 321k ± 0% ~ (p=0.222 n=5+5) GoTypes 1.14M ± 0% 
1.14M ± 0% ~ (p=0.548 n=5+5) Compiler 4.12M ± 0% 4.11M ± 0% -0.14% (p=0.032 n=5+5) SSA 9.72M ± 0% 9.72M ± 0% ~ (p=0.421 n=5+5) Flate 234k ± 1% 234k ± 0% ~ (p=0.421 n=5+5) GoParser 316k ± 1% 315k ± 0% ~ (p=0.222 n=5+5) Reflect 980k ± 0% 979k ± 0% ~ (p=0.095 n=5+5) Tar 249k ± 1% 249k ± 1% ~ (p=0.841 n=5+5) XML 392k ± 0% 391k ± 0% ~ (p=0.095 n=5+5) From c=1 to c=4, real time is down ~40%, CPU usage up 10-20%, alloc up ~5%: name old time/op new time/op delta Template 203ms ± 3% 131ms ± 5% -35.45% (p=0.000 n=50+50) Unicode 87.2ms ± 4% 84.1ms ± 2% -3.61% (p=0.000 n=48+47) GoTypes 560ms ± 4% 310ms ± 2% -44.65% (p=0.000 n=50+49) Compiler 2.47s ± 3% 1.41s ± 2% -43.10% (p=0.000 n=50+46) SSA 6.17s ± 2% 3.20s ± 2% -48.06% (p=0.000 n=49+49) Flate 126ms ± 4% 74ms ± 2% -41.06% (p=0.000 n=49+48) GoParser 148ms ± 4% 89ms ± 3% -39.97% (p=0.000 n=49+50) Reflect 360ms ± 3% 242ms ± 3% -32.81% (p=0.000 n=49+49) Tar 108ms ± 4% 73ms ± 4% -32.48% (p=0.000 n=50+49) XML 203ms ± 3% 119ms ± 3% -41.56% (p=0.000 n=49+48) name old user-time/op new user-time/op delta Template 246ms ± 9% 287ms ± 9% +16.98% (p=0.000 n=50+50) Unicode 109ms ± 4% 118ms ± 5% +7.56% (p=0.000 n=46+50) GoTypes 735ms ± 4% 806ms ± 2% +9.62% (p=0.000 n=50+50) Compiler 3.34s ± 4% 3.56s ± 2% +6.78% (p=0.000 n=49+49) SSA 8.54s ± 3% 10.04s ± 3% +17.55% (p=0.000 n=50+50) Flate 149ms ± 6% 176ms ± 3% +17.82% (p=0.000 n=50+48) GoParser 181ms ± 5% 213ms ± 3% +17.47% (p=0.000 n=50+50) Reflect 453ms ± 6% 499ms ± 2% +10.11% (p=0.000 n=50+48) Tar 126ms ± 5% 149ms ±11% +18.76% (p=0.000 n=50+50) XML 246ms ± 5% 287ms ± 4% +16.53% (p=0.000 n=49+50) name old alloc/op new alloc/op delta Template 38.8MB ± 0% 40.4MB ± 0% +4.21% (p=0.008 n=5+5) Unicode 29.8MB ± 0% 30.9MB ± 0% +3.68% (p=0.008 n=5+5) GoTypes 113MB ± 0% 116MB ± 0% +2.71% (p=0.008 n=5+5) Compiler 443MB ± 0% 455MB ± 0% +2.75% (p=0.008 n=5+5) SSA 1.25GB ± 0% 1.27GB ± 0% +1.84% (p=0.008 n=5+5) Flate 25.3MB ± 0% 26.9MB ± 1% +6.31% (p=0.008 n=5+5) GoParser 31.7MB ± 0% 33.2MB ± 0% +4.61% 
(p=0.008 n=5+5) Reflect 78.2MB ± 0% 80.2MB ± 0% +2.53% (p=0.008 n=5+5) Tar 26.6MB ± 0% 27.9MB ± 0% +5.19% (p=0.008 n=5+5) XML 42.4MB ± 0% 44.6MB ± 0% +5.20% (p=0.008 n=5+5) name old allocs/op new allocs/op delta Template 380k ± 0% 379k ± 0% -0.39% (p=0.032 n=5+5) Unicode 321k ± 0% 321k ± 0% ~ (p=0.841 n=5+5) GoTypes 1.14M ± 0% 1.14M ± 0% ~ (p=0.421 n=5+5) Compiler 4.12M ± 0% 4.14M ± 0% +0.52% (p=0.008 n=5+5) SSA 9.72M ± 0% 9.76M ± 0% +0.37% (p=0.008 n=5+5) Flate 234k ± 1% 234k ± 1% ~ (p=0.690 n=5+5) GoParser 316k ± 0% 317k ± 1% ~ (p=0.841 n=5+5) Reflect 981k ± 0% 981k ± 0% ~ (p=1.000 n=5+5) Tar 250k ± 0% 249k ± 1% ~ (p=0.151 n=5+5) XML 393k ± 0% 392k ± 0% ~ (p=0.056 n=5+5) Going beyond c=4 on my machine tends to increase CPU time and allocs without impacting real time. The CPU time numbers matter, because when there are many concurrent compilation processes, that will impact the overall throughput. The numbers above are in many ways the best case scenario; we can take full advantage of all cores. Fortunately, the most common compilation scenario is incremental re-compilation of a single package during a build/test cycle. Updates #15756 Change-Id: I6725558ca2069edec0ac5b0d1683105a9fff6bea Reviewed-on: https://go-review.googlesource.com/40693 Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: Robert Griesemer <gri@golang.org> Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org> TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-19 08:27:26 -07:00
if base.Flag.CompilingRuntime {
	// Write barriers are now known. Check the call graph.
	NoWriteBarrierRecCheck()
}
// Finalize DWARF inline routine DIEs, then explicitly turn off
// DWARF inlining gen so as to avoid problems with generated
// method wrappers.
if base.Ctxt.DwFixups != nil {
	base.Ctxt.DwFixups.Finalize(base.Ctxt.Pkgpath, base.Debug.DwarfInl != 0)
	base.Ctxt.DwFixups = nil
	base.Flag.GenDwarfInl = 0
}
// Write object data to disk.
timings.Start("be", "dumpobj")
dumpdata()
base.Ctxt.NumberSyms()
dumpobj()
if base.Flag.AsmHdr != "" {
dumpasmhdr()
}
CheckLargeStacks()
CheckFuncStack()
if len(compilequeue) != 0 {
base.Fatalf("%d uncompiled functions", len(compilequeue))
}
logopt.FlushLoggedOpts(base.Ctxt, base.Ctxt.Pkgpath)
base.ExitIfErrors()
cmd/compile: add framework for logging optimizer (non)actions to LSP

This is intended to allow IDEs to note where the optimizer was not able to improve users' code. There may be other applications for this, for example in studying effectiveness of optimizer changes more quickly than running benchmarks, or in verifying that code changes did not accidentally disable optimizations in performance-critical code.

Logging of nilcheck (bad) for amd64 is implemented as proof-of-concept. In general, the intent is that optimizations that didn't happen are what will be logged, because that is believed to be what IDE users want.

Added flag -json=version,dest

Check that version=0. (Future compilers will support a few recent versions, I hope that version is always <=3.)

Dest is expected to be one of:

/path (or \path in Windows)
  will create directory /path and fill it w/ json files
file://path
  will create directory path, intended either for
    I:\dont\know\enough\about\windows\paths
    trustme_I_know_what_I_am_doing_probably_testing

Not passing an absolute path name usually leads to json splattered all over source directories, or failure when those directories are not writeable. If you want a foot-gun, you have to ask for it.

The JSON output is directed to subdirectories of dest, where each subdirectory is net/url.PathEscape of the package name, and for each foo.go in the package, net/url.PathEscape(foo).json is created. The first line of foo.json contains version and context information, and subsequent lines contain LSP-conforming JSON describing the missing optimizations.

Change-Id: Ib83176a53a8c177ee9081aefc5ae05604ccad8a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/204338
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-24 13:48:17 -04:00
base.FlushErrors()
timings.Stop()
if base.Flag.Bench != "" {
if err := writebench(base.Flag.Bench); err != nil {
log.Fatalf("cannot write benchmark data: %v", err)
}
}
}
func CheckLargeStacks() {
// Check whether any of the functions we have compiled have gigantic stack frames.
sort.Slice(largeStackFrames, func(i, j int) bool {
return largeStackFrames[i].pos.Before(largeStackFrames[j].pos)
})
for _, large := range largeStackFrames {
if large.callee != 0 {
base.ErrorfAt(large.pos, "stack frame too large (>1GB): %d MB locals + %d MB args + %d MB callee", large.locals>>20, large.args>>20, large.callee>>20)
} else {
base.ErrorfAt(large.pos, "stack frame too large (>1GB): %d MB locals + %d MB args", large.locals>>20, large.args>>20)
}
}
}
func cgoSymABIs() {
// The linker expects an ABI0 wrapper for all cgo-exported
// functions.
for _, prag := range Target.CgoPragmas {
switch prag[0] {
case "cgo_export_static", "cgo_export_dynamic":
if symabiRefs == nil {
symabiRefs = make(map[string]obj.ABI)
}
symabiRefs[prag[1]] = obj.ABI0
}
}
}
cmd/compile: allow mid-stack inlining when there is a cycle of recursion We still disallow inlining for an immediately-recursive function, but allow inlining if a function is in a recursion chain. If all functions in the recursion chain are simple, then we could inline forever down the recursion chain (eventually running out of stack on the compiler), so we add a map to keep track of the functions we have already inlined at a call site. We stop inlining when we reach a function that we have already inlined in the recursive chain. Of course, normally the inlining will have stopped earlier, because of the cost function. We could also limit the depth of inlining by a simple count (say, limit max inlining of 10 at any given site). Would that limit other opportunities too much? Added a test in test/inline.go. runtime.BenchmarkStackCopyNoCache() is also already a good test that triggers the check to stop inlining when we reach the start of the recursive chain again. For the bent benchmark suite, the performance improvement was mostly not statistically significant, but the geomean averaged out to: -0.68%. The text size increase was less than .1% for all bent benchmarks. The cmd/go text size increase was 0.02% and the cmd/compile text size increase was .1%. Fixes #29737 Change-Id: I892fa84bb07a947b3125ec8f25ed0e508bf2bdf5 Reviewed-on: https://go-review.googlesource.com/c/go/+/226818 Run-TryBot: Dan Scales <danscales@google.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
2020-03-31 20:24:05 -07:00
// numNonClosures returns the number of functions in list which are not closures.
func numNonClosures(list []*ir.Func) int {
count := 0
for _, fn := range list {
if fn.OClosure == nil {
count++
}
}
return count
}
func writebench(filename string) error {
f, err := os.OpenFile(filename, os.O_WRONLY|os.O_CREATE|os.O_APPEND, 0666)
if err != nil {
return err
}
var buf bytes.Buffer
fmt.Fprintln(&buf, "commit:", objabi.Version)
fmt.Fprintln(&buf, "goos:", runtime.GOOS)
fmt.Fprintln(&buf, "goarch:", runtime.GOARCH)
timings.Write(&buf, "BenchmarkCompile:"+base.Ctxt.Pkgpath+":")
n, err := f.Write(buf.Bytes())
if err != nil {
return err
}
if n != buf.Len() {
panic("bad writer")
}
return f.Close()
}
// symabiDefs and symabiRefs record the defined and referenced ABIs of
// symbols required by non-Go code. These are keyed by link symbol
// name, where the local package prefix is always `"".`
var symabiDefs, symabiRefs map[string]obj.ABI
// readSymABIs reads a symabis file that specifies definitions and
// references of text symbols by ABI.
//
// The symabis format is a set of lines, where each line is a sequence
// of whitespace-separated fields. The first field is a verb and is
// either "def" for defining a symbol ABI or "ref" for referencing a
// symbol using an ABI. For both "def" and "ref", the second field is
// the symbol name and the third field is the ABI name, as one of the
// named cmd/internal/obj.ABI constants.
func readSymABIs(file, myimportpath string) {
data, err := ioutil.ReadFile(file)
if err != nil {
log.Fatalf("-symabis: %v", err)
}
symabiDefs = make(map[string]obj.ABI)
symabiRefs = make(map[string]obj.ABI)
localPrefix := ""
if myimportpath != "" {
// Symbols in this package may be written either as
// "".X or with the package's import path already in
// the symbol.
localPrefix = objabi.PathToPrefix(myimportpath) + "."
}
for lineNum, line := range strings.Split(string(data), "\n") {
lineNum++ // 1-based
line = strings.TrimSpace(line)
if line == "" || strings.HasPrefix(line, "#") {
continue
}
parts := strings.Fields(line)
switch parts[0] {
case "def", "ref":
// Parse line.
if len(parts) != 3 {
log.Fatalf(`%s:%d: invalid symabi: syntax is "%s sym abi"`, file, lineNum, parts[0])
}
sym, abistr := parts[1], parts[2]
abi, valid := obj.ParseABI(abistr)
if !valid {
log.Fatalf(`%s:%d: invalid symabi: unknown abi "%s"`, file, lineNum, abistr)
}
// If the symbol is already prefixed with
// myimportpath, rewrite it to start with ""
// so it matches the compiler's internal
// symbol names.
if localPrefix != "" && strings.HasPrefix(sym, localPrefix) {
sym = `"".` + sym[len(localPrefix):]
}
// Record for later.
if parts[0] == "def" {
symabiDefs[sym] = abi
} else {
symabiRefs[sym] = abi
}
default:
log.Fatalf(`%s:%d: invalid symabi type "%s"`, file, lineNum, parts[0])
}
}
}
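// The line format described in the readSymABIs comment can be sketched
// as a standalone program. This is a simplified illustration, not the
// compiler's code: parseSymABIs is a hypothetical name, ABI values are
// kept as plain strings, only the names "ABI0" and "ABIInternal" are
// assumed to be valid, and errors are returned rather than passed to
// log.Fatalf. The local-prefix rewriting is left out.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSymABIs is a hypothetical, simplified counterpart to readSymABIs.
// It returns the "def" and "ref" entries keyed by symbol name.
func parseSymABIs(data string) (defs, refs map[string]string, err error) {
	defs = make(map[string]string)
	refs = make(map[string]string)
	for i, line := range strings.Split(data, "\n") {
		lineNum := i + 1 // 1-based, for error messages
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue // skip blank lines and comments
		}
		parts := strings.Fields(line)
		verb := parts[0]
		if verb != "def" && verb != "ref" {
			return nil, nil, fmt.Errorf("line %d: invalid symabi type %q", lineNum, verb)
		}
		if len(parts) != 3 {
			return nil, nil, fmt.Errorf("line %d: syntax is %q", lineNum, verb+" sym abi")
		}
		sym, abi := parts[1], parts[2]
		// Assumption: only these two ABI names are accepted in this sketch.
		if abi != "ABI0" && abi != "ABIInternal" {
			return nil, nil, fmt.Errorf("line %d: unknown abi %q", lineNum, abi)
		}
		if verb == "def" {
			defs[sym] = abi
		} else {
			refs[sym] = abi
		}
	}
	return defs, refs, nil
}

func main() {
	// Illustrative input; the symbol names are made up.
	input := "# illustrative input\ndef \"\".Add ABI0\nref runtime.helper ABIInternal\n"
	defs, refs, err := parseSymABIs(input)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("defs:", defs)
	fmt.Println("refs:", refs)
}
```

// Returning an error keeps the sketch easy to exercise in isolation;
// the function above exits on the first malformed line instead.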
func arsize(b *bufio.Reader, name string) int {
var buf [ArhdrSize]byte
if _, err := io.ReadFull(b, buf[:]); err != nil {
return -1
}
aname := strings.Trim(string(buf[0:16]), " ")
if !strings.HasPrefix(aname, name) {
return -1
}
asize := strings.Trim(string(buf[48:58]), " ")
i, _ := strconv.Atoi(asize)
return i
}
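// arsize relies on the fixed layout of an ar member header: the name
// occupies bytes 0-15 and the decimal size bytes 48-57 of a 60-byte
// record. A minimal sketch under stated assumptions: arSize and
// makeHeader are hypothetical names, the header is passed as a byte
// slice instead of a *bufio.Reader, arHeaderSize is assumed to equal
// ArhdrSize (60), and an unparsable size yields -1 (the function above
// ignores the strconv.Atoi error).

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

const arHeaderSize = 60 // assumed value of ArhdrSize

// makeHeader builds a 60-byte ar member header for illustration:
// name(16) mtime(12) uid(6) gid(6) mode(8) size(10) magic(2).
func makeHeader(name string, size int) string {
	return fmt.Sprintf("%-16s%-12d%-6d%-6d%-8o%-10d`\n", name, 0, 0, 0, 0644, size)
}

// arSize returns the member size recorded in hdr, or -1 if hdr is too
// short, the member name does not start with name, or the size field
// is not a decimal number.
func arSize(hdr []byte, name string) int {
	if len(hdr) < arHeaderSize {
		return -1
	}
	aname := strings.Trim(string(hdr[0:16]), " ")
	if !strings.HasPrefix(aname, name) {
		return -1
	}
	asize := strings.Trim(string(hdr[48:58]), " ")
	n, err := strconv.Atoi(asize)
	if err != nil {
		return -1
	}
	return n
}

func main() {
	hdr := makeHeader("__.PKGDEF", 1234)
	fmt.Println(len(hdr), arSize([]byte(hdr), "__.PKGDEF"))
}
```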
func isDriveLetter(b byte) bool {
return 'a' <= b && b <= 'z' || 'A' <= b && b <= 'Z'
}
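// islocalname reports whether name is a local import path: one that begins with ./, ../, or /, is exactly . or .., or (on Windows) begins with a drive letter followed by :/.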
func islocalname(name string) bool {
return strings.HasPrefix(name, "/") ||
runtime.GOOS == "windows" && len(name) >= 3 && isDriveLetter(name[0]) && name[1] == ':' && name[2] == '/' ||
strings.HasPrefix(name, "./") || name == "." ||
strings.HasPrefix(name, "../") || name == ".."
}
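// The rules in islocalname can be tried out in isolation. A minimal
// sketch, assuming a hypothetical isLocalName that takes the operating
// system name as a parameter (the function above reads runtime.GOOS)
// so that the Windows branch can be exercised on any platform.

```go
package main

import (
	"fmt"
	"strings"
)

// isDriveLetter reports whether b is an ASCII letter.
func isDriveLetter(b byte) bool {
	return 'a' <= b && b <= 'z' || 'A' <= b && b <= 'Z'
}

// isLocalName mirrors islocalname, with goos passed in explicitly.
func isLocalName(name, goos string) bool {
	return strings.HasPrefix(name, "/") ||
		goos == "windows" && len(name) >= 3 && isDriveLetter(name[0]) && name[1] == ':' && name[2] == '/' ||
		strings.HasPrefix(name, "./") || name == "." ||
		strings.HasPrefix(name, "../") || name == ".."
}

func main() {
	for _, name := range []string{"./pkg", "../pkg", "/abs/pkg", "c:/pkg", "encoding/json"} {
		fmt.Printf("%-15s linux=%v windows=%v\n",
			name, isLocalName(name, "linux"), isLocalName(name, "windows"))
	}
}
```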
func findpkg(name string) (file string, ok bool) {
if islocalname(name) {
if base.Flag.NoLocalImports {
return "", false
}
if base.Flag.Cfg.PackageFile != nil {
file, ok = base.Flag.Cfg.PackageFile[name]
return file, ok
}
// Try .a before .o. This matters when building libraries:
// if the array.a library contains an array.o, we
// want to find all of array.a, not just array.o.
file = fmt.Sprintf("%s.a", name)
if _, err := os.Stat(file); err == nil {
return file, true
}
file = fmt.Sprintf("%s.o", name)
if _, err := os.Stat(file); err == nil {
return file, true
}
return "", false
}
// local imports should be canonicalized already.
// don't want to see "encoding/../encoding/base64"
// as different from "encoding/base64".
if q := path.Clean(name); q != name {
base.Errorf("non-canonical import path %q (should be %q)", name, q)
return "", false
}
if base.Flag.Cfg.PackageFile != nil {
file, ok = base.Flag.Cfg.PackageFile[name]
return file, ok
}
for _, dir := range base.Flag.Cfg.ImportDirs {
file = fmt.Sprintf("%s/%s.a", dir, name)
if _, err := os.Stat(file); err == nil {
return file, true
}
file = fmt.Sprintf("%s/%s.o", dir, name)
if _, err := os.Stat(file); err == nil {
return file, true
}
}
if objabi.GOROOT != "" {
suffix := ""
suffixsep := ""
if base.Flag.InstallSuffix != "" {
suffixsep = "_"
suffix = base.Flag.InstallSuffix
} else if base.Flag.Race {
suffixsep = "_"
suffix = "race"
} else if base.Flag.MSan {
suffixsep = "_"
suffix = "msan"
}
file = fmt.Sprintf("%s/pkg/%s_%s%s%s/%s.a", objabi.GOROOT, objabi.GOOS, objabi.GOARCH, suffixsep, suffix, name)
if _, err := os.Stat(file); err == nil {
return file, true
}
file = fmt.Sprintf("%s/pkg/%s_%s%s%s/%s.o", objabi.GOROOT, objabi.GOOS, objabi.GOARCH, suffixsep, suffix, name)
if _, err := os.Stat(file); err == nil {
return file, true
}
}
return "", false
}
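// The GOROOT fallback in findpkg picks an install directory whose name
// depends on the -installsuffix, -race, and -msan settings, in that
// order of precedence. A minimal sketch of just that path construction,
// assuming a hypothetical installDir helper with the settings passed as
// plain arguments; the example values in main are made up.

```go
package main

import "fmt"

// installDir returns the directory under goroot/pkg that would be
// searched for compiled packages. An explicit installSuffix wins over
// race, which wins over msan.
func installDir(goroot, goos, goarch, installSuffix string, race, msan bool) string {
	suffix := ""
	switch {
	case installSuffix != "":
		suffix = "_" + installSuffix
	case race:
		suffix = "_race"
	case msan:
		suffix = "_msan"
	}
	return fmt.Sprintf("%s/pkg/%s_%s%s", goroot, goos, goarch, suffix)
}

func main() {
	dir := installDir("/usr/local/go", "linux", "amd64", "", true, false)
	// findpkg would look for name.a first, then name.o, in this directory.
	fmt.Println(dir + "/fmt.a")
}
```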
// loadsys loads the definitions for the low-level runtime functions,
// so that the compiler can generate calls to them,
// but does not make them visible to user code.
func loadsys() {
types.Block = 1
inimport = true
typecheckok = true
typs := runtimeTypes()
for _, d := range &runtimeDecls {
sym := Runtimepkg.Lookup(d.name)
typ := typs[d.typ]
switch d.tag {
case funcTag:
importfunc(Runtimepkg, src.NoXPos, sym, typ)
case varTag:
importvar(Runtimepkg, src.NoXPos, sym, typ)
default:
base.Fatalf("unhandled declaration tag %v", d.tag)
}
}
typecheckok = false
inimport = false
}
// myheight tracks the local package's height based on packages
// imported so far.
var myheight int
[dev.regabi] cmd/compile: replace Val with go/constant.Value This replaces the compiler's legacy constant representation with go/constant, which is used by go/types. This should ease integrating with the new go/types-based type checker in the future. Performance difference is mixed, but there's still room for improvement. name old time/op new time/op delta Template 280ms ± 6% 281ms ± 6% ~ (p=0.488 n=592+587) Unicode 132ms ±11% 129ms ±11% -2.61% (p=0.000 n=592+591) GoTypes 865ms ± 3% 866ms ± 3% +0.16% (p=0.019 n=572+577) Compiler 3.60s ± 3% 3.60s ± 3% ~ (p=0.083 n=578+582) SSA 8.27s ± 2% 8.28s ± 2% +0.14% (p=0.002 n=575+580) Flate 177ms ± 8% 176ms ± 8% ~ (p=0.133 n=580+590) GoParser 238ms ± 7% 237ms ± 6% ~ (p=0.569 n=587+591) Reflect 542ms ± 4% 543ms ± 4% ~ (p=0.064 n=581+579) Tar 244ms ± 6% 244ms ± 6% ~ (p=0.880 n=586+584) XML 322ms ± 5% 322ms ± 5% ~ (p=0.449 n=589+590) LinkCompiler 454ms ± 6% 453ms ± 6% ~ (p=0.249 n=585+583) ExternalLinkCompiler 1.35s ± 4% 1.35s ± 4% ~ (p=0.968 n=590+588) LinkWithoutDebugCompiler 279ms ± 7% 280ms ± 7% ~ (p=0.270 n=589+586) [Geo mean] 535ms 534ms -0.17% name old user-time/op new user-time/op delta Template 599ms ±22% 602ms ±21% ~ (p=0.377 n=588+590) Unicode 410ms ±43% 376ms ±39% -8.36% (p=0.000 n=596+586) GoTypes 1.96s ±15% 1.97s ±17% +0.70% (p=0.031 n=596+594) Compiler 7.47s ± 9% 7.50s ± 8% +0.38% (p=0.031 n=591+583) SSA 16.2s ± 4% 16.2s ± 5% ~ (p=0.617 n=531+531) Flate 298ms ±25% 292ms ±30% -2.14% (p=0.001 n=594+596) GoParser 379ms ±20% 381ms ±21% ~ (p=0.312 n=578+584) Reflect 1.24s ±20% 1.25s ±23% +0.88% (p=0.031 n=592+596) Tar 471ms ±23% 473ms ±21% ~ (p=0.616 n=593+587) XML 674ms ±20% 681ms ±21% +1.03% (p=0.050 n=584+587) LinkCompiler 842ms ±10% 839ms ±10% ~ (p=0.074 n=587+590) ExternalLinkCompiler 1.65s ± 7% 1.65s ± 7% ~ (p=0.767 n=590+585) LinkWithoutDebugCompiler 378ms ±11% 379ms ±12% ~ (p=0.677 n=591+586) [Geo mean] 1.02s 1.02s -0.52% name old alloc/op new alloc/op delta Template 37.4MB ± 0% 37.4MB ± 0% +0.06% (p=0.000 
n=589+585) Unicode 29.6MB ± 0% 28.6MB ± 0% -3.11% (p=0.000 n=574+566) GoTypes 120MB ± 0% 120MB ± 0% -0.01% (p=0.000 n=594+593) Compiler 568MB ± 0% 568MB ± 0% -0.02% (p=0.000 n=588+591) SSA 1.45GB ± 0% 1.45GB ± 0% -0.16% (p=0.000 n=596+592) Flate 22.6MB ± 0% 22.5MB ± 0% -0.36% (p=0.000 n=593+595) GoParser 30.1MB ± 0% 30.1MB ± 0% -0.01% (p=0.000 n=590+594) Reflect 77.8MB ± 0% 77.8MB ± 0% ~ (p=0.631 n=584+591) Tar 34.1MB ± 0% 34.1MB ± 0% -0.04% (p=0.000 n=584+588) XML 43.6MB ± 0% 43.6MB ± 0% +0.07% (p=0.000 n=593+591) LinkCompiler 98.6MB ± 0% 98.6MB ± 0% ~ (p=0.096 n=590+589) ExternalLinkCompiler 89.6MB ± 0% 89.6MB ± 0% ~ (p=0.695 n=590+587) LinkWithoutDebugCompiler 57.2MB ± 0% 57.2MB ± 0% ~ (p=0.674 n=590+589) [Geo mean] 78.5MB 78.3MB -0.28% name old allocs/op new allocs/op delta Template 379k ± 0% 380k ± 0% +0.33% (p=0.000 n=593+590) Unicode 344k ± 0% 338k ± 0% -1.67% (p=0.000 n=594+589) GoTypes 1.30M ± 0% 1.31M ± 0% +0.19% (p=0.000 n=592+591) Compiler 5.40M ± 0% 5.41M ± 0% +0.23% (p=0.000 n=587+585) SSA 14.2M ± 0% 14.2M ± 0% +0.08% (p=0.000 n=594+591) Flate 231k ± 0% 230k ± 0% -0.42% (p=0.000 n=588+589) GoParser 314k ± 0% 315k ± 0% +0.16% (p=0.000 n=587+594) Reflect 975k ± 0% 976k ± 0% +0.10% (p=0.000 n=590+594) Tar 344k ± 0% 345k ± 0% +0.24% (p=0.000 n=595+590) XML 422k ± 0% 424k ± 0% +0.57% (p=0.000 n=590+589) LinkCompiler 538k ± 0% 538k ± 0% -0.00% (p=0.045 n=592+587) ExternalLinkCompiler 593k ± 0% 593k ± 0% ~ (p=0.171 n=588+587) LinkWithoutDebugCompiler 172k ± 0% 172k ± 0% ~ (p=0.996 n=590+585) [Geo mean] 685k 685k -0.02% name old maxRSS/op new maxRSS/op delta Template 53.7M ± 8% 53.8M ± 8% ~ (p=0.666 n=576+574) Unicode 54.4M ±12% 55.0M ±10% +1.15% (p=0.000 n=591+588) GoTypes 95.1M ± 4% 95.1M ± 4% ~ (p=0.948 n=589+591) Compiler 334M ± 6% 334M ± 6% ~ (p=0.875 n=592+593) SSA 792M ± 5% 791M ± 5% ~ (p=0.067 n=592+591) Flate 39.9M ±11% 40.0M ±10% ~ (p=0.131 n=596+596) GoParser 45.2M ±11% 45.3M ±11% ~ (p=0.353 n=592+590) Reflect 76.1M ± 5% 76.2M ± 5% ~ (p=0.114 
n=594+594) Tar 49.4M ±10% 49.6M ± 9% +0.57% (p=0.015 n=590+593) XML 57.4M ± 9% 57.7M ± 8% +0.67% (p=0.000 n=592+580) LinkCompiler 183M ± 2% 183M ± 2% ~ (p=0.229 n=587+591) ExternalLinkCompiler 187M ± 2% 187M ± 3% ~ (p=0.362 n=571+562) LinkWithoutDebugCompiler 143M ± 3% 143M ± 3% ~ (p=0.350 n=584+586) [Geo mean] 103M 103M +0.23% Passes toolstash-check. Fixes #4617. Change-Id: Id4f6759b4afc5e002770091d0d4f6e272ee6cbdd Reviewed-on: https://go-review.googlesource.com/c/go/+/272654 Reviewed-by: Robert Griesemer <gri@golang.org> Trust: Matthew Dempsky <mdempsky@google.com>
2020-11-13 23:36:48 -08:00
func importfile(f constant.Value) *types.Pkg {
if f.Kind() != constant.String {
base.Errorf("import path must be a string")
return nil
}
path_ := constant.StringVal(f)
if len(path_) == 0 {
base.Errorf("import path is empty")
return nil
}
if isbadimport(path_, false) {
return nil
}
// The package name main is no longer reserved,
// but we reserve the import path "main" to identify
// the main package, just as we reserve the import
// path "math" to identify the standard math package.
if path_ == "main" {
base.Errorf("cannot import \"main\"")
base.ErrorExit()
}
if base.Ctxt.Pkgpath != "" && path_ == base.Ctxt.Pkgpath {
base.Errorf("import %q while compiling that package (import cycle)", path_)
base.ErrorExit()
}
if mapped, ok := base.Flag.Cfg.ImportMap[path_]; ok {
path_ = mapped
}
if path_ == "unsafe" {
return unsafepkg
}
if islocalname(path_) {
if path_[0] == '/' {
base.Errorf("import path cannot be absolute path")
return nil
}
prefix := base.Ctxt.Pathname
if base.Flag.D != "" {
prefix = base.Flag.D
}
path_ = path.Join(prefix, path_)
if isbadimport(path_, true) {
return nil
}
}
file, found := findpkg(path_)
if !found {
base.Errorf("can't find import: %q", path_)
base.ErrorExit()
}
importpkg := types.NewPkg(path_, "")
if importpkg.Imported {
return importpkg
}
importpkg.Imported = true
imp, err := bio.Open(file)
if err != nil {
base.Errorf("can't open import: %q: %v", path_, err)
base.ErrorExit()
}
defer imp.Close()
// check object header
p, err := imp.ReadString('\n')
if err != nil {
base.Errorf("import %s: reading input: %v", file, err)
base.ErrorExit()
}
if p == "!<arch>\n" { // package archive
// package export block should be first
sz := arsize(imp.Reader, "__.PKGDEF")
if sz <= 0 {
base.Errorf("import %s: not a package file", file)
base.ErrorExit()
}
p, err = imp.ReadString('\n')
if err != nil {
base.Errorf("import %s: reading input: %v", file, err)
base.ErrorExit()
}
}
if !strings.HasPrefix(p, "go object ") {
base.Errorf("import %s: not a go object file: %s", file, p)
base.ErrorExit()
}
q := fmt.Sprintf("%s %s %s %s\n", objabi.GOOS, objabi.GOARCH, objabi.Version, objabi.Expstring())
if p[10:] != q {
base.Errorf("import %s: object is [%s] expected [%s]", file, p[10:], q)
base.ErrorExit()
}
// process header lines
for {
p, err = imp.ReadString('\n')
if err != nil {
base.Errorf("import %s: reading input: %v", file, err)
base.ErrorExit()
}
if p == "\n" {
break // header ends with blank line
}
}
// Expect $$B\n to signal binary import format.
// look for $$
var c byte
for {
c, err = imp.ReadByte()
if err != nil {
break
}
if c == '$' {
c, err = imp.ReadByte()
if c == '$' || err != nil {
break
}
}
}
// get character after $$
if err == nil {
c, _ = imp.ReadByte()
}
var fingerprint goobj.FingerprintType
switch c {
case '\n':
base.Errorf("cannot import %s: old export format no longer supported (recompile library)", path_)
return nil
case 'B':
if base.Debug.Export != 0 {
			fmt.Printf("importing %s (%s)\n", path_, file)
		}
		imp.ReadByte() // skip \n after $$B
		c, err = imp.ReadByte()
		if err != nil {
			base.Errorf("import %s: reading input: %v", file, err)
			base.ErrorExit()
		}

		// Indexed format is distinguished by an 'i' byte,
		// whereas previous export formats started with 'c', 'd', or 'v'.
		if c != 'i' {
			base.Errorf("import %s: unexpected package format byte: %v", file, c)
			base.ErrorExit()
		}
		fingerprint = iimport(importpkg, imp)

	default:
		base.Errorf("no import in %q", path_)
		base.ErrorExit()
	}

	// assume files move (get installed) so don't record the full path
	if base.Flag.Cfg.PackageFile != nil {
		// If using a packageFile map, assume path_ can be recorded directly.
		base.Ctxt.AddImport(path_, fingerprint)
	} else {
		// For file "/Users/foo/go/pkg/darwin_amd64/math.a" record "math.a".
		base.Ctxt.AddImport(file[len(file)-len(path_)-len(".a"):], fingerprint)
	}

	if importpkg.Height >= myheight {
		myheight = importpkg.Height + 1
	}

	return importpkg
}
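
/*
Illustrative sketch only, not part of the compiler: the slice expression
used above when no packageFile map is in effect keeps just the trailing
"<import path>.a" portion of the archive file name. The helper name
recordedName is hypothetical, and the sketch assumes, as the code above
does, that the file name really ends in the import path followed by ".a";
a shorter file name would make the slice expression panic.

```go
package main

import "fmt"

// recordedName is a hypothetical helper that mirrors the expression
// file[len(file)-len(path_)-len(".a"):] from the import code.
// It assumes file ends in path + ".a".
func recordedName(file, path string) string {
	return file[len(file)-len(path)-len(".a"):]
}

func main() {
	// The example from the comment in the source.
	fmt.Println(recordedName("/Users/foo/go/pkg/darwin_amd64/math.a", "math"))
	// A multi-element import path keeps all of its elements.
	fmt.Println(recordedName("/Users/foo/go/pkg/darwin_amd64/encoding/json.a", "encoding/json"))
}
```

Recording only the suffix keeps the recorded import independent of where
the archive happens to be installed.
*/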

func pkgnotused(lineno src.XPos, path string, name string) {
	// If the package was imported with a name other than the final
	// import path element, show it explicitly in the error message.
	// Note that this handles both renamed imports and imports of
	// packages containing unconventional package declarations.
	// Note that this uses / always, even on Windows, because Go import
	// paths always use forward slashes.
	elem := path
	if i := strings.LastIndex(elem, "/"); i >= 0 {
		elem = elem[i+1:]
	}
	if name == "" || elem == name {
		base.ErrorfAt(lineno, "imported and not used: %q", path)
	} else {
		base.ErrorfAt(lineno, "imported and not used: %q as %s", path, name)
	}
}
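
/*
Illustrative sketch only, not part of the compiler: the choice pkgnotused
makes between the two message forms can be tried in isolation. The helper
name unusedImportMessage is hypothetical, and it returns the message as a
string where the real function reports it through base.ErrorfAt.

```go
package main

import (
	"fmt"
	"strings"
)

// unusedImportMessage is a hypothetical stand-in for pkgnotused that
// returns the diagnostic text instead of reporting it.
func unusedImportMessage(path, name string) string {
	// Import paths always use forward slashes, so the final element is
	// whatever follows the last "/".
	elem := path
	if i := strings.LastIndex(elem, "/"); i >= 0 {
		elem = elem[i+1:]
	}
	if name == "" || elem == name {
		return fmt.Sprintf("imported and not used: %q", path)
	}
	// The local name differs from the final path element, so show it.
	return fmt.Sprintf("imported and not used: %q as %s", path, name)
}

func main() {
	fmt.Println(unusedImportMessage("encoding/json", "json"))
	fmt.Println(unusedImportMessage("encoding/json", "js"))
}
```

The first call prints the short form because the name matches the final
path element; the second prints the "as js" form.
*/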

func mkpackage(pkgname string) {
	if types.LocalPkg.Name == "" {
		if pkgname == "_" {
			base.Errorf("invalid package name _")
		}
		types.LocalPkg.Name = pkgname
	} else {
		if pkgname != types.LocalPkg.Name {
			base.Errorf("package %s; expected %s", pkgname, types.LocalPkg.Name)
		}
	}
}

func clearImports() {
	type importedPkg struct {
		pos  src.XPos
		path string
		name string
	}
	var unused []importedPkg

	for _, s := range types.LocalPkg.Syms {
		n := ir.AsNode(s.Def)
		if n == nil {
			continue
		}
		if n.Op() == ir.OPACK {
			// throw away top-level package name left over
			// from previous file.
			// leave s->block set to cause redeclaration
			// errors if a conflicting top-level name is
			// introduced by a different file.
			p := n.(*ir.PkgName)
			if !p.Used && base.SyntaxErrors() == 0 {
				unused = append(unused, importedPkg{p.Pos(), p.Pkg.Path, s.Name})
			}
			s.Def = nil
			continue
		}
		if IsAlias(s) {
			// throw away top-level name left over
			// from previous import . "x"
			// We'll report errors after type checking in checkDotImports.
			s.Def = nil
			continue
		}
	}

	sort.Slice(unused, func(i, j int) bool { return unused[i].pos.Before(unused[j].pos) })
	for _, pkg := range unused {
		pkgnotused(pkg.pos, pkg.path, pkg.name)
	}
}
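
/*
Illustrative sketch only, not part of the compiler: clearImports collects
the unused imports first and sorts them by position before reporting,
presumably so that the diagnostics come out in source order no matter in
which order the symbols were visited. The type unusedImport and the helper
sortByPos are hypothetical, and a plain int stands in for src.XPos.

```go
package main

import (
	"fmt"
	"sort"
)

// unusedImport is a hypothetical stand-in for importedPkg; pos is a plain
// int here rather than a src.XPos.
type unusedImport struct {
	pos  int
	path string
}

// sortByPos orders the entries by ascending position, in place.
func sortByPos(pkgs []unusedImport) {
	sort.Slice(pkgs, func(i, j int) bool { return pkgs[i].pos < pkgs[j].pos })
}

func main() {
	// Entries may be collected in any order.
	pkgs := []unusedImport{{30, "os"}, {10, "fmt"}, {20, "io"}}
	sortByPos(pkgs)
	for _, p := range pkgs {
		fmt.Printf("%d: imported and not used: %q\n", p.pos, p.path)
	}
}
```

After sorting, the entries are reported in the order fmt, io, os.
*/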

func IsAlias(sym *types.Sym) bool {
	return sym.Def != nil && sym.Def.Sym() != sym
2017-03-19 08:27:26 -07:00
}
// recordFlags records the specified command-line flags to be placed
// in the DWARF info.
func recordFlags(flags ...string) {
	if base.Ctxt.Pkgpath == "" {
		// We can't record the flags if we don't know what the
		// package name is.
		return
	}

	type BoolFlag interface {
		IsBoolFlag() bool
	}
	type CountFlag interface {
		IsCountFlag() bool
	}
	var cmd bytes.Buffer
	for _, name := range flags {
		f := flag.Lookup(name)
		if f == nil {
			continue
		}
		getter := f.Value.(flag.Getter)
		if getter.String() == f.DefValue {
			// Flag has default value, so omit it.
			continue
		}
		if bf, ok := f.Value.(BoolFlag); ok && bf.IsBoolFlag() {
			val, ok := getter.Get().(bool)
			if ok && val {
				fmt.Fprintf(&cmd, " -%s", f.Name)
				continue
			}
		}
		if cf, ok := f.Value.(CountFlag); ok && cf.IsCountFlag() {
			val, ok := getter.Get().(int)
			if ok && val == 1 {
				fmt.Fprintf(&cmd, " -%s", f.Name)
				continue
			}
		}
		fmt.Fprintf(&cmd, " -%s=%v", f.Name, getter.Get())
	}

	if cmd.Len() == 0 {
		return
	}
	s := base.Ctxt.Lookup(dwarf.CUInfoPrefix + "producer." + base.Ctxt.Pkgpath)
	s.Type = objabi.SDWARFCUINFO
	// Sometimes (for example when building tests) we can link
	// together two package main archives. So allow dups.
	s.Set(obj.AttrDuplicateOK, true)
	base.Ctxt.Data = append(base.Ctxt.Data, s)
	s.P = cmd.Bytes()[1:]
}
// recordPackageName records the name of the package being
// compiled, so that the linker can save it in the compile unit's DIE.
func recordPackageName() {
	s := base.Ctxt.Lookup(dwarf.CUInfoPrefix + "packagename." + base.Ctxt.Pkgpath)
	s.Type = objabi.SDWARFCUINFO
	// Sometimes (for example when building tests) we can link
	// together two package main archives. So allow dups.
	s.Set(obj.AttrDuplicateOK, true)
	base.Ctxt.Data = append(base.Ctxt.Data, s)
	s.P = []byte(types.LocalPkg.Name)
}
// currentLang returns the current language version.
func currentLang() string {
	return fmt.Sprintf("go1.%d", goversion.Version)
}

// goVersionRE is a regular expression that matches the valid
// arguments to the -lang flag.
var goVersionRE = regexp.MustCompile(`^go([1-9][0-9]*)\.(0|[1-9][0-9]*)$`)

// A lang is a language version broken into major and minor numbers.
type lang struct {
	major, minor int
}

// langWant is the desired language version set by the -lang flag.
// If the -lang flag is not set, this is the zero value, meaning that
// any language version is supported.
var langWant lang
// AllowsGoVersion reports whether a particular package
// is allowed to use Go version major.minor.
// We assume the imported packages have all been checked,
// so we only have to check the local package against the -lang flag.
func AllowsGoVersion(pkg *types.Pkg, major, minor int) bool {
	if pkg == nil {
		// TODO(mdempsky): Set Pkg for local types earlier.
		pkg = types.LocalPkg
	}
	if pkg != types.LocalPkg {
		// Assume imported packages passed type-checking.
		return true
	}
	if langWant.major == 0 && langWant.minor == 0 {
		return true
	}
	return langWant.major > major || (langWant.major == major && langWant.minor >= minor)
}

// langSupported reports whether language version major.minor is
// supported for pkg. It is a thin wrapper around AllowsGoVersion
// with the arguments in a different order.
func langSupported(major, minor int, pkg *types.Pkg) bool {
	return AllowsGoVersion(pkg, major, minor)
}
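The comparison at the end of AllowsGoVersion is a lexicographic check on (major, minor), with the zero value meaning that no -lang restriction was given. Below is a small standalone sketch of just that rule; the version type and the allows function are hypothetical stand-ins for lang and AllowsGoVersion, and the package check is omitted.

```go
package main

import "fmt"

// version is a hypothetical stand-in for the compiler's lang type.
type version struct {
	major, minor int
}

// allows reports whether a feature introduced in major.minor may be used
// when the requested language version is want. The zero value of want
// means "no restriction", matching the documented meaning of langWant.
func allows(want version, major, minor int) bool {
	if want == (version{}) {
		return true
	}
	return want.major > major || (want.major == major && want.minor >= minor)
}

func main() {
	want := version{major: 1, minor: 12}
	fmt.Println(allows(want, 1, 9))       // true: go1.9 features are older than go1.12
	fmt.Println(allows(want, 1, 12))      // true: same version
	fmt.Println(allows(want, 1, 13))      // false: feature is newer than -lang
	fmt.Println(allows(version{}, 1, 99)) // true: no -lang restriction
}
```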
// checkLang verifies that the -lang flag holds a valid value, and
// exits if not. It initializes data used by langSupported.
func checkLang() {
	if base.Flag.Lang == "" {
		return
	}

	var err error
	langWant, err = parseLang(base.Flag.Lang)
	if err != nil {
		log.Fatalf("invalid value %q for -lang: %v", base.Flag.Lang, err)
	}

	if def := currentLang(); base.Flag.Lang != def {
		defVers, err := parseLang(def)
		if err != nil {
			log.Fatalf("internal error parsing default lang %q: %v", def, err)
		}
		if langWant.major > defVers.major || (langWant.major == defVers.major && langWant.minor > defVers.minor) {
			log.Fatalf("invalid value %q for -lang: max known version is %q", base.Flag.Lang, def)
		}
	}
}
// parseLang parses a -lang option into a lang.
func parseLang(s string) (lang, error) {
	matches := goVersionRE.FindStringSubmatch(s)
	if matches == nil {
		return lang{}, fmt.Errorf(`should be something like "go1.12"`)
	}
	major, err := strconv.Atoi(matches[1])
	if err != nil {
		return lang{}, err
	}
	minor, err := strconv.Atoi(matches[2])
	if err != nil {
		return lang{}, err
	}
	return lang{major: major, minor: minor}, nil
}
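To see which strings the -lang pattern accepts, the parsing step can be run on its own. This is a minimal sketch that reuses the regular expression shown above; parseVersion and versionRE are hypothetical names, and the error text is an assumption rather than the compiler's message.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// versionRE is the same pattern as goVersionRE: "go", a major number
// without leading zeros, a dot, and a minor number that is either 0 or
// has no leading zeros.
var versionRE = regexp.MustCompile(`^go([1-9][0-9]*)\.(0|[1-9][0-9]*)$`)

// parseVersion is a hypothetical standalone counterpart of parseLang.
func parseVersion(s string) (major, minor int, err error) {
	m := versionRE.FindStringSubmatch(s)
	if m == nil {
		return 0, 0, fmt.Errorf("invalid version %q", s)
	}
	if major, err = strconv.Atoi(m[1]); err != nil {
		return 0, 0, err
	}
	if minor, err = strconv.Atoi(m[2]); err != nil {
		return 0, 0, err
	}
	return major, minor, nil
}

func main() {
	for _, s := range []string{"go1.12", "go2.0", "go1.012", "1.12", "go1"} {
		major, minor, err := parseVersion(s)
		if err != nil {
			fmt.Printf("%-8s rejected: %v\n", s, err)
			continue
		}
		fmt.Printf("%-8s major=%d minor=%d\n", s, major, minor)
	}
}
```

In this run the first two inputs parse, while "go1.012" (leading zero), "1.12" (missing "go" prefix), and "go1" (no minor number) are rejected by the pattern.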
[dev.regabi] cmd/compile,cmd/link: initial support for ABI wrappers Add compiler support for emitting ABI wrappers by creating real IR as opposed to introducing ABI aliases. At the moment these are "no-op" wrappers in the sense that they make a simple call (using the existing ABI) to their target. The assumption here is that once late call expansion can handle both ABI0 and the "new" ABIInternal (register version), it can expand the call to do the right thing. Note that the runtime contains functions that do not strictly follow the rules of the current Go ABI0; this has been handled in most cases by treating these as ABIInternal instead (these changes have been made in previous patches). Generation of ABI wrappers (as opposed to ABI aliases) is currently gated by GOEXPERIMENT=regabi -- wrapper generation is on by default if GOEXPERIMENT=regabi is set and off otherwise (but can be turned on using "-gcflags=all=-abiwrap -ldflags=-abiwrap"). Wrapper generation currently only works on AMD64; explicitly enabling wrappers for other architectures (via the command line) is not supported. Also in this patch are a few other command line options for debugging (tracing and/or limiting wrapper creation). These will presumably go away at some point. Updates #27539, #40724. Change-Id: I1ee3226fc15a3c32ca2087b8ef8e41dbe6df4a75 Reviewed-on: https://go-review.googlesource.com/c/go/+/270863 Run-TryBot: Than McIntosh <thanm@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Cherry Zhang <cherryyz@google.com> Trust: Than McIntosh <thanm@google.com>
2020-09-24 13:14:46 -04:00
// useABIWrapGen reports whether the compiler should generate an
// ABI wrapper for the function f.
func useABIWrapGen(f *ir.Func) bool {
	if !base.Flag.ABIWrap {
		return false
	}

	// Support limit option for bisecting.
	if base.Flag.ABIWrapLimit == 1 {
		return false
	}
	if base.Flag.ABIWrapLimit < 1 {
		return true
	}
	base.Flag.ABIWrapLimit--
	if base.Debug.ABIWrap != 0 && base.Flag.ABIWrapLimit == 1 {
		fmt.Fprintf(os.Stderr, "=-= limit reached after new wrapper for %s\n",
			f.LSym.Name)
	}
	return true
}