go/src/cmd/compile/internal/ssa/config.go


// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package ssa

import (
	"cmd/internal/obj"
	"cmd/internal/src"
	"crypto/sha1"
	"fmt"
	"os"
"strconv"
"strings"
)

// A Config holds readonly compilation information.
// It is created once, early during compilation,
// and shared across all compilations.
type Config struct {
	arch            string                     // "amd64", etc.
	IntSize         int64                      // 4 or 8
	PtrSize         int64                      // 4 or 8
	RegSize         int64                      // 4 or 8
	lowerBlock      func(*Block, *Config) bool // lowering function
	lowerValue      func(*Value, *Config) bool // lowering function
	registers       []Register                 // machine registers
	gpRegMask       regMask                    // general purpose integer register mask
	fpRegMask       regMask                    // floating point register mask
	specialRegMask  regMask                    // special register mask
	FPReg           int8                       // register number of frame pointer, -1 if not used
	LinkReg         int8                       // register number of link register if it is a general purpose register, -1 if not used
	hasGReg         bool                       // has hardware g register
	fe              Frontend                   // callbacks into compiler frontend
	HTML            *HTMLWriter                // html writer, for debugging
	ctxt            *obj.Link                  // Generic arch information
	optimize        bool                       // Do optimization
	noDuffDevice    bool                       // Don't use Duff's device
	nacl            bool                       // GOOS=nacl
	use387          bool                       // GO386=387
	OldArch         bool                       // True for older versions of architecture, e.g. true for PPC64BE, false for PPC64LE
	NeedsFpScratch  bool                       // No direct move between GP and FP register sets
	BigEndian       bool
	DebugTest       bool                       // default true unless $GOSSAHASH != ""; as a debugging aid, make new code conditional on this and use GOSSAHASH to binary search for failing cases
	sparsePhiCutoff uint64                     // Sparse phi location algorithm used above this #blocks*#variables score
	curFunc         *Func

	// TODO: more stuff. Compiler flags of interest, ...

	// Given an environment variable used for debug hash match,
	// what file (if any) receives the yes/no logging?
	logfiles map[string]*os.File

	// Storage for low-numbered values and blocks.
	values [2000]Value
	blocks [200]Block
	locs   [2000]Location

	// Reusable stackAllocState.
	// See stackalloc.go's {new,put}StackAllocState.
	stackAllocState *stackAllocState

	domblockstore []ID         // scratch space for computing dominators
	scrSparse     []*sparseSet // scratch sparse sets to be re-used.
}
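
// Illustrative sketch (not part of the original file): how sparsePhiCutoff
// is consulted. The names numBlocks and numVars are assumptions for the
// example; the real decision lives in the phi-placement code, which compares
// the #blocks*#variables score against the cutoff:
//
//	score := uint64(numBlocks) * uint64(numVars)
//	if score > c.SparsePhiCutoff() {
//		// use the sparse phi-placement algorithm
//	} else {
//		// use the old non-sparse method
//	}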

type TypeSource interface {
	TypeBool() Type
	TypeInt8() Type
	TypeInt16() Type
	TypeInt32() Type
	TypeInt64() Type
	TypeUInt8() Type
	TypeUInt16() Type
	TypeUInt32() Type
	TypeUInt64() Type
	TypeInt() Type
	TypeFloat32() Type
	TypeFloat64() Type
	TypeUintptr() Type
	TypeString() Type
	TypeBytePtr() Type // TODO: use unsafe.Pointer instead?

	CanSSA(t Type) bool
}

type Logger interface {
	// Logf logs a message from the compiler.
	Logf(string, ...interface{})

	// Log returns true if logging is not a no-op;
	// some logging calls account for more than a few heap allocations.
	Log() bool

	// Fatalf reports a compiler error and exits.
	Fatalf(pos src.XPos, msg string, args ...interface{})

	// Warnl writes compiler messages in the form expected by "errorcheck" tests
	Warnl(pos src.XPos, fmt_ string, args ...interface{})

	// Forwards the Debug flags from gc
	Debug_checknil() bool
	Debug_wb() bool
}
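
// A minimal sketch of a Logger implementation (illustrative only; in the
// compiler the gc frontend supplies the real one). All names below are
// assumptions for the example:
//
//	type stderrLogger struct{}
//
//	func (stderrLogger) Logf(f string, args ...interface{}) { fmt.Fprintf(os.Stderr, f, args...) }
//	func (stderrLogger) Log() bool                          { return true }
//	func (stderrLogger) Fatalf(pos src.XPos, f string, args ...interface{}) {
//		fmt.Fprintf(os.Stderr, "fatal: "+f+"\n", args...)
//		os.Exit(2)
//	}
//	func (stderrLogger) Warnl(pos src.XPos, f string, args ...interface{}) {
//		fmt.Fprintf(os.Stderr, f+"\n", args...)
//	}
//	func (stderrLogger) Debug_checknil() bool { return false }
//	func (stderrLogger) Debug_wb() bool       { return false }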

type Frontend interface {
	TypeSource
	Logger

	// StringData returns a symbol pointing to the given string's contents.
	StringData(string) interface{} // returns *gc.Sym

	// Auto returns a Node for an auto variable of the given type.
	// The SSA compiler uses this function to allocate space for spills.
	Auto(Type) GCNode

	// Given the name for a compound type, returns the name we should use
	// for the parts of that compound type.
	SplitString(LocalSlot) (LocalSlot, LocalSlot)
	SplitInterface(LocalSlot) (LocalSlot, LocalSlot)
	SplitSlice(LocalSlot) (LocalSlot, LocalSlot, LocalSlot)
	SplitComplex(LocalSlot) (LocalSlot, LocalSlot)
	SplitStruct(LocalSlot, int) LocalSlot
	SplitArray(LocalSlot) LocalSlot              // array must be length 1
	SplitInt64(LocalSlot) (LocalSlot, LocalSlot) // returns (hi, lo)

	// DerefItab dereferences an itab function
	// entry, given the symbol of the itab and
	// the byte offset of the function pointer.
	// It may return nil.
	DerefItab(sym *obj.LSym, offset int64) *obj.LSym

	// Line returns a string describing the given position.
	Line(src.XPos) string

	// AllocFrame assigns frame offsets to all live auto variables.
	AllocFrame(f *Func)

	// Syslook returns a symbol of the runtime function/variable with the
	// given name.
	Syslook(string) *obj.LSym
}
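
// Illustrative example of the Split* contract (a sketch, assuming the
// frontend names each part after its parent slot, e.g. a string slot "s"
// splits into "s.ptr" and "s.len", which can then be spilled and restored
// independently):
//
//	ptrSlot, lenSlot := fe.SplitString(sSlot) // sSlot is a hypothetical string-typed LocalSlot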

// GCNode is an interface used to hold *gc.Node. We'd use *gc.Node directly
// but that would lead to an import cycle.
type GCNode interface {
	Typ() Type
	String() string
}

// NewConfig returns a new configuration object for the given architecture.
func NewConfig(arch string, fe Frontend, ctxt *obj.Link, optimize bool) *Config {
	c := &Config{arch: arch, fe: fe}
	switch arch {
	case "amd64":
		c.IntSize = 8
		c.PtrSize = 8
		c.RegSize = 8
		c.lowerBlock = rewriteBlockAMD64
		c.lowerValue = rewriteValueAMD64
		c.registers = registersAMD64[:]
		c.gpRegMask = gpRegMaskAMD64
		c.fpRegMask = fpRegMaskAMD64
		c.FPReg = framepointerRegAMD64
		c.LinkReg = linkRegAMD64
		c.hasGReg = false
	case "amd64p32":
		c.IntSize = 4
		c.PtrSize = 4
		c.RegSize = 8
		c.lowerBlock = rewriteBlockAMD64
		c.lowerValue = rewriteValueAMD64
		c.registers = registersAMD64[:]
		c.gpRegMask = gpRegMaskAMD64
		c.fpRegMask = fpRegMaskAMD64
		c.FPReg = framepointerRegAMD64
		c.LinkReg = linkRegAMD64
		c.hasGReg = false
		c.noDuffDevice = true
	case "386":
		c.IntSize = 4
		c.PtrSize = 4
		c.RegSize = 4
		c.lowerBlock = rewriteBlock386
		c.lowerValue = rewriteValue386
		c.registers = registers386[:]
		c.gpRegMask = gpRegMask386
		c.fpRegMask = fpRegMask386
		c.FPReg = framepointerReg386
		c.LinkReg = linkReg386
		c.hasGReg = false
	case "arm":
		c.IntSize = 4
		c.PtrSize = 4
		c.RegSize = 4
		c.lowerBlock = rewriteBlockARM
		c.lowerValue = rewriteValueARM
		c.registers = registersARM[:]
		c.gpRegMask = gpRegMaskARM
		c.fpRegMask = fpRegMaskARM
		c.FPReg = framepointerRegARM
		c.LinkReg = linkRegARM
		c.hasGReg = true
	case "arm64":
		c.IntSize = 8
		c.PtrSize = 8
		c.RegSize = 8
		c.lowerBlock = rewriteBlockARM64
		c.lowerValue = rewriteValueARM64
		c.registers = registersARM64[:]
		c.gpRegMask = gpRegMaskARM64
		c.fpRegMask = fpRegMaskARM64
		c.FPReg = framepointerRegARM64
		c.LinkReg = linkRegARM64
		c.hasGReg = true
		c.noDuffDevice = obj.GOOS == "darwin" // darwin linker cannot handle BR26 reloc with non-zero addend
	case "ppc64":
		c.OldArch = true
		c.BigEndian = true
		fallthrough
	case "ppc64le":
		c.IntSize = 8
		c.PtrSize = 8
		c.RegSize = 8
		c.lowerBlock = rewriteBlockPPC64
		c.lowerValue = rewriteValuePPC64
		c.registers = registersPPC64[:]
		c.gpRegMask = gpRegMaskPPC64
		c.fpRegMask = fpRegMaskPPC64
		c.FPReg = framepointerRegPPC64
		c.LinkReg = linkRegPPC64
		c.noDuffDevice = true // TODO: Resolve PPC64 DuffDevice (has zero, but not copy)
		c.hasGReg = true
	case "mips64":
		c.BigEndian = true
		fallthrough
	case "mips64le":
		c.IntSize = 8
		c.PtrSize = 8
		c.RegSize = 8
		c.lowerBlock = rewriteBlockMIPS64
		c.lowerValue = rewriteValueMIPS64
		c.registers = registersMIPS64[:]
		c.gpRegMask = gpRegMaskMIPS64
		c.fpRegMask = fpRegMaskMIPS64
		c.specialRegMask = specialRegMaskMIPS64
		c.FPReg = framepointerRegMIPS64
		c.LinkReg = linkRegMIPS64
		c.hasGReg = true
	case "s390x":
		c.IntSize = 8
		c.PtrSize = 8
		c.RegSize = 8
		c.lowerBlock = rewriteBlockS390X
		c.lowerValue = rewriteValueS390X
		c.registers = registersS390X[:]
		c.gpRegMask = gpRegMaskS390X
		c.fpRegMask = fpRegMaskS390X
		c.FPReg = framepointerRegS390X
		c.LinkReg = linkRegS390X
		c.hasGReg = true
		c.noDuffDevice = true
		c.BigEndian = true
	case "mips":
		c.BigEndian = true
		fallthrough
	case "mipsle":
		c.IntSize = 4
		c.PtrSize = 4
		c.RegSize = 4
		c.lowerBlock = rewriteBlockMIPS
		c.lowerValue = rewriteValueMIPS
		c.registers = registersMIPS[:]
		c.gpRegMask = gpRegMaskMIPS
		c.fpRegMask = fpRegMaskMIPS
		c.specialRegMask = specialRegMaskMIPS
		c.FPReg = framepointerRegMIPS
		c.LinkReg = linkRegMIPS
		c.hasGReg = true
		c.noDuffDevice = true
	default:
		fe.Fatalf(src.NoXPos, "arch %s not implemented", arch)
	}
	c.ctxt = ctxt
	c.optimize = optimize
	c.nacl = obj.GOOS == "nacl"

	// Don't use Duff's device on Plan 9 AMD64, because floating
	// point operations are not allowed in note handlers.
	if obj.GOOS == "plan9" && arch == "amd64" {
		c.noDuffDevice = true
	}

	if c.nacl {
		c.noDuffDevice = true // Don't use Duff's device on NaCl

		// runtime calls clobber R12 on nacl
		opcodeTable[OpARMCALLudiv].reg.clobbers |= 1 << 12 // R12
	}

	// Assign IDs to preallocated values/blocks.
	for i := range c.values {
		c.values[i].ID = ID(i)
	}
	for i := range c.blocks {
		c.blocks[i].ID = ID(i)
	}

	c.logfiles = make(map[string]*os.File)

	// cutoff is compared with the product of numblocks and numvalues;
	// if the product is smaller than the cutoff, use the old non-sparse method.
	// cutoff == 0 implies all sparse.
	// cutoff == -1 implies none sparse.
	// Good cutoff values seem to be O(million) depending on the constant factor cost of sparse.
	// TODO: get this from a flag, not an environment variable
	c.sparsePhiCutoff = 2500000 // 0 for testing. // 2500000 determined with crude experiments w/ make.bash
	ev := os.Getenv("GO_SSA_PHI_LOC_CUTOFF")
	if ev != "" {
		v, err := strconv.ParseInt(ev, 10, 64)
		if err != nil {
			fe.Fatalf(src.NoXPos, "Environment variable GO_SSA_PHI_LOC_CUTOFF (value '%s') did not parse as a number", ev)
		}
		c.sparsePhiCutoff = uint64(v) // convert -1 to maxint, so sparse is never used
	}

	return c
}
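
// Illustrative usage of the override above (a shell sketch; the values
// follow the comments in NewConfig):
//
//	GO_SSA_PHI_LOC_CUTOFF=0 go build std   // always use the sparse algorithm
//	GO_SSA_PHI_LOC_CUTOFF=-1 go build std  // never use the sparse algorithm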

func (c *Config) Set387(b bool) {
	c.NeedsFpScratch = b
	c.use387 = b
}

func (c *Config) Frontend() Frontend      { return c.fe }
func (c *Config) SparsePhiCutoff() uint64 { return c.sparsePhiCutoff }
func (c *Config) Ctxt() *obj.Link         { return c.ctxt }

// NewFunc returns a new, empty function object.
// Caller must call f.Free() before calling NewFunc again.
func (c *Config) NewFunc() *Func {
	// TODO(khr): should this function take name, type, etc. as arguments?
	if c.curFunc != nil {
		c.Fatalf(src.NoXPos, "NewFunc called without previous Free")
	}
	f := &Func{Config: c, NamedValues: map[LocalSlot][]*Value{}}
	c.curFunc = f
	return f
}
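
// Illustrative usage of NewFunc (a sketch; Free is defined on *Func
// elsewhere in this package):
//
//	f := c.NewFunc()
//	// ... build and compile f ...
//	f.Free() // required before the next NewFunc call on this Config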
func (c *Config) Logf(msg string, args ...interface{}) { c.fe.Logf(msg, args...) }
func (c *Config) Log() bool { return c.fe.Log() }
func (c *Config) Fatalf(pos src.XPos, msg string, args ...interface{}) { c.fe.Fatalf(pos, msg, args...) }
func (c *Config) Warnl(pos src.XPos, msg string, args ...interface{}) { c.fe.Warnl(pos, msg, args...) }
func (c *Config) Debug_checknil() bool { return c.fe.Debug_checknil() }
func (c *Config) Debug_wb() bool { return c.fe.Debug_wb() }
func (c *Config) logDebugHashMatch(evname, name string) {
file := c.logfiles[evname]
if file == nil {
file = os.Stdout
tmpfile := os.Getenv("GSHS_LOGFILE")
if tmpfile != "" {
			var err error
			file, err = os.Create(tmpfile)
			if err != nil {
c.Fatalf(src.NoXPos, "Could not open hash-testing logfile %s", tmpfile)
}
}
c.logfiles[evname] = file
}
s := fmt.Sprintf("%s triggered %s\n", evname, name)
file.WriteString(s)
file.Sync()
}
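
// For example (the log path and hash value here are made up; GOSSAHASH is
// the variable conventionally used with this mechanism in the compiler):
//
//	GSHS_LOGFILE=/tmp/gshs.log GOSSAHASH=0110 go build somepkg
//
// appends a line such as "GOSSAHASH triggered somepkg.someFunc" to
// /tmp/gshs.log for every matching function, instead of writing it to
// standard out.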
// DebugHashMatch returns true if the value of environment variable evname
// 1) is empty (a special, more quickly implemented case of 3),
// 2) is "y" or "Y",
// 3) is a suffix of the sha1 hash of name, written as a string of binary
//    digits, or
// 4) fails 3, but the value of one of the environment variables
//    fmt.Sprintf("%s%d", evname, i) for i = 0, 1, 2, ...
//    is such a suffix; the variables are consulted in order and the scan
//    stops at the first empty one.
// Otherwise it returns false; a value of "n" or "N" forces false
// immediately.
// When a match triggers (cases 2-4), the message
//	"%s triggered %s\n", evname, name
// is written to the file named in environment variable GSHS_LOGFILE, or
// to standard out if that variable is empty. Failure to open the named
// file is fatal.
func (c *Config) DebugHashMatch(evname, name string) bool {
evhash := os.Getenv(evname)
if evhash == "" {
		return true // default behavior with no environment variable set is "on"
}
if evhash == "y" || evhash == "Y" {
c.logDebugHashMatch(evname, name)
return true
}
if evhash == "n" || evhash == "N" {
return false
}
// Check the hash of the name against a partial input hash.
// We use this feature to do a binary search to
// find a function that is incorrectly compiled.
hstr := ""
for _, b := range sha1.Sum([]byte(name)) {
hstr += fmt.Sprintf("%08b", b)
}
if strings.HasSuffix(hstr, evhash) {
c.logDebugHashMatch(evname, name)
return true
}
	// Iteratively try the additional hash variables evname0, evname1, ...
	// so the binary search can isolate failures that require more than one
	// function to be selected at once.
	for i := 0; ; i++ {
ev := fmt.Sprintf("%s%d", evname, i)
evv := os.Getenv(ev)
if evv == "" {
break
}
if strings.HasSuffix(hstr, evv) {
c.logDebugHashMatch(ev, name)
return true
}
}
return false
}
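
// A minimal standalone sketch of the suffix rule above (the package and
// function names are made up; the loop mirrors the one in DebugHashMatch):
//
//	package main
//
//	import (
//		"crypto/sha1"
//		"fmt"
//		"strings"
//	)
//
//	func main() {
//		hstr := ""
//		for _, b := range sha1.Sum([]byte("somepkg.someFunc")) {
//			hstr += fmt.Sprintf("%08b", b) // 160 bits of '0'/'1'
//		}
//		// Setting GOSSAHASH to any suffix of hstr selects this function.
//		// Each additional bit halves the set of selected functions,
//		// which is what makes the binary search converge.
//		fmt.Println(strings.HasSuffix(hstr, "0110"))
//	}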
func (c *Config) DebugNameMatch(evname, name string) bool {
return os.Getenv(evname) == name
}
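
// For example (hypothetical values): with GOSSAFUNC=main.main in the
// environment, c.DebugNameMatch("GOSSAFUNC", "main.main") returns true and
// any other name returns false. This exact-name check is the simple
// companion to the hash-suffix bisection in DebugHashMatch.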