// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package ssa
import (
	"cmd/internal/obj"
	"crypto/sha1"
	"fmt"
	"os"
"strconv"
"strings"
)
type Config struct {
	arch            string                     // "amd64", etc.
	IntSize         int64                      // 4 or 8
	PtrSize         int64                      // 4 or 8
	RegSize         int64                      // 4 or 8
	lowerBlock      func(*Block, *Config) bool // lowering function
	lowerValue      func(*Value, *Config) bool // lowering function
	registers       []Register                 // machine registers
	gpRegMask       regMask                    // general purpose integer register mask
	fpRegMask       regMask                    // floating point register mask
	specialRegMask  regMask                    // special register mask
	FPReg           int8                       // register number of frame pointer, -1 if not used
	hasGReg         bool                       // has hardware g register
	fe              Frontend                   // callbacks into compiler frontend
	HTML            *HTMLWriter                // html writer, for debugging
	ctxt            *obj.Link                  // Generic arch information
	optimize        bool                       // Do optimization
	noDuffDevice    bool                       // Don't use Duff's device
	nacl            bool                       // GOOS=nacl
	use387          bool                       // GO386=387
	OldArch         bool                       // True for older versions of architecture, e.g. true for PPC64BE, false for PPC64LE
	NeedsFpScratch  bool                       // No direct move between GP and FP register sets
	DebugTest       bool                       // default true unless $GOSSAHASH != ""; as a debugging aid, make new code conditional on this and use GOSSAHASH to binary search for failing cases
	sparsePhiCutoff uint64                     // Sparse phi location algorithm used above this #blocks*#variables score
	curFunc         *Func

	// TODO: more stuff. Compiler flags of interest, ...

	// Given an environment variable used for debug hash match,
	// what file (if any) receives the yes/no logging?
	logfiles map[string]*os.File

	// Storage for low-numbered values and blocks.
	values [2000]Value
	blocks [200]Block

	// Reusable stackAllocState.
	// See stackalloc.go's {new,put}StackAllocState.
	stackAllocState *stackAllocState

	domblockstore []ID         // scratch space for computing dominators
	scrSparse     []*sparseSet // scratch sparse sets to be re-used.
}
type TypeSource interface {
	TypeBool() Type
	TypeInt8() Type
	TypeInt16() Type
	TypeInt32() Type
	TypeInt64() Type
	TypeUInt8() Type
	TypeUInt16() Type
	TypeUInt32() Type
	TypeUInt64() Type
	TypeInt() Type
	TypeFloat32() Type
	TypeFloat64() Type
	TypeUintptr() Type
	TypeString() Type
	TypeBytePtr() Type // TODO: use unsafe.Pointer instead?

	CanSSA(t Type) bool
}
type Logger interface {
	// Logf logs a message from the compiler.
	Logf(string, ...interface{})

	// Log returns true if logging is not a no-op;
	// some logging calls account for more than a few heap allocations.
	Log() bool

	// Fatalf reports a compiler error and exits.
	Fatalf(line int32, msg string, args ...interface{})

	// Warnl writes compiler messages in the form expected by "errorcheck" tests.
	Warnl(line int32, fmt_ string, args ...interface{})

	// Forwards the Debug_checknil flag from gc.
	Debug_checknil() bool
}
type Frontend interface {
	TypeSource
	Logger

	// StringData returns a symbol pointing to the given string's contents.
	StringData(string) interface{} // returns *gc.Sym

	// Auto returns a Node for an auto variable of the given type.
	// The SSA compiler uses this function to allocate space for spills.
	Auto(Type) GCNode

	// Given the name for a compound type, returns the name we should use
	// for the parts of that compound type.
	SplitString(LocalSlot) (LocalSlot, LocalSlot)
	SplitInterface(LocalSlot) (LocalSlot, LocalSlot)
	SplitSlice(LocalSlot) (LocalSlot, LocalSlot, LocalSlot)
	SplitComplex(LocalSlot) (LocalSlot, LocalSlot)
	SplitStruct(LocalSlot, int) LocalSlot
	SplitInt64(LocalSlot) (LocalSlot, LocalSlot) // returns (hi, lo)
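
	// For example, under this splitting scheme a named string slot s is
	// expected to split into a byte-pointer slot s.ptr and an integer
	// length slot s.len, which can then be spilled and restored
	// independently. (The part names here are illustrative.)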
	// Line returns a string describing the given line number.
	Line(int32) string

	// AllocFrame assigns frame offsets to all live auto variables.
	AllocFrame(f *Func)
}

// GCNode is an interface used to hold a *gc.Node. We'd use *gc.Node
// directly but that would lead to an import cycle.
type GCNode interface {
	Typ() Type
	String() string
}

// NewConfig returns a new configuration object for the given architecture.
func NewConfig(arch string, fe Frontend, ctxt *obj.Link, optimize bool) *Config {
	c := &Config{arch: arch, fe: fe}
	switch arch {
	case "amd64":
		c.IntSize = 8
		c.PtrSize = 8
		c.RegSize = 8
		c.lowerBlock = rewriteBlockAMD64
		c.lowerValue = rewriteValueAMD64
		c.registers = registersAMD64[:]
		c.gpRegMask = gpRegMaskAMD64
		c.fpRegMask = fpRegMaskAMD64
		c.FPReg = framepointerRegAMD64
		c.hasGReg = false
	case "amd64p32":
		c.IntSize = 4
		c.PtrSize = 4
		c.RegSize = 8
		c.lowerBlock = rewriteBlockAMD64
		c.lowerValue = rewriteValueAMD64
		c.registers = registersAMD64[:]
		c.gpRegMask = gpRegMaskAMD64
		c.fpRegMask = fpRegMaskAMD64
		c.FPReg = framepointerRegAMD64
		c.hasGReg = false
		c.noDuffDevice = true
	case "386":
		c.IntSize = 4
		c.PtrSize = 4
		c.RegSize = 4
		c.lowerBlock = rewriteBlock386
		c.lowerValue = rewriteValue386
		c.registers = registers386[:]
		c.gpRegMask = gpRegMask386
		c.fpRegMask = fpRegMask386
		c.FPReg = framepointerReg386
		c.hasGReg = false
	case "arm":
		c.IntSize = 4
		c.PtrSize = 4
		c.RegSize = 4
		c.lowerBlock = rewriteBlockARM
		c.lowerValue = rewriteValueARM
		c.registers = registersARM[:]
		c.gpRegMask = gpRegMaskARM
		c.fpRegMask = fpRegMaskARM
		c.FPReg = framepointerRegARM
		c.hasGReg = true
	case "arm64":
		c.IntSize = 8
		c.PtrSize = 8
		c.RegSize = 8
		c.lowerBlock = rewriteBlockARM64
		c.lowerValue = rewriteValueARM64
		c.registers = registersARM64[:]
		c.gpRegMask = gpRegMaskARM64
		c.fpRegMask = fpRegMaskARM64
		c.FPReg = framepointerRegARM64
		c.hasGReg = true
		c.noDuffDevice = obj.GOOS == "darwin" // darwin linker cannot handle BR26 reloc with non-zero addend
	case "ppc64":
		c.OldArch = true
		fallthrough
	case "ppc64le":
		c.IntSize = 8
		c.PtrSize = 8
		c.RegSize = 8
		c.lowerBlock = rewriteBlockPPC64
		c.lowerValue = rewriteValuePPC64
		c.registers = registersPPC64[:]
		c.gpRegMask = gpRegMaskPPC64
		c.fpRegMask = fpRegMaskPPC64
		c.FPReg = framepointerRegPPC64
		c.noDuffDevice = true // TODO: Resolve PPC64 DuffDevice (has zero, but not copy)
		c.NeedsFpScratch = true
		c.hasGReg = true
	case "mips64", "mips64le":
		c.IntSize = 8
		c.PtrSize = 8
		c.RegSize = 8
		c.lowerBlock = rewriteBlockMIPS64
		c.lowerValue = rewriteValueMIPS64
		c.registers = registersMIPS64[:]
		c.gpRegMask = gpRegMaskMIPS64
		c.fpRegMask = fpRegMaskMIPS64
		c.specialRegMask = specialRegMaskMIPS64
		c.FPReg = framepointerRegMIPS64
		c.hasGReg = true
	case "s390x":
		c.IntSize = 8
		c.PtrSize = 8
		c.RegSize = 8
		c.lowerBlock = rewriteBlockS390X
		c.lowerValue = rewriteValueS390X
		c.registers = registersS390X[:]
		c.gpRegMask = gpRegMaskS390X
		c.fpRegMask = fpRegMaskS390X
		c.FPReg = framepointerRegS390X
		c.hasGReg = true
		c.noDuffDevice = true
	default:
		fe.Fatalf(0, "arch %s not implemented", arch)
	}
	c.ctxt = ctxt
	c.optimize = optimize
	c.nacl = obj.GOOS == "nacl"

	// Don't use Duff's device on Plan 9 AMD64, because floating
	// point operations are not allowed in note handler.
	if obj.GOOS == "plan9" && arch == "amd64" {
		c.noDuffDevice = true
	}

	if c.nacl {
		c.noDuffDevice = true // Don't use Duff's device on NaCl

		// runtime calls clobber R12 on nacl
		opcodeTable[OpARMUDIVrtcall].reg.clobbers |= 1 << 12 // R12
	}

	// Assign IDs to preallocated values/blocks.
	for i := range c.values {
		c.values[i].ID = ID(i)
	}
	for i := range c.blocks {
		c.blocks[i].ID = ID(i)
	}

	c.logfiles = make(map[string]*os.File)

	// The cutoff is compared with the product of numblocks and numvalues;
	// if the product is smaller than the cutoff, use the old non-sparse method.
	// cutoff == 0 implies all sparse.
	// cutoff == -1 implies none sparse.
	// Good cutoff values seem to be O(million) depending on constant factor cost of sparse.
	// TODO: get this from a flag, not an environment variable
	c.sparsePhiCutoff = 2500000 // 0 for testing. // 2500000 determined with crude experiments w/ make.bash
	ev := os.Getenv("GO_SSA_PHI_LOC_CUTOFF")
	if ev != "" {
		v, err := strconv.ParseInt(ev, 10, 64)
		if err != nil {
			fe.Fatalf(0, "Environment variable GO_SSA_PHI_LOC_CUTOFF (value '%s') did not parse as a number", ev)
		}
		c.sparsePhiCutoff = uint64(v) // convert -1 to maxint, for "never use sparse"
	}
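
	// For example (per the cutoff semantics above), hypothetical overrides:
	//
	//	GO_SSA_PHI_LOC_CUTOFF=0   always use the sparse algorithm
	//	GO_SSA_PHI_LOC_CUTOFF=-1  never use the sparse algorithm (-1 converts to maxint)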

	return c
}
func (c *Config) Set387(b bool) {
	c.NeedsFpScratch = b
	c.use387 = b
}
func (c *Config) Frontend() Frontend      { return c.fe }
func (c *Config) SparsePhiCutoff() uint64 { return c.sparsePhiCutoff }
func (c *Config) Ctxt() *obj.Link         { return c.ctxt }
// NewFunc returns a new, empty function object.
// Caller must call f.Free() before calling NewFunc again.
func (c *Config) NewFunc() *Func {
	// TODO(khr): should this function take name, type, etc. as arguments?
	if c.curFunc != nil {
		c.Fatalf(0, "NewFunc called without previous Free")
	}
	f := &Func{Config: c, NamedValues: map[LocalSlot][]*Value{}}
	c.curFunc = f
	return f
}
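
// A minimal usage sketch, assuming a Frontend implementation fe and an
// *obj.Link ctxt supplied by the calling compiler (names illustrative):
//
//	c := NewConfig("amd64", fe, ctxt, true)
//	f := c.NewFunc()
//	// ... build and compile SSA for one function ...
//	f.Free()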
func (c *Config) Logf(msg string, args ...interface{})               { c.fe.Logf(msg, args...) }
func (c *Config) Log() bool                                          { return c.fe.Log() }
func (c *Config) Fatalf(line int32, msg string, args ...interface{}) { c.fe.Fatalf(line, msg, args...) }
func (c *Config) Warnl(line int32, msg string, args ...interface{})  { c.fe.Warnl(line, msg, args...) }
func (c *Config) Debug_checknil() bool                               { return c.fe.Debug_checknil() }
func (c *Config) logDebugHashMatch(evname, name string) {
	file := c.logfiles[evname]
	if file == nil {
		file = os.Stdout
		tmpfile := os.Getenv("GSHS_LOGFILE")
		if tmpfile != "" {
			var err error
			file, err = os.Create(tmpfile)
			if err != nil {
				c.Fatalf(0, "Could not open hash-testing logfile %s", tmpfile)
			}
		}
		c.logfiles[evname] = file
	}
	s := fmt.Sprintf("%s triggered %s\n", evname, name)
	file.WriteString(s)
	file.Sync()
}

// DebugHashMatch returns true if environment variable evname
//  1) is empty (this is a special more-quickly implemented case of 3)
//  2) is "y" or "Y"
//  3) is a suffix of the sha1 hash of name
//  4) is a suffix of the environment variable
//     fmt.Sprintf("%s%d", evname, n)
//     provided that all such variables are nonempty for 0 <= i <= n
// Otherwise it returns false.
// When true is returned the message
//  "%s triggered %s\n", evname, name
// is printed on the file named in environment variable
//  GSHS_LOGFILE
// or standard out if that is empty or there is an error
// opening the file.
func (c *Config) DebugHashMatch(evname, name string) bool {
	evhash := os.Getenv(evname)
	if evhash == "" {
		return true // default behavior with no EV is "on"
	}
	if evhash == "y" || evhash == "Y" {
		c.logDebugHashMatch(evname, name)
		return true
	}
	if evhash == "n" || evhash == "N" {
		return false
	}
	// Check the hash of the name against a partial input hash.
	// We use this feature to do a binary search to
	// find a function that is incorrectly compiled.
	hstr := ""
	for _, b := range sha1.Sum([]byte(name)) {
		hstr += fmt.Sprintf("%08b", b)
	}
	if strings.HasSuffix(hstr, evhash) {
		c.logDebugHashMatch(evname, name)
		return true
	}
	// Iteratively try additional hashes to allow tests for multi-point
	// failure.
	for i := 0; true; i++ {
		ev := fmt.Sprintf("%s%d", evname, i)
		evv := os.Getenv(ev)
		if evv == "" {
			break
		}
		if strings.HasSuffix(hstr, evv) {
			c.logDebugHashMatch(ev, name)
			return true
		}
	}
	return false
}
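
// A hypothetical debugging session using this hook: set GOSSAHASH=y to take
// the new code path for every function, then narrow with successively longer
// binary suffixes (GOSSAHASH=0, then 01, then 011, ...) until a single failing
// function matches; each match is logged as "<evname> triggered <name>" to the
// file named by GSHS_LOGFILE, or to standard out if it is unset.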
func (c *Config) DebugNameMatch(evname, name string) bool {
	return os.Getenv(evname) == name
}