// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package ssa

import (
	"cmd/internal/obj"
	"crypto/sha1"
	"fmt"
	"os"
	"strconv"
	"strings"
)
type Config struct {
	arch            string                     // "amd64", etc.
	IntSize         int64                      // 4 or 8
	PtrSize         int64                      // 4 or 8
	lowerBlock      func(*Block) bool          // lowering function
	lowerValue      func(*Value, *Config) bool // lowering function
	registers       []Register                 // machine registers
	gpRegMask       regMask                    // general purpose integer register mask
	fpRegMask       regMask                    // floating point register mask
	flagRegMask     regMask                    // flag register mask
	FPReg           int8                       // register number of frame pointer, -1 if not used
	hasGReg         bool                       // has hardware g register
	fe              Frontend                   // callbacks into compiler frontend
	HTML            *HTMLWriter                // html writer, for debugging
	ctxt            *obj.Link                  // Generic arch information
	optimize        bool                       // Do optimization
	noDuffDevice    bool                       // Don't use Duff's device
	sparsePhiCutoff uint64                     // Sparse phi location algorithm used above this #blocks*#variables score
	curFunc         *Func

	// TODO: more stuff. Compiler flags of interest, ...

	// Given an environment variable used for debug hash match,
	// what file (if any) receives the yes/no logging?
	logfiles map[string]*os.File

	// Storage for low-numbered values and blocks.
	values [2000]Value
	blocks [200]Block

	// Reusable stackAllocState.
	// See stackalloc.go's {new,put}StackAllocState.
	stackAllocState *stackAllocState

	domblockstore []ID         // scratch space for computing dominators
	scrSparse     []*sparseSet // scratch sparse sets to be re-used.
}
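
// The sparsePhiCutoff field drives a simple product test when a phi
// placement algorithm is chosen for a function. A minimal sketch of that
// decision (wouldUseSparsePhi is a hypothetical helper, not the real
// cutover, which lives with the SSA builder):
func (c *Config) wouldUseSparsePhi(numBlocks, numValues int) bool {
	// Use the sparse ancestor-lookup algorithm only when
	// #blocks * #values reaches the cutoff; see NewConfig for how the
	// cutoff is chosen and overridden via GO_SSA_PHI_LOC_CUTOFF.
	return uint64(numBlocks)*uint64(numValues) >= c.sparsePhiCutoff
}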

type TypeSource interface {
	TypeBool() Type
	TypeInt8() Type
	TypeInt16() Type
	TypeInt32() Type
	TypeInt64() Type
	TypeUInt8() Type
	TypeUInt16() Type
	TypeUInt32() Type
	TypeUInt64() Type
	TypeInt() Type
	TypeFloat32() Type
	TypeFloat64() Type
	TypeUintptr() Type
	TypeString() Type
	TypeBytePtr() Type // TODO: use unsafe.Pointer instead?

	CanSSA(t Type) bool
}

type Logger interface {
	// Logf logs a message from the compiler.
	Logf(string, ...interface{})

	// Log returns true if logging is not a no-op;
	// some logging calls account for more than a few heap allocations.
	Log() bool

	// Fatalf reports a compiler error and exits.
	Fatalf(line int32, msg string, args ...interface{})

	// Unimplementedf reports that the function cannot be compiled.
	// It will be removed once SSA work is complete.
	Unimplementedf(line int32, msg string, args ...interface{})

	// Warnl writes compiler messages in the form expected by "errorcheck" tests.
	Warnl(line int32, fmt_ string, args ...interface{})

	// Forwards the Debug_checknil flag from gc.
	Debug_checknil() bool
}
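
// A minimal Logger is handy in experiments and tests. A sketch, not an
// implementation used anywhere in the compiler (printLogger is a
// hypothetical name):
//
//	type printLogger struct{}
//
//	func (printLogger) Logf(msg string, args ...interface{}) { fmt.Printf(msg, args...) }
//	func (printLogger) Log() bool                            { return true }
//	func (printLogger) Fatalf(line int32, msg string, args ...interface{}) {
//		fmt.Printf(msg, args...)
//		os.Exit(1)
//	}
//	func (printLogger) Unimplementedf(line int32, msg string, args ...interface{}) {
//		fmt.Printf(msg, args...)
//		os.Exit(1)
//	}
//	func (printLogger) Warnl(line int32, fmt_ string, args ...interface{}) { fmt.Printf(fmt_, args...) }
//	func (printLogger) Debug_checknil() bool                               { return false }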

type Frontend interface {
	TypeSource
	Logger

	// StringData returns a symbol pointing to the given string's contents.
	StringData(string) interface{} // returns *gc.Sym

	// Auto returns a Node for an auto variable of the given type.
	// The SSA compiler uses this function to allocate space for spills.
	Auto(Type) GCNode

	// Given the name for a compound type, returns the name we should use
	// for the parts of that compound type.
	SplitString(LocalSlot) (LocalSlot, LocalSlot)
	SplitInterface(LocalSlot) (LocalSlot, LocalSlot)
	SplitSlice(LocalSlot) (LocalSlot, LocalSlot, LocalSlot)
	SplitComplex(LocalSlot) (LocalSlot, LocalSlot)
	SplitStruct(LocalSlot, int) LocalSlot
	SplitInt64(LocalSlot) (LocalSlot, LocalSlot) // returns (hi, lo)
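	// For example, a named string auto s is decomposed into a byte pointer
	// s.ptr and an integer s.len (the two slots SplitString returns), so
	// each part can be spilled and restored independently.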

	// Line returns a string describing the given line number.
	Line(int32) string
}

// GCNode is an interface used to hold a *gc.Node. We'd use *gc.Node
// directly but that would lead to an import cycle.
type GCNode interface {
	Typ() Type
	String() string
}

// NewConfig returns a new configuration object for the given architecture.
func NewConfig(arch string, fe Frontend, ctxt *obj.Link, optimize bool) *Config {
	c := &Config{arch: arch, fe: fe}
	switch arch {
	case "amd64":
		c.IntSize = 8
		c.PtrSize = 8
		c.lowerBlock = rewriteBlockAMD64
		c.lowerValue = rewriteValueAMD64
		c.registers = registersAMD64[:]
		c.gpRegMask = gpRegMaskAMD64
		c.fpRegMask = fpRegMaskAMD64
		c.flagRegMask = flagRegMaskAMD64
		c.FPReg = framepointerRegAMD64
		c.hasGReg = false
	case "386":
		c.IntSize = 4
		c.PtrSize = 4
		c.lowerBlock = rewriteBlock386
		c.lowerValue = rewriteValue386
		c.registers = registers386[:]
		c.gpRegMask = gpRegMask386
		c.fpRegMask = fpRegMask386
		c.flagRegMask = flagRegMask386
		c.FPReg = framepointerReg386
		c.hasGReg = false
	case "arm":
		c.IntSize = 4
		c.PtrSize = 4
		c.lowerBlock = rewriteBlockARM
		c.lowerValue = rewriteValueARM
		c.registers = registersARM[:]
		c.gpRegMask = gpRegMaskARM
		c.fpRegMask = fpRegMaskARM
		c.flagRegMask = flagRegMaskARM
		c.FPReg = framepointerRegARM
		c.hasGReg = true
	case "ppc64le":
		c.IntSize = 8
		c.PtrSize = 8
		c.lowerBlock = rewriteBlockPPC64
		c.lowerValue = rewriteValuePPC64
		c.registers = registersPPC64[:]
		c.gpRegMask = gpRegMaskPPC64
		c.fpRegMask = fpRegMaskPPC64
		c.FPReg = framepointerRegPPC64
	default:
		fe.Unimplementedf(0, "arch %s not implemented", arch)
	}
	c.ctxt = ctxt
	c.optimize = optimize

	// Don't use Duff's device on Plan 9, because floating
	// point operations are not allowed in note handler.
	if obj.Getgoos() == "plan9" {
		c.noDuffDevice = true
	}

	// Assign IDs to preallocated values/blocks.
	for i := range c.values {
		c.values[i].ID = ID(i)
	}
	for i := range c.blocks {
		c.blocks[i].ID = ID(i)
	}

	c.logfiles = make(map[string]*os.File)

	// The cutoff is compared with the product of numblocks and numvalues;
	// if the product is smaller than the cutoff, use the old non-sparse method.
	// cutoff == 0 implies all sparse.
	// cutoff == -1 implies none sparse.
	// Good cutoff values seem to be O(million) depending on constant factor cost of sparse.
	// TODO: get this from a flag, not an environment variable.
	c.sparsePhiCutoff = 2500000 // 0 for testing. // 2500000 determined with crude experiments w/ make.bash
	ev := os.Getenv("GO_SSA_PHI_LOC_CUTOFF")
	if ev != "" {
		v, err := strconv.ParseInt(ev, 10, 64)
		if err != nil {
			fe.Fatalf(0, "Environment variable GO_SSA_PHI_LOC_CUTOFF (value '%s') did not parse as a number", ev)
		}
		c.sparsePhiCutoff = uint64(v) // convert -1 to maxint, for never use sparse
	}

	return c
}
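
// A typical instantiation, sketched; the concrete Frontend and link
// context come from the compiler frontend, so the names here are
// illustrative only:
//
//	c := NewConfig("amd64", fe, ctxt, true) // optimize enabled
//	f := c.NewFunc()
//	// ... build and compile SSA for one function ...
//	f.Free() // required before the next NewFunc call; see NewFunc below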

func (c *Config) Frontend() Frontend      { return c.fe }
func (c *Config) SparsePhiCutoff() uint64 { return c.sparsePhiCutoff }

// NewFunc returns a new, empty function object.
// Caller must call f.Free() before calling NewFunc again.
func (c *Config) NewFunc() *Func {
	// TODO(khr): should this function take name, type, etc. as arguments?
	if c.curFunc != nil {
		c.Fatalf(0, "NewFunc called without previous Free")
	}
	f := &Func{Config: c, NamedValues: map[LocalSlot][]*Value{}}
	c.curFunc = f
	return f
}

func (c *Config) Logf(msg string, args ...interface{})               { c.fe.Logf(msg, args...) }
func (c *Config) Log() bool                                          { return c.fe.Log() }
func (c *Config) Fatalf(line int32, msg string, args ...interface{}) { c.fe.Fatalf(line, msg, args...) }
func (c *Config) Unimplementedf(line int32, msg string, args ...interface{}) {
	c.fe.Unimplementedf(line, msg, args...)
}
func (c *Config) Warnl(line int32, msg string, args ...interface{}) { c.fe.Warnl(line, msg, args...) }
func (c *Config) Debug_checknil() bool                              { return c.fe.Debug_checknil() }

func (c *Config) logDebugHashMatch(evname, name string) {
	file := c.logfiles[evname]
	if file == nil {
		file = os.Stdout
		tmpfile := os.Getenv("GSHS_LOGFILE")
		if tmpfile != "" {
			var err error
			file, err = os.Create(tmpfile)
			if err != nil {
				c.Fatalf(0, "Could not open hash-testing logfile %s", tmpfile)
			}
		}
		c.logfiles[evname] = file
	}
	s := fmt.Sprintf("%s triggered %s\n", evname, name)
	file.WriteString(s)
	file.Sync()
}

// DebugHashMatch returns true if environment variable evname
// 1) is empty (this is a special more-quickly implemented case of 3),
// 2) is "y" or "Y",
// 3) is a suffix of the sha1 hash of name, or
// 4) is a suffix of the environment variable
//    fmt.Sprintf("%s%d", evname, n)
//    for some n, provided that all such variables are nonempty for 0 <= i <= n.
// Otherwise it returns false.
// When true is returned the message
//    "%s triggered %s\n", evname, name
// is printed on the file named in environment variable GSHS_LOGFILE,
// or standard out if that is empty or there is an error
// opening the file.
func (c *Config) DebugHashMatch(evname, name string) bool {
	evhash := os.Getenv(evname)
	if evhash == "" {
		return true // default behavior with no EV is "on"
	}
	if evhash == "y" || evhash == "Y" {
		c.logDebugHashMatch(evname, name)
		return true
	}
	if evhash == "n" || evhash == "N" {
		return false
	}
	// Check the hash of the name against a partial input hash.
	// We use this feature to do a binary search to
	// find a function that is incorrectly compiled.
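	// hstr is the 160-bit sha1 of name spelled out as a string of '0' and
	// '1' characters, so evhash is matched below as a binary suffix.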
	hstr := ""
	for _, b := range sha1.Sum([]byte(name)) {
		hstr += fmt.Sprintf("%08b", b)
	}

	if strings.HasSuffix(hstr, evhash) {
		c.logDebugHashMatch(evname, name)
		return true
	}

	// Iteratively try additional hashes to allow tests for multi-point
	// failure.
	for i := 0; true; i++ {
		ev := fmt.Sprintf("%s%d", evname, i)
		evv := os.Getenv(ev)
		if evv == "" {
			break
		}
		if strings.HasSuffix(hstr, evv) {
			c.logDebugHashMatch(ev, name)
			return true
		}
	}
	return false
}
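
// A sketch of the bisection workflow DebugHashMatch enables (the variable
// name is the caller's choice; GOSSAHASH in the gc frontend is one
// example): rebuild with evname set to successively longer bit strings,
// e.g. "0", then "10", then "110". Each added bit roughly halves the set
// of matching functions, homing in on a miscompiled one.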

func (c *Config) DebugNameMatch(evname, name string) bool {
	return os.Getenv(evname) == name
}