// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package ssa

import (
	"cmd/internal/obj"
	"cmd/internal/src"
	"os"
	"strconv"
)

// A Config holds readonly compilation information.
// It is created once, early during compilation,
// and shared across all compilations.
type Config struct {
	arch            string        // "amd64", etc.
	IntSize         int64         // 4 or 8
	PtrSize         int64         // 4 or 8
	RegSize         int64         // 4 or 8
	lowerBlock      blockRewriter // lowering function
	lowerValue      valueRewriter // lowering function
	registers       []Register    // machine registers
	gpRegMask       regMask       // general purpose integer register mask
	fpRegMask       regMask       // floating point register mask
	specialRegMask  regMask       // special register mask
	FPReg           int8          // register number of frame pointer, -1 if not used
	LinkReg         int8          // register number of link register if it is a general purpose register, -1 if not used
	hasGReg         bool          // has hardware g register
	fe              Frontend      // callbacks into compiler frontend
	ctxt            *obj.Link     // Generic arch information
	optimize        bool          // Do optimization
	noDuffDevice    bool          // Don't use Duff's device
	nacl            bool          // GOOS=nacl
	use387          bool          // GO386=387
	OldArch         bool          // True for older versions of architecture, e.g. true for PPC64BE, false for PPC64LE
	NeedsFpScratch  bool          // No direct move between GP and FP register sets
	BigEndian       bool          // byte order of the target is big-endian
	sparsePhiCutoff uint64        // Sparse phi location algorithm used above this #blocks*#variables score
}

type (
	blockRewriter func(*Block) bool
	valueRewriter func(*Value) bool
)

type TypeSource interface {
	TypeBool() Type
	TypeInt8() Type
	TypeInt16() Type
	TypeInt32() Type
	TypeInt64() Type
	TypeUInt8() Type
	TypeUInt16() Type
	TypeUInt32() Type
	TypeUInt64() Type
	TypeInt() Type
	TypeFloat32() Type
	TypeFloat64() Type
	TypeUintptr() Type
	TypeString() Type
	TypeBytePtr() Type // TODO: use unsafe.Pointer instead?

	// CanSSA reports whether values of type t are SSA-able.
	CanSSA(t Type) bool
}

type Logger interface {
	// Logf logs a message from the compiler.
	Logf(string, ...interface{})

	// Log reports whether logging is not a no-op;
	// some logging calls account for more than a few heap allocations.
	Log() bool

	// Fatalf reports a compiler error and exits.
	Fatalf(pos src.XPos, msg string, args ...interface{})

	// Error reports a compiler error but keeps going.
	Error(pos src.XPos, msg string, args ...interface{})

	// Warnl writes compiler messages in the form expected by "errorcheck" tests.
	Warnl(pos src.XPos, fmt_ string, args ...interface{})

	// Debug_checknil and Debug_wb forward the corresponding Debug flags from gc.
	Debug_checknil() bool
	Debug_wb() bool
}

type Frontend interface {
	TypeSource
	Logger

	// StringData returns a symbol pointing to the given string's contents.
	StringData(string) interface{} // returns *gc.Sym

	// Auto returns a Node for an auto variable of the given type.
	// The SSA compiler uses this function to allocate space for spills.
	Auto(Type) GCNode

	// Given the name for a compound type, returns the name we should use
	// for the parts of that compound type.
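	// For example, a string slot s is split into a pointer slot and a
	// length slot, conventionally named s.ptr and s.len.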
	SplitString(LocalSlot) (LocalSlot, LocalSlot)
	SplitInterface(LocalSlot) (LocalSlot, LocalSlot)
	SplitSlice(LocalSlot) (LocalSlot, LocalSlot, LocalSlot)
	SplitComplex(LocalSlot) (LocalSlot, LocalSlot)
	SplitStruct(LocalSlot, int) LocalSlot
	SplitArray(LocalSlot) LocalSlot // array must be length 1
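	// SplitInt64 splits a 64-bit integer slot into its high and low halves
	// so that 32-bit architectures can decompose 64-bit operations; for
	// example, the carry out of adding the lo halves feeds the add of the
	// hi halves.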
	SplitInt64(LocalSlot) (LocalSlot, LocalSlot) // returns (hi, lo)

	// DerefItab dereferences an itab function
	// entry, given the symbol of the itab and
	// the byte offset of the function pointer.
	// It may return nil.
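	// For example, the de-virtualization pass uses it to turn an interface
	// call such as h.Write(buf) into a static call once h is known to hold
	// a concrete type like *sha1.digest.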
	DerefItab(sym *obj.LSym, offset int64) *obj.LSym

	// Line returns a string describing the given position.
	Line(src.XPos) string

	// AllocFrame assigns frame offsets to all live auto variables.
	AllocFrame(f *Func)

	// Syslook returns a symbol of the runtime function/variable with the
	// given name.
	Syslook(string) *obj.LSym

	// UseWriteBarrier reports whether the write barrier is enabled.
	UseWriteBarrier() bool
}

// GCNode is an interface used to hold a *gc.Node. We'd use *gc.Node
// directly but that would lead to an import cycle.
type GCNode interface {
	Typ() Type
	String() string
}

// NewConfig returns a new configuration object for the given architecture.
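//
// A typical construction from the frontend side looks like the sketch below;
// the names thearch, ssaExp, Ctxt, and Debug are assumptions borrowed from
// cmd/compile/internal/gc, not defined in this package:
//
//	ssaConfig = ssa.NewConfig(thearch.LinkArch.Name, &ssaExp, Ctxt, Debug['N'] == 0)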
func NewConfig(arch string, fe Frontend, ctxt *obj.Link, optimize bool) *Config {
	c := &Config{arch: arch, fe: fe}
	switch arch {
case "amd64":
|
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled
For integer types less than a machine register, we have to decide
what the invariants are for the high bits of the register. We used
to set the high bits to the correct extension (sign or zero, as
determined by the type) of the low bits.
This CL makes the compiler ignore the high bits of the register
altogether (they are junk).
On this plus side, this means ops that generate subword results don't
have to worry about correctly extending them. On the minus side,
ops that consume subword arguments have to deal with the input
registers not being correctly extended.
For x86, this tradeoff is probably worth it. Almost all opcodes
have versions that use only the correct subword piece of their
inputs. (The one big exception is array indexing.) Not many opcodes
can correctly sign extend on output.
For other architectures, the tradeoff is probably not so clear, as
they don't have many subword-safe opcodes (e.g. 16-bit compare,
ignoring the high 16/48 bits). Fortunately we can decide whether
we do this per-architecture.
For the machine-independent opcodes, we pretend that the "register"
size is equal to the type width, so sign extension is immaterial.
Opcodes that care about the signedness of the input (e.g. compare,
right shift) have two different variants.
Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d
Reviewed-on: https://go-review.googlesource.com/12600
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
|
|
|
c.IntSize = 8
|
2015-07-19 15:48:20 -07:00
|
|
|
c.PtrSize = 8
|
2016-09-28 10:20:24 -04:00
|
|
|
c.RegSize = 8
|
2015-06-06 16:03:33 -07:00
|
|
|
c.lowerBlock = rewriteBlockAMD64
|
|
|
|
|
c.lowerValue = rewriteValueAMD64
|
2016-03-21 22:57:26 -07:00
|
|
|
c.registers = registersAMD64[:]
|
2016-05-19 12:33:30 -04:00
|
|
|
c.gpRegMask = gpRegMaskAMD64
|
|
|
|
|
c.fpRegMask = fpRegMaskAMD64
|
|
|
|
|
c.FPReg = framepointerRegAMD64
|
2016-10-06 15:06:45 -04:00
|
|
|
c.LinkReg = linkRegAMD64
|
2016-05-31 14:01:34 -04:00
|
|
|
c.hasGReg = false
|
2016-08-08 11:26:25 -07:00
|
|
|
case "amd64p32":
|
|
|
|
|
c.IntSize = 4
|
|
|
|
|
c.PtrSize = 4
|
2016-09-28 10:20:24 -04:00
|
|
|
c.RegSize = 8
|
2016-08-08 11:26:25 -07:00
|
|
|
c.lowerBlock = rewriteBlockAMD64
|
|
|
|
|
c.lowerValue = rewriteValueAMD64
|
|
|
|
|
c.registers = registersAMD64[:]
|
|
|
|
|
c.gpRegMask = gpRegMaskAMD64
|
|
|
|
|
c.fpRegMask = fpRegMaskAMD64
|
|
|
|
|
c.FPReg = framepointerRegAMD64
|
2016-10-06 15:06:45 -04:00
|
|
|
c.LinkReg = linkRegAMD64
|
2016-08-08 11:26:25 -07:00
|
|
|
c.hasGReg = false
|
|
|
|
|
c.noDuffDevice = true
|
2015-04-15 15:51:25 -07:00
|
|
|
case "386":
|
[dev.ssa] cmd/compile/internal/ssa: redo how sign extension is handled
For integer types less than a machine register, we have to decide
what the invariants are for the high bits of the register. We used
to set the high bits to the correct extension (sign or zero, as
determined by the type) of the low bits.
This CL makes the compiler ignore the high bits of the register
altogether (they are junk).
On this plus side, this means ops that generate subword results don't
have to worry about correctly extending them. On the minus side,
ops that consume subword arguments have to deal with the input
registers not being correctly extended.
For x86, this tradeoff is probably worth it. Almost all opcodes
have versions that use only the correct subword piece of their
inputs. (The one big exception is array indexing.) Not many opcodes
can correctly sign extend on output.
For other architectures, the tradeoff is probably not so clear, as
they don't have many subword-safe opcodes (e.g. 16-bit compare,
ignoring the high 16/48 bits). Fortunately we can decide whether
we do this per-architecture.
For the machine-independent opcodes, we pretend that the "register"
size is equal to the type width, so sign extension is immaterial.
Opcodes that care about the signedness of the input (e.g. compare,
right shift) have two different variants.
Change-Id: I465484c5734545ee697afe83bc8bf4b53bd9df8d
Reviewed-on: https://go-review.googlesource.com/12600
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-07-23 14:35:02 -07:00
|
|
|
c.IntSize = 4
|
2015-07-19 15:48:20 -07:00
|
|
|
c.PtrSize = 4
|
2016-09-28 10:20:24 -04:00
|
|
|
c.RegSize = 4
|
2016-07-13 13:43:08 -07:00
|
|
|
c.lowerBlock = rewriteBlock386
|
|
|
|
|
c.lowerValue = rewriteValue386
|
|
|
|
|
c.registers = registers386[:]
|
|
|
|
|
c.gpRegMask = gpRegMask386
|
|
|
|
|
c.fpRegMask = fpRegMask386
|
|
|
|
|
c.FPReg = framepointerReg386
|
2016-10-06 15:06:45 -04:00
|
|
|
c.LinkReg = linkReg386
|
2016-07-13 13:43:08 -07:00
|
|
|
c.hasGReg = false
|
2016-03-21 22:57:26 -07:00
|
|
|
case "arm":
|
|
|
|
|
c.IntSize = 4
|
|
|
|
|
c.PtrSize = 4
|
2016-09-28 10:20:24 -04:00
|
|
|
c.RegSize = 4
|
2016-03-21 22:57:26 -07:00
|
|
|
c.lowerBlock = rewriteBlockARM
|
|
|
|
|
c.lowerValue = rewriteValueARM
|
|
|
|
|
c.registers = registersARM[:]
|
2016-05-19 12:33:30 -04:00
|
|
|
c.gpRegMask = gpRegMaskARM
|
|
|
|
|
c.fpRegMask = fpRegMaskARM
|
|
|
|
|
c.FPReg = framepointerRegARM
|
2016-10-06 15:06:45 -04:00
|
|
|
c.LinkReg = linkRegARM
|
2016-05-31 14:01:34 -04:00
|
|
|
c.hasGReg = true
|
2016-07-21 12:42:49 -04:00
|
|
|
case "arm64":
|
|
|
|
|
c.IntSize = 8
|
|
|
|
|
c.PtrSize = 8
|
2016-09-28 10:20:24 -04:00
|
|
|
c.RegSize = 8
|
2016-07-21 12:42:49 -04:00
|
|
|
c.lowerBlock = rewriteBlockARM64
|
|
|
|
|
c.lowerValue = rewriteValueARM64
|
|
|
|
|
c.registers = registersARM64[:]
|
|
|
|
|
c.gpRegMask = gpRegMaskARM64
|
|
|
|
|
c.fpRegMask = fpRegMaskARM64
|
|
|
|
|
c.FPReg = framepointerRegARM64
|
2016-10-06 15:06:45 -04:00
|
|
|
c.LinkReg = linkRegARM64
|
2016-07-21 12:42:49 -04:00
|
|
|
c.hasGReg = true
|
2016-09-09 08:13:16 -04:00
|
|
|
c.noDuffDevice = obj.GOOS == "darwin" // darwin linker cannot handle BR26 reloc with non-zero addend
|
2016-09-16 15:02:47 -07:00
|
|
|
case "ppc64":
|
|
|
|
|
c.OldArch = true
|
2016-10-18 23:50:42 +02:00
|
|
|
c.BigEndian = true
|
2016-09-16 15:02:47 -07:00
|
|
|
fallthrough
|
|
|
|
|
case "ppc64le":
|
2016-06-24 14:37:17 -05:00
|
|
|
c.IntSize = 8
|
|
|
|
|
c.PtrSize = 8
|
2016-09-28 10:20:24 -04:00
|
|
|
c.RegSize = 8
|
2016-06-24 14:37:17 -05:00
|
|
|
c.lowerBlock = rewriteBlockPPC64
|
|
|
|
|
c.lowerValue = rewriteValuePPC64
|
|
|
|
|
c.registers = registersPPC64[:]
|
|
|
|
|
c.gpRegMask = gpRegMaskPPC64
|
|
|
|
|
c.fpRegMask = fpRegMaskPPC64
|
|
|
|
|
c.FPReg = framepointerRegPPC64
|
2016-10-06 15:06:45 -04:00
|
|
|
c.LinkReg = linkRegPPC64
|
2016-07-27 13:54:07 -07:00
|
|
|
c.noDuffDevice = true // TODO: Resolve PPC64 DuffDevice (has zero, but not copy)
|
2016-07-26 09:24:18 -07:00
|
|
|
c.hasGReg = true
|
2016-10-18 23:50:42 +02:00
|
|
|
case "mips64":
|
|
|
|
|
c.BigEndian = true
|
|
|
|
|
fallthrough
|
|
|
|
|
case "mips64le":
|
2016-08-19 16:35:36 -04:00
|
|
|
c.IntSize = 8
|
|
|
|
|
c.PtrSize = 8
|
2016-09-28 10:20:24 -04:00
|
|
|
c.RegSize = 8
|
2016-08-19 16:35:36 -04:00
|
|
|
c.lowerBlock = rewriteBlockMIPS64
|
|
|
|
|
c.lowerValue = rewriteValueMIPS64
|
|
|
|
|
c.registers = registersMIPS64[:]
|
|
|
|
|
c.gpRegMask = gpRegMaskMIPS64
|
|
|
|
|
c.fpRegMask = fpRegMaskMIPS64
|
2016-08-22 12:25:23 -04:00
|
|
|
c.specialRegMask = specialRegMaskMIPS64
|
2016-08-19 16:35:36 -04:00
|
|
|
c.FPReg = framepointerRegMIPS64
|
2016-10-06 15:06:45 -04:00
|
|
|
c.LinkReg = linkRegMIPS64
|
2016-08-19 16:35:36 -04:00
|
|
|
c.hasGReg = true
|
2016-09-12 14:50:10 -04:00
|
|
|
case "s390x":
|
|
|
|
|
c.IntSize = 8
|
|
|
|
|
c.PtrSize = 8
|
2016-09-28 10:20:24 -04:00
|
|
|
c.RegSize = 8
|
2016-09-12 14:50:10 -04:00
|
|
|
c.lowerBlock = rewriteBlockS390X
|
|
|
|
|
c.lowerValue = rewriteValueS390X
|
|
|
|
|
c.registers = registersS390X[:]
|
|
|
|
|
c.gpRegMask = gpRegMaskS390X
|
|
|
|
|
c.fpRegMask = fpRegMaskS390X
|
|
|
|
|
c.FPReg = framepointerRegS390X
|
2016-10-06 15:06:45 -04:00
|
|
|
c.LinkReg = linkRegS390X
|
2016-09-12 14:50:10 -04:00
|
|
|
c.hasGReg = true
|
|
|
|
|
c.noDuffDevice = true
|
2016-10-18 23:50:42 +02:00
|
|
|
c.BigEndian = true
|
|
|
|
|
case "mips":
|
|
|
|
|
c.BigEndian = true
|
|
|
|
|
fallthrough
|
|
|
|
|
case "mipsle":
|
|
|
|
|
c.IntSize = 4
|
|
|
|
|
c.PtrSize = 4
|
|
|
|
|
c.RegSize = 4
|
|
|
|
|
c.lowerBlock = rewriteBlockMIPS
|
|
|
|
|
c.lowerValue = rewriteValueMIPS
|
|
|
|
|
c.registers = registersMIPS[:]
|
|
|
|
|
c.gpRegMask = gpRegMaskMIPS
|
|
|
|
|
c.fpRegMask = fpRegMaskMIPS
|
|
|
|
|
c.specialRegMask = specialRegMaskMIPS
|
|
|
|
|
c.FPReg = framepointerRegMIPS
|
|
|
|
|
c.LinkReg = linkRegMIPS
|
|
|
|
|
c.hasGReg = true
|
|
|
|
|
c.noDuffDevice = true
|
2015-04-15 15:51:25 -07:00
|
|
|
	default:
		fe.Fatalf(src.NoXPos, "arch %s not implemented", arch)
	}
	c.ctxt = ctxt
	c.optimize = optimize
	c.nacl = obj.GOOS == "nacl"

	// Don't use Duff's device on Plan 9 AMD64, because floating
	// point operations are not allowed in note handler.
	if obj.GOOS == "plan9" && arch == "amd64" {
		c.noDuffDevice = true
	}

	if c.nacl {
		c.noDuffDevice = true // Don't use Duff's device on NaCl

		// runtime calls clobber R12 on nacl
		opcodeTable[OpARMCALLudiv].reg.clobbers |= 1 << 12 // R12
	}

	// cutoff is compared with the product of numblocks and numvalues;
	// if the product is smaller than the cutoff, the old non-sparse method is used.
	// cutoff == 0 implies all sparse.
	// cutoff == -1 implies none sparse.
	// Good cutoff values seem to be O(million) depending on constant factor cost of sparse.
	// TODO: get this from a flag, not an environment variable
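	// For example, with the default cutoff of 2,500,000, a function with
	// 2,000 blocks and 1,500 variables scores 2000*1500 = 3,000,000 and
	// uses the sparse algorithm, while one with 1,000 blocks and 1,500
	// variables scores 1,500,000 and uses the old method. Setting
	// GO_SSA_PHI_LOC_CUTOFF=0 forces the sparse algorithm everywhere;
	// GO_SSA_PHI_LOC_CUTOFF=-1 disables it entirely.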
	c.sparsePhiCutoff = 2500000 // default; determined with crude experiments w/ make.bash (set to 0 for testing)
	ev := os.Getenv("GO_SSA_PHI_LOC_CUTOFF")
	if ev != "" {
		v, err := strconv.ParseInt(ev, 10, 64)
		if err != nil {
			fe.Fatalf(src.NoXPos, "Environment variable GO_SSA_PHI_LOC_CUTOFF (value '%s') did not parse as a number", ev)
		}
		c.sparsePhiCutoff = uint64(v) // convert -1 to maxint, for never use sparse
	}

	return c
}

// Set387 records whether the target is GO386=387; 387 mode also needs an
// FP scratch slot because there is no direct move between the GP and FP
// register sets.
func (c *Config) Set387(b bool) {
	c.NeedsFpScratch = b
	c.use387 = b
}

func (c *Config) Frontend() Frontend      { return c.fe }
func (c *Config) SparsePhiCutoff() uint64 { return c.sparsePhiCutoff }
func (c *Config) Ctxt() *obj.Link         { return c.ctxt }

func (c *Config) Logf(msg string, args ...interface{})                 { c.fe.Logf(msg, args...) }
func (c *Config) Log() bool                                            { return c.fe.Log() }
func (c *Config) Fatalf(pos src.XPos, msg string, args ...interface{}) { c.fe.Fatalf(pos, msg, args...) }
func (c *Config) Error(pos src.XPos, msg string, args ...interface{})  { c.fe.Error(pos, msg, args...) }
func (c *Config) Warnl(pos src.XPos, msg string, args ...interface{})  { c.fe.Warnl(pos, msg, args...) }
func (c *Config) Debug_checknil() bool                                 { return c.fe.Debug_checknil() }
func (c *Config) Debug_wb() bool                                       { return c.fe.Debug_wb() }