2015-04-15 15:51:25 -07:00
|
|
|
// Copyright 2015 The Go Authors. All rights reserved.
|
|
|
|
|
// Use of this source code is governed by a BSD-style
|
|
|
|
|
// license that can be found in the LICENSE file.
|
|
|
|
|
|
|
|
|
|
package ssa
|
|
|
|
|
|
[dev.ssa] cmd/compile: enhance command line option processing for SSA
The -d compiler flag can also specify ssa phase and flag,
for example -d=ssa/generic_cse/time,ssa/generic_cse/stats
Spaces in the phase names can be specified with an
underscore. Flags currently parsed (not necessarily
recognized by the phases yet) are:
on, off, mem, time, debug, stats, and test
On, off and time are handled in the harness,
debug, stats, and test are interpreted by the phase itself.
The pass is now attached to the Func being compiled, and a
new method logStats(key, ...value) on *Func to encourage a
semi-standardized format for that output. Output fields
are separated by tabs to ease digestion by awk and
spreadsheets. For example,
if f.pass.stats > 0 {
f.logStat("CSE REWRITES", rewrites)
}
Change-Id: I16db2b5af64c50ca9a47efeb51d961147a903abc
Reviewed-on: https://go-review.googlesource.com/19885
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-25 13:10:51 -05:00
|
|
|
import (
|
cmd/compile: change ssa.Type into *types.Type
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.
In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.
Though this is a big CL, most of the changes are
mechanical and uninteresting.
Interesting bits:
* Add new singleton globals to package types for the special
SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
and TTUPLE, for SSA tuple types.
ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
to package types.
* We had picked the name "types" in our rules for the handy
list of types provided by ssa.Config. That conflicted with
the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
and probably also some other duplicated Type methods
designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
and they were not particularly careful about types in general.
Of necessity, this CL switches them to use *types.Type;
it does not make them more type-accurate.
Unfortunately, using types.Type means initializing a bit
of the types universe.
This is prime for refactoring and improvement.
This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.
name old alloc/op new alloc/op delta
Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8)
Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10)
GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10)
Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10)
GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9)
Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8)
Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10)
XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10)
[Geo mean] 40.5MB 40.3MB -0.68%
name old allocs/op new allocs/op delta
Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9)
Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10)
GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10)
Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10)
GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9)
Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8)
Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10)
XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10)
[Geo mean] 428k 428k -0.01%
Removing all the interface calls helps non-trivially with CPU, though.
name old time/op new time/op delta
Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96)
Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96)
GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96)
Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99)
GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97)
Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99)
Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94)
XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95)
[Geo mean] 178ms 173ms -2.65%
name old user-time/op new user-time/op delta
Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99)
Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95)
GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99)
Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96)
GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100)
Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92)
Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100)
XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97)
[Geo mean] 220ms 213ms -2.76%
Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
|
|
|
"cmd/compile/internal/types"
|
[dev.ssa] cmd/compile: enhance command line option processing for SSA
The -d compiler flag can also specify ssa phase and flag,
for example -d=ssa/generic_cse/time,ssa/generic_cse/stats
Spaces in the phase names can be specified with an
underscore. Flags currently parsed (not necessarily
recognized by the phases yet) are:
on, off, mem, time, debug, stats, and test
On, off and time are handled in the harness,
debug, stats, and test are interpreted by the phase itself.
The pass is now attached to the Func being compiled, and a
new method logStats(key, ...value) on *Func to encourage a
semi-standardized format for that output. Output fields
are separated by tabs to ease digestion by awk and
spreadsheets. For example,
if f.pass.stats > 0 {
f.logStat("CSE REWRITES", rewrites)
}
Change-Id: I16db2b5af64c50ca9a47efeb51d961147a903abc
Reviewed-on: https://go-review.googlesource.com/19885
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-25 13:10:51 -05:00
|
|
|
"cmd/internal/obj"
|
2017-04-18 12:53:25 -07:00
|
|
|
"cmd/internal/objabi"
|
2016-12-06 17:08:06 -08:00
|
|
|
"cmd/internal/src"
|
[dev.ssa] cmd/compile: enhance command line option processing for SSA
The -d compiler flag can also specify ssa phase and flag,
for example -d=ssa/generic_cse/time,ssa/generic_cse/stats
Spaces in the phase names can be specified with an
underscore. Flags currently parsed (not necessarily
recognized by the phases yet) are:
on, off, mem, time, debug, stats, and test
On, off and time are handled in the harness,
debug, stats, and test are interpreted by the phase itself.
The pass is now attached to the Func being compiled, and a
new method logStats(key, ...value) on *Func to encourage a
semi-standardized format for that output. Output fields
are separated by tabs to ease digestion by awk and
spreadsheets. For example,
if f.pass.stats > 0 {
f.logStat("CSE REWRITES", rewrites)
}
Change-Id: I16db2b5af64c50ca9a47efeb51d961147a903abc
Reviewed-on: https://go-review.googlesource.com/19885
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-25 13:10:51 -05:00
|
|
|
"os"
|
cmd/compile: use sparse algorithm for phis in large program
This adds a sparse method for locating nearest ancestors
in a dominator tree, and checks blocks with more than one
predecessor for differences and inserts phi functions where
there are.
Uses reversed post order to cut number of passes, running
it from first def to last use ("last use" for paramout and
mem is end-of-program; last use for a phi input from a
backedge is the source of the back edge)
Includes a cutover from old algorithm to new to avoid paying
large constant factor for small programs. This keeps normal
builds running at about the same time, while not running
over-long on large machine-generated inputs.
Add "phase" flags for ssa/build -- ssa/build/stats prints
number of blocks, values (before and after linking references
and inserting phis, so expansion can be measured), and their
product; the product governs the cutover, where a good value
seems to be somewhere between 1 and 5 million.
Among the files compiled by make.bash, this is the shape of
the tail of the distribution for #blocks, #vars, and their
product:
#blocks #vars product
max 6171 28180 173,898,780
99.9% 1641 6548 10,401,878
99% 463 1909 873,721
95% 152 639 95,235
90% 84 359 30,021
The old algorithm is indeed usually fastest, for 99%ile
values of usually.
The fix to LookupVarOutgoing
( https://go-review.googlesource.com/#/c/22790/ )
deals with some of the same problems addressed by this CL,
but on at least one bug ( #15537 ) this change is still
a significant help.
With this CL:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 4m35.200s
user 13m16.644s
sys 0m36.712s
and pprof reports 3.4GB allocated in one of the larger profiles
With tip:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 10m36.569s
user 25m52.286s
sys 4m3.696s
and pprof reports 8.3GB allocated in the same larger profile
With this CL, most of the compilation time on the benchmarked
input is spent in register/stack allocation (cumulative 53%)
and in the sparse lookup algorithm itself (cumulative 20%).
Fixes #15537.
Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00
Reviewed-on: https://go-review.googlesource.com/22342
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-21 13:24:58 -04:00
|
|
|
"strconv"
|
[dev.ssa] cmd/compile: enhance command line option processing for SSA
The -d compiler flag can also specify ssa phase and flag,
for example -d=ssa/generic_cse/time,ssa/generic_cse/stats
Spaces in the phase names can be specified with an
underscore. Flags currently parsed (not necessarily
recognized by the phases yet) are:
on, off, mem, time, debug, stats, and test
On, off and time are handled in the harness,
debug, stats, and test are interpreted by the phase itself.
The pass is now attached to the Func being compiled, and a
new method logStats(key, ...value) on *Func to encourage a
semi-standardized format for that output. Output fields
are separated by tabs to ease digestion by awk and
spreadsheets. For example,
if f.pass.stats > 0 {
f.logStat("CSE REWRITES", rewrites)
}
Change-Id: I16db2b5af64c50ca9a47efeb51d961147a903abc
Reviewed-on: https://go-review.googlesource.com/19885
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-25 13:10:51 -05:00
|
|
|
)
|
2015-08-24 02:16:19 -07:00
|
|
|
|
2017-03-14 16:44:48 -07:00
|
|
|
// A Config holds readonly compilation information.
|
|
|
|
|
// It is created once, early during compilation,
|
|
|
|
|
// and shared across all compilations.
|
2015-04-15 15:51:25 -07:00
|
|
|
type Config struct {
|
2017-03-17 16:04:46 -07:00
|
|
|
arch string // "amd64", etc.
|
2017-04-21 18:44:34 -07:00
|
|
|
PtrSize int64 // 4 or 8; copy of cmd/internal/sys.Arch.PtrSize
|
|
|
|
|
RegSize int64 // 4 or 8; copy of cmd/internal/sys.Arch.RegSize
|
2017-03-17 16:04:46 -07:00
|
|
|
Types Types
|
2017-03-17 10:50:20 -07:00
|
|
|
lowerBlock blockRewriter // lowering function
|
|
|
|
|
lowerValue valueRewriter // lowering function
|
|
|
|
|
registers []Register // machine registers
|
|
|
|
|
gpRegMask regMask // general purpose integer register mask
|
|
|
|
|
fpRegMask regMask // floating point register mask
|
|
|
|
|
specialRegMask regMask // special register mask
|
|
|
|
|
FPReg int8 // register number of frame pointer, -1 if not used
|
|
|
|
|
LinkReg int8 // register number of link register if it is a general purpose register, -1 if not used
|
|
|
|
|
hasGReg bool // has hardware g register
|
|
|
|
|
ctxt *obj.Link // Generic arch information
|
|
|
|
|
optimize bool // Do optimization
|
|
|
|
|
noDuffDevice bool // Don't use Duff's device
|
2017-08-25 01:56:50 +02:00
|
|
|
useSSE bool // Use SSE for non-float operations
|
2017-03-17 10:50:20 -07:00
|
|
|
nacl bool // GOOS=nacl
|
|
|
|
|
use387 bool // GO386=387
|
|
|
|
|
NeedsFpScratch bool // No direct move between GP and FP register sets
|
|
|
|
|
BigEndian bool //
|
|
|
|
|
sparsePhiCutoff uint64 // Sparse phi location algorithm used above this #blocks*#variables score
|
2015-04-15 15:51:25 -07:00
|
|
|
}
|
|
|
|
|
|
2017-03-17 10:50:20 -07:00
|
|
|
type (
|
|
|
|
|
blockRewriter func(*Block) bool
|
|
|
|
|
valueRewriter func(*Value) bool
|
|
|
|
|
)
|
|
|
|
|
|
2017-03-17 16:04:46 -07:00
|
|
|
type Types struct {
|
cmd/compile: change ssa.Type into *types.Type
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.
In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.
Though this is a big CL, most of the changes are
mechanical and uninteresting.
Interesting bits:
* Add new singleton globals to package types for the special
SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
and TTUPLE, for SSA tuple types.
ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
to package types.
* We had picked the name "types" in our rules for the handy
list of types provided by ssa.Config. That conflicted with
the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
and probably also some other duplicated Type methods
designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
and they were not particularly careful about types in general.
Of necessity, this CL switches them to use *types.Type;
it does not make them more type-accurate.
Unfortunately, using types.Type means initializing a bit
of the types universe.
This is prime for refactoring and improvement.
This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.
name old alloc/op new alloc/op delta
Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8)
Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10)
GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10)
Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10)
GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9)
Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8)
Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10)
XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10)
[Geo mean] 40.5MB 40.3MB -0.68%
name old allocs/op new allocs/op delta
Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9)
Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10)
GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10)
Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10)
GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9)
Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8)
Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10)
XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10)
[Geo mean] 428k 428k -0.01%
Removing all the interface calls helps non-trivially with CPU, though.
name old time/op new time/op delta
Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96)
Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96)
GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96)
Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99)
GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97)
Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99)
Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94)
XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95)
[Geo mean] 178ms 173ms -2.65%
name old user-time/op new user-time/op delta
Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99)
Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95)
GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99)
Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96)
GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100)
Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92)
Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100)
XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97)
[Geo mean] 220ms 213ms -2.76%
Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
|
|
|
Bool *types.Type
|
|
|
|
|
Int8 *types.Type
|
|
|
|
|
Int16 *types.Type
|
|
|
|
|
Int32 *types.Type
|
|
|
|
|
Int64 *types.Type
|
|
|
|
|
UInt8 *types.Type
|
|
|
|
|
UInt16 *types.Type
|
|
|
|
|
UInt32 *types.Type
|
|
|
|
|
UInt64 *types.Type
|
|
|
|
|
Int *types.Type
|
|
|
|
|
Float32 *types.Type
|
|
|
|
|
Float64 *types.Type
|
|
|
|
|
Uintptr *types.Type
|
|
|
|
|
String *types.Type
|
|
|
|
|
BytePtr *types.Type // TODO: use unsafe.Pointer instead?
|
|
|
|
|
Int32Ptr *types.Type
|
|
|
|
|
UInt32Ptr *types.Type
|
|
|
|
|
IntPtr *types.Type
|
|
|
|
|
UintptrPtr *types.Type
|
|
|
|
|
Float32Ptr *types.Type
|
|
|
|
|
Float64Ptr *types.Type
|
|
|
|
|
BytePtrPtr *types.Type
|
2015-07-30 11:03:05 -07:00
|
|
|
}
|
|
|
|
|
|
2015-08-10 12:15:52 -07:00
|
|
|
type Logger interface {
|
2016-01-29 14:44:15 -05:00
|
|
|
// Logf logs a message from the compiler.
|
2015-06-24 14:03:39 -07:00
|
|
|
Logf(string, ...interface{})
|
2015-06-12 11:01:13 -07:00
|
|
|
|
2016-01-29 14:44:15 -05:00
|
|
|
// Log returns true if logging is not a no-op
|
|
|
|
|
// some logging calls account for more than a few heap allocations.
|
|
|
|
|
Log() bool
|
|
|
|
|
|
2015-06-12 11:01:13 -07:00
|
|
|
// Fatal reports a compiler error and exits.
|
2016-12-15 17:17:01 -08:00
|
|
|
Fatalf(pos src.XPos, msg string, args ...interface{})
|
2015-06-12 11:01:13 -07:00
|
|
|
|
2015-10-26 17:34:06 -04:00
|
|
|
// Warnl writes compiler messages in the form expected by "errorcheck" tests
|
2016-12-15 17:17:01 -08:00
|
|
|
Warnl(pos src.XPos, fmt_ string, args ...interface{})
|
2015-10-26 17:34:06 -04:00
|
|
|
|
2017-01-07 08:23:11 -08:00
|
|
|
// Forwards the Debug flags from gc
|
2015-10-26 17:34:06 -04:00
|
|
|
Debug_checknil() bool
|
2016-10-13 06:57:00 -04:00
|
|
|
Debug_wb() bool
|
2015-05-27 14:52:22 -07:00
|
|
|
}
|
|
|
|
|
|
2015-08-10 12:15:52 -07:00
|
|
|
type Frontend interface {
|
cmd/compile: change ssa.Type into *types.Type
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.
In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.
Though this is a big CL, most of the changes are
mechanical and uninteresting.
Interesting bits:
* Add new singleton globals to package types for the special
SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
and TTUPLE, for SSA tuple types.
ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
to package types.
* We had picked the name "types" in our rules for the handy
list of types provided by ssa.Config. That conflicted with
the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
and probably also some other duplicated Type methods
designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
and they were not particularly careful about types in general.
Of necessity, this CL switches them to use *types.Type;
it does not make them more type-accurate.
Unfortunately, using types.Type means initializing a bit
of the types universe.
This is prime for refactoring and improvement.
This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.
name old alloc/op new alloc/op delta
Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8)
Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10)
GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10)
Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10)
GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9)
Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8)
Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10)
XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10)
[Geo mean] 40.5MB 40.3MB -0.68%
name old allocs/op new allocs/op delta
Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9)
Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10)
GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10)
Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10)
GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9)
Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8)
Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10)
XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10)
[Geo mean] 428k 428k -0.01%
Removing all the interface calls helps non-trivially with CPU, though.
name old time/op new time/op delta
Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96)
Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96)
GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96)
Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99)
GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97)
Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99)
Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94)
XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95)
[Geo mean] 178ms 173ms -2.65%
name old user-time/op new user-time/op delta
Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99)
Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95)
GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99)
Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96)
GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100)
Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92)
Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100)
XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97)
[Geo mean] 220ms 213ms -2.76%
Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
|
|
|
CanSSA(t *types.Type) bool
|
2017-03-17 16:04:46 -07:00
|
|
|
|
2015-08-10 12:15:52 -07:00
|
|
|
Logger
|
|
|
|
|
|
|
|
|
|
// StringData returns a symbol pointing to the given string's contents.
|
|
|
|
|
StringData(string) interface{} // returns *gc.Sym
|
2015-08-24 02:16:19 -07:00
|
|
|
|
|
|
|
|
// Auto returns a Node for an auto variable of the given type.
|
|
|
|
|
// The SSA compiler uses this function to allocate space for spills.
|
cmd/compile: change ssa.Type into *types.Type
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.
In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.
Though this is a big CL, most of the changes are
mechanical and uninteresting.
Interesting bits:
* Add new singleton globals to package types for the special
SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
and TTUPLE, for SSA tuple types.
ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
to package types.
* We had picked the name "types" in our rules for the handy
list of types provided by ssa.Config. That conflicted with
the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
and probably also some other duplicated Type methods
designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
and they were not particularly careful about types in general.
Of necessity, this CL switches them to use *types.Type;
it does not make them more type-accurate.
Unfortunately, using types.Type means initializing a bit
of the types universe.
This is prime for refactoring and improvement.
This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.
name old alloc/op new alloc/op delta
Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8)
Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10)
GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10)
Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10)
GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9)
Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8)
Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10)
XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10)
[Geo mean] 40.5MB 40.3MB -0.68%
name old allocs/op new allocs/op delta
Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9)
Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10)
GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10)
Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10)
GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9)
Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8)
Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10)
XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10)
[Geo mean] 428k 428k -0.01%
Removing all the interface calls helps non-trivially with CPU, though.
name old time/op new time/op delta
Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96)
Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96)
GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96)
Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99)
GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97)
Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99)
Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94)
XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95)
[Geo mean] 178ms 173ms -2.65%
name old user-time/op new user-time/op delta
Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99)
Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95)
GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99)
Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96)
GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100)
Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92)
Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100)
XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97)
[Geo mean] 220ms 213ms -2.76%
Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
|
|
|
Auto(src.XPos, *types.Type) GCNode
|
2016-01-14 16:02:23 -08:00
|
|
|
|
cmd/compile: better job of naming compound types
Compound AUTO types weren't named previously. That was because live
variable analysis (plive.go) doesn't handle spilling to compound types.
It can't handle them because there is no valid place to put VARDEFs when
regalloc is spilling compound types.
compound types = multiword builtin types: complex, string, slice, and
interface.
Instead, we split named AUTOs into individual one-word variables. For
example, a string s gets split into a byte ptr s.ptr and an integer
s.len. Those two variables can be spilled to / restored from
independently. As a result, live variable analysis can handle them
because they are one-word objects.
This CL will change how AUTOs are described in DWARF information.
Consider the code:
func f(s string, i int) int {
x := s[i:i+5]
g()
return lookup(x)
}
The old compiler would spill x to two consecutive slots on the stack,
both named x (at offsets 0 and 8). The new compiler spills the pointer
of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a
constant (5) and can be rematerialized for the call to lookup.
So compound objects may not be spilled in their entirety, and even if
they are they won't necessarily be contiguous. Such is the price of
optimization.
Re-enable live variable analysis tests. One test remains disabled, it
fails because of #14904.
Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801
Reviewed-on: https://go-review.googlesource.com/21233
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
|
|
|
// Given the name for a compound type, returns the name we should use
|
|
|
|
|
// for the parts of that compound type.
|
|
|
|
|
SplitString(LocalSlot) (LocalSlot, LocalSlot)
|
|
|
|
|
SplitInterface(LocalSlot) (LocalSlot, LocalSlot)
|
|
|
|
|
SplitSlice(LocalSlot) (LocalSlot, LocalSlot, LocalSlot)
|
|
|
|
|
SplitComplex(LocalSlot) (LocalSlot, LocalSlot)
|
2016-03-31 21:24:10 -07:00
|
|
|
SplitStruct(LocalSlot, int) LocalSlot
|
2016-10-30 21:10:03 -07:00
|
|
|
SplitArray(LocalSlot) LocalSlot // array must be length 1
|
[dev.ssa] cmd/compile: decompose 64-bit integer on ARM
Introduce dec64 rules to (generically) decompose 64-bit integer on
32-bit architectures. 64-bit integer is composed/decomposed with
Int64Make/Hi/Lo ops, as for complex types.
The idea of dealing with Add64 is the following:
(Add64 (Int64Make xh xl) (Int64Make yh yl))
->
(Int64Make
(Add32withcarry xh yh (Select0 (Add32carry xl yl)))
(Select1 (Add32carry xl yl)))
where Add32carry returns a tuple (flags,uint32). Select0 and Select1
read the first and the second component of the tuple, respectively.
The two Add32carry will be CSE'd.
Similarly for multiplication, Mul32uhilo returns a tuple (hi, lo).
Also add support of KeepAlive, to fix build after merge.
Tests addressed_ssa.go, array_ssa.go, break_ssa.go, chan_ssa.go,
cmp_ssa.go, ctl_ssa.go, map_ssa.go, and string_ssa.go in
cmd/compile/internal/gc/testdata passed.
Progress on SSA for ARM. Still not complete.
Updates #15365.
Change-Id: I7867c76785a456312de5d8398a6b3f7ca5a4f7ec
Reviewed-on: https://go-review.googlesource.com/23213
Reviewed-by: Keith Randall <khr@golang.org>
2016-05-18 18:14:36 -04:00
|
|
|
SplitInt64(LocalSlot) (LocalSlot, LocalSlot) // returns (hi, lo)
|
cmd/compile: better job of naming compound types
Compound AUTO types weren't named previously. That was because live
variable analysis (plive.go) doesn't handle spilling to compound types.
It can't handle them because there is no valid place to put VARDEFs when
regalloc is spilling compound types.
compound types = multiword builtin types: complex, string, slice, and
interface.
Instead, we split named AUTOs into individual one-word variables. For
example, a string s gets split into a byte ptr s.ptr and an integer
s.len. Those two variables can be spilled to / restored from
independently. As a result, live variable analysis can handle them
because they are one-word objects.
This CL will change how AUTOs are described in DWARF information.
Consider the code:
func f(s string, i int) int {
x := s[i:i+5]
g()
return lookup(x)
}
The old compiler would spill x to two consecutive slots on the stack,
both named x (at offsets 0 and 8). The new compiler spills the pointer
of x to a slot named x.ptr. It doesn't spill x.len at all, as it is a
constant (5) and can be rematerialized for the call to lookup.
So compound objects may not be spilled in their entirety, and even if
they are they won't necessarily be contiguous. Such is the price of
optimization.
Re-enable live variable analysis tests. One test remains disabled, it
fails because of #14904.
Change-Id: I8ef2b5ab91e43a0d2136bfc231c05d100ec0b801
Reviewed-on: https://go-review.googlesource.com/21233
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-03-28 11:25:17 -07:00
|
|
|
|
cmd/compile: de-virtualize interface calls
With this change, code like
h := sha1.New()
h.Write(buf)
sum := h.Sum()
gets compiled into static calls rather than
interface calls, because the compiler is able
to prove that 'h' is really a *sha1.digest.
The InterCall re-write rule hits a few dozen times
during make.bash, and hundreds of times during all.bash.
The most common pattern identified by the compiler
is a constructor like
func New() Interface { return &impl{...} }
where the constructor gets inlined into the caller,
and the result is used immediately. Examples include
{sha1,md5,crc32,crc64,...}.New, base64.NewEncoder,
base64.NewDecoder, errors.New, net.Pipe, and so on.
Some existing benchmarks that change on darwin/amd64:
Crc64/ISO4KB-8 2.67µs ± 1% 2.66µs ± 0% -0.36% (p=0.015 n=10+10)
Crc64/ISO1KB-8 694ns ± 0% 690ns ± 1% -0.59% (p=0.001 n=10+10)
Adler32KB-8 473ns ± 1% 471ns ± 0% -0.39% (p=0.010 n=10+9)
On architectures like amd64, the reduction in code size
appears to contribute more to benchmark improvements than just
removing the indirect call, since that branch gets predicted
accurately when called in a loop.
Updates #19361
Change-Id: I57d4dc21ef40a05ec0fbd55a9bb0eb74cdc67a3d
Reviewed-on: https://go-review.googlesource.com/38139
Run-TryBot: Philip Hofer <phofer@umich.edu>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-03-13 15:03:17 -07:00
|
|
|
// DerefItab dereferences an itab function
|
|
|
|
|
// entry, given the symbol of the itab and
|
|
|
|
|
// the byte offset of the function pointer.
|
|
|
|
|
// It may return nil.
|
|
|
|
|
DerefItab(sym *obj.LSym, offset int64) *obj.LSym
|
|
|
|
|
|
2016-12-08 13:49:51 -08:00
|
|
|
// Line returns a string describing the given position.
|
2016-12-15 17:17:01 -08:00
|
|
|
Line(src.XPos) string
|
2016-10-03 12:26:25 -07:00
|
|
|
|
|
|
|
|
// AllocFrame assigns frame offsets to all live auto variables.
|
|
|
|
|
AllocFrame(f *Func)
|
2016-10-13 06:57:00 -04:00
|
|
|
|
|
|
|
|
// Syslook returns a symbol of the runtime function/variable with the
|
|
|
|
|
// given name.
|
2017-02-06 13:30:40 -08:00
|
|
|
Syslook(string) *obj.LSym
|
2017-02-05 23:43:31 -05:00
|
|
|
|
|
|
|
|
// UseWriteBarrier returns whether write barrier is enabled
|
|
|
|
|
UseWriteBarrier() bool
|
2015-10-22 14:22:38 -07:00
|
|
|
}
|
|
|
|
|
|
2016-03-01 23:21:55 +00:00
|
|
|
// interface used to hold *gc.Node. We'd use *gc.Node directly but
|
2015-10-22 14:22:38 -07:00
|
|
|
// that would lead to an import cycle.
|
|
|
|
|
type GCNode interface {
|
cmd/compile: change ssa.Type into *types.Type
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.
In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.
Though this is a big CL, most of the changes are
mechanical and uninteresting.
Interesting bits:
* Add new singleton globals to package types for the special
SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
and TTUPLE, for SSA tuple types.
ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
to package types.
* We had picked the name "types" in our rules for the handy
list of types provided by ssa.Config. That conflicted with
the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
and probably also some other duplicated Type methods
designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
and they were not particularly careful about types in general.
Of necessity, this CL switches them to use *types.Type;
it does not make them more type-accurate.
Unfortunately, using types.Type means initializing a bit
of the types universe.
This is prime for refactoring and improvement.
This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.
name old alloc/op new alloc/op delta
Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8)
Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10)
GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10)
Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10)
GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9)
Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8)
Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10)
XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10)
[Geo mean] 40.5MB 40.3MB -0.68%
name old allocs/op new allocs/op delta
Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9)
Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10)
GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10)
Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10)
GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9)
Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8)
Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10)
XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10)
[Geo mean] 428k 428k -0.01%
Removing all the interface calls helps non-trivially with CPU, though.
name old time/op new time/op delta
Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96)
Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96)
GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96)
Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99)
GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97)
Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99)
Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94)
XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95)
[Geo mean] 178ms 173ms -2.65%
name old user-time/op new user-time/op delta
Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99)
Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95)
GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99)
Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96)
GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100)
Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92)
Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100)
XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97)
[Geo mean] 220ms 213ms -2.76%
Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
|
|
|
Typ() *types.Type
|
2015-10-22 14:22:38 -07:00
|
|
|
String() string
|
2015-08-10 12:15:52 -07:00
|
|
|
}
|
|
|
|
|
|
2015-04-15 15:51:25 -07:00
|
|
|
// NewConfig returns a new configuration object for the given architecture.
|
2017-03-17 16:04:46 -07:00
|
|
|
func NewConfig(arch string, types Types, ctxt *obj.Link, optimize bool) *Config {
|
|
|
|
|
c := &Config{arch: arch, Types: types}
|
2015-04-15 15:51:25 -07:00
|
|
|
switch arch {
|
|
|
|
|
case "amd64":
|
2015-07-19 15:48:20 -07:00
|
|
|
c.PtrSize = 8
|
2016-09-28 10:20:24 -04:00
|
|
|
c.RegSize = 8
|
2015-06-06 16:03:33 -07:00
|
|
|
c.lowerBlock = rewriteBlockAMD64
|
|
|
|
|
c.lowerValue = rewriteValueAMD64
|
2016-03-21 22:57:26 -07:00
|
|
|
c.registers = registersAMD64[:]
|
2016-05-19 12:33:30 -04:00
|
|
|
c.gpRegMask = gpRegMaskAMD64
|
|
|
|
|
c.fpRegMask = fpRegMaskAMD64
|
|
|
|
|
c.FPReg = framepointerRegAMD64
|
2016-10-06 15:06:45 -04:00
|
|
|
c.LinkReg = linkRegAMD64
|
2016-05-31 14:01:34 -04:00
|
|
|
c.hasGReg = false
|
2016-08-08 11:26:25 -07:00
|
|
|
case "amd64p32":
|
|
|
|
|
c.PtrSize = 4
|
2016-09-28 10:20:24 -04:00
|
|
|
c.RegSize = 8
|
2016-08-08 11:26:25 -07:00
|
|
|
c.lowerBlock = rewriteBlockAMD64
|
|
|
|
|
c.lowerValue = rewriteValueAMD64
|
|
|
|
|
c.registers = registersAMD64[:]
|
|
|
|
|
c.gpRegMask = gpRegMaskAMD64
|
|
|
|
|
c.fpRegMask = fpRegMaskAMD64
|
|
|
|
|
c.FPReg = framepointerRegAMD64
|
2016-10-06 15:06:45 -04:00
|
|
|
c.LinkReg = linkRegAMD64
|
2016-08-08 11:26:25 -07:00
|
|
|
c.hasGReg = false
|
|
|
|
|
c.noDuffDevice = true
|
2015-04-15 15:51:25 -07:00
|
|
|
case "386":
|
2015-07-19 15:48:20 -07:00
|
|
|
c.PtrSize = 4
|
2016-09-28 10:20:24 -04:00
|
|
|
c.RegSize = 4
|
2016-07-13 13:43:08 -07:00
|
|
|
c.lowerBlock = rewriteBlock386
|
|
|
|
|
c.lowerValue = rewriteValue386
|
|
|
|
|
c.registers = registers386[:]
|
|
|
|
|
c.gpRegMask = gpRegMask386
|
|
|
|
|
c.fpRegMask = fpRegMask386
|
|
|
|
|
c.FPReg = framepointerReg386
|
2016-10-06 15:06:45 -04:00
|
|
|
c.LinkReg = linkReg386
|
2016-07-13 13:43:08 -07:00
|
|
|
c.hasGReg = false
|
2016-03-21 22:57:26 -07:00
|
|
|
case "arm":
|
|
|
|
|
c.PtrSize = 4
|
2016-09-28 10:20:24 -04:00
|
|
|
c.RegSize = 4
|
2016-03-21 22:57:26 -07:00
|
|
|
c.lowerBlock = rewriteBlockARM
|
|
|
|
|
c.lowerValue = rewriteValueARM
|
|
|
|
|
c.registers = registersARM[:]
|
2016-05-19 12:33:30 -04:00
|
|
|
c.gpRegMask = gpRegMaskARM
|
|
|
|
|
c.fpRegMask = fpRegMaskARM
|
|
|
|
|
c.FPReg = framepointerRegARM
|
2016-10-06 15:06:45 -04:00
|
|
|
c.LinkReg = linkRegARM
|
2016-05-31 14:01:34 -04:00
|
|
|
c.hasGReg = true
|
2016-07-21 12:42:49 -04:00
|
|
|
case "arm64":
|
|
|
|
|
c.PtrSize = 8
|
2016-09-28 10:20:24 -04:00
|
|
|
c.RegSize = 8
|
2016-07-21 12:42:49 -04:00
|
|
|
c.lowerBlock = rewriteBlockARM64
|
|
|
|
|
c.lowerValue = rewriteValueARM64
|
|
|
|
|
c.registers = registersARM64[:]
|
|
|
|
|
c.gpRegMask = gpRegMaskARM64
|
|
|
|
|
c.fpRegMask = fpRegMaskARM64
|
|
|
|
|
c.FPReg = framepointerRegARM64
|
2016-10-06 15:06:45 -04:00
|
|
|
c.LinkReg = linkRegARM64
|
2016-07-21 12:42:49 -04:00
|
|
|
c.hasGReg = true
|
2017-04-18 12:53:25 -07:00
|
|
|
c.noDuffDevice = objabi.GOOS == "darwin" // darwin linker cannot handle BR26 reloc with non-zero addend
|
2016-09-16 15:02:47 -07:00
|
|
|
case "ppc64":
|
2016-10-18 23:50:42 +02:00
|
|
|
c.BigEndian = true
|
2016-09-16 15:02:47 -07:00
|
|
|
fallthrough
|
|
|
|
|
case "ppc64le":
|
2016-06-24 14:37:17 -05:00
|
|
|
c.PtrSize = 8
|
2016-09-28 10:20:24 -04:00
|
|
|
c.RegSize = 8
|
2016-06-24 14:37:17 -05:00
|
|
|
c.lowerBlock = rewriteBlockPPC64
|
|
|
|
|
c.lowerValue = rewriteValuePPC64
|
|
|
|
|
c.registers = registersPPC64[:]
|
|
|
|
|
c.gpRegMask = gpRegMaskPPC64
|
|
|
|
|
c.fpRegMask = fpRegMaskPPC64
|
|
|
|
|
c.FPReg = framepointerRegPPC64
|
2016-10-06 15:06:45 -04:00
|
|
|
c.LinkReg = linkRegPPC64
|
2016-07-27 13:54:07 -07:00
|
|
|
c.noDuffDevice = true // TODO: Resolve PPC64 DuffDevice (has zero, but not copy)
|
2016-07-26 09:24:18 -07:00
|
|
|
c.hasGReg = true
|
2016-10-18 23:50:42 +02:00
|
|
|
case "mips64":
|
|
|
|
|
c.BigEndian = true
|
|
|
|
|
fallthrough
|
|
|
|
|
case "mips64le":
|
2016-08-19 16:35:36 -04:00
|
|
|
c.PtrSize = 8
|
2016-09-28 10:20:24 -04:00
|
|
|
c.RegSize = 8
|
2016-08-19 16:35:36 -04:00
|
|
|
c.lowerBlock = rewriteBlockMIPS64
|
|
|
|
|
c.lowerValue = rewriteValueMIPS64
|
|
|
|
|
c.registers = registersMIPS64[:]
|
|
|
|
|
c.gpRegMask = gpRegMaskMIPS64
|
|
|
|
|
c.fpRegMask = fpRegMaskMIPS64
|
2016-08-22 12:25:23 -04:00
|
|
|
c.specialRegMask = specialRegMaskMIPS64
|
2016-08-19 16:35:36 -04:00
|
|
|
c.FPReg = framepointerRegMIPS64
|
2016-10-06 15:06:45 -04:00
|
|
|
c.LinkReg = linkRegMIPS64
|
2016-08-19 16:35:36 -04:00
|
|
|
c.hasGReg = true
|
2016-09-12 14:50:10 -04:00
|
|
|
case "s390x":
|
|
|
|
|
c.PtrSize = 8
|
2016-09-28 10:20:24 -04:00
|
|
|
c.RegSize = 8
|
2016-09-12 14:50:10 -04:00
|
|
|
c.lowerBlock = rewriteBlockS390X
|
|
|
|
|
c.lowerValue = rewriteValueS390X
|
|
|
|
|
c.registers = registersS390X[:]
|
|
|
|
|
c.gpRegMask = gpRegMaskS390X
|
|
|
|
|
c.fpRegMask = fpRegMaskS390X
|
|
|
|
|
c.FPReg = framepointerRegS390X
|
2016-10-06 15:06:45 -04:00
|
|
|
c.LinkReg = linkRegS390X
|
2016-09-12 14:50:10 -04:00
|
|
|
c.hasGReg = true
|
|
|
|
|
c.noDuffDevice = true
|
2016-10-18 23:50:42 +02:00
|
|
|
c.BigEndian = true
|
|
|
|
|
case "mips":
|
|
|
|
|
c.BigEndian = true
|
|
|
|
|
fallthrough
|
|
|
|
|
case "mipsle":
|
|
|
|
|
c.PtrSize = 4
|
|
|
|
|
c.RegSize = 4
|
|
|
|
|
c.lowerBlock = rewriteBlockMIPS
|
|
|
|
|
c.lowerValue = rewriteValueMIPS
|
|
|
|
|
c.registers = registersMIPS[:]
|
|
|
|
|
c.gpRegMask = gpRegMaskMIPS
|
|
|
|
|
c.fpRegMask = fpRegMaskMIPS
|
|
|
|
|
c.specialRegMask = specialRegMaskMIPS
|
|
|
|
|
c.FPReg = framepointerRegMIPS
|
|
|
|
|
c.LinkReg = linkRegMIPS
|
|
|
|
|
c.hasGReg = true
|
|
|
|
|
c.noDuffDevice = true
|
2015-04-15 15:51:25 -07:00
|
|
|
default:
|
2017-03-16 22:42:10 -07:00
|
|
|
ctxt.Diag("arch %s not implemented", arch)
|
2015-04-15 15:51:25 -07:00
|
|
|
}
|
2015-10-22 13:07:38 -07:00
|
|
|
c.ctxt = ctxt
|
2016-01-27 16:47:23 -08:00
|
|
|
c.optimize = optimize
|
2017-04-18 12:53:25 -07:00
|
|
|
c.nacl = objabi.GOOS == "nacl"
|
2017-08-25 01:56:50 +02:00
|
|
|
c.useSSE = true
|
2015-04-15 15:51:25 -07:00
|
|
|
|
2017-08-25 01:56:50 +02:00
|
|
|
// Don't use Duff's device nor SSE on Plan 9 AMD64, because
|
|
|
|
|
// floating point operations are not allowed in note handler.
|
2017-04-18 12:53:25 -07:00
|
|
|
if objabi.GOOS == "plan9" && arch == "amd64" {
|
2016-03-03 19:45:24 +01:00
|
|
|
c.noDuffDevice = true
|
2017-08-25 01:56:50 +02:00
|
|
|
c.useSSE = false
|
2016-03-03 19:45:24 +01:00
|
|
|
}
|
|
|
|
|
|
2016-07-07 10:49:43 -04:00
|
|
|
if c.nacl {
|
|
|
|
|
c.noDuffDevice = true // Don't use Duff's device on NaCl
|
|
|
|
|
|
2016-09-19 07:45:08 -04:00
|
|
|
// runtime call clobber R12 on nacl
|
2017-03-10 22:09:43 -08:00
|
|
|
opcodeTable[OpARMCALLudiv].reg.clobbers |= 1 << 12 // R12
|
2016-07-07 10:49:43 -04:00
|
|
|
}
|
|
|
|
|
|
cmd/compile: use sparse algorithm for phis in large program
This adds a sparse method for locating nearest ancestors
in a dominator tree, and checks blocks with more than one
predecessor for differences and inserts phi functions where
there are.
Uses reversed post order to cut number of passes, running
it from first def to last use ("last use" for paramout and
mem is end-of-program; last use for a phi input from a
backedge is the source of the back edge)
Includes a cutover from old algorithm to new to avoid paying
large constant factor for small programs. This keeps normal
builds running at about the same time, while not running
over-long on large machine-generated inputs.
Add "phase" flags for ssa/build -- ssa/build/stats prints
number of blocks, values (before and after linking references
and inserting phis, so expansion can be measured), and their
product; the product governs the cutover, where a good value
seems to be somewhere between 1 and 5 million.
Among the files compiled by make.bash, this is the shape of
the tail of the distribution for #blocks, #vars, and their
product:
#blocks #vars product
max 6171 28180 173,898,780
99.9% 1641 6548 10,401,878
99% 463 1909 873,721
95% 152 639 95,235
90% 84 359 30,021
The old algorithm is indeed usually fastest, for 99%ile
values of usually.
The fix to LookupVarOutgoing
( https://go-review.googlesource.com/#/c/22790/ )
deals with some of the same problems addressed by this CL,
but on at least one bug ( #15537 ) this change is still
a significant help.
With this CL:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 4m35.200s
user 13m16.644s
sys 0m36.712s
and pprof reports 3.4GB allocated in one of the larger profiles
With tip:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 10m36.569s
user 25m52.286s
sys 4m3.696s
and pprof reports 8.3GB allocated in the same larger profile
With this CL, most of the compilation time on the benchmarked
input is spent in register/stack allocation (cumulative 53%)
and in the sparse lookup algorithm itself (cumulative 20%).
Fixes #15537.
Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00
Reviewed-on: https://go-review.googlesource.com/22342
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-21 13:24:58 -04:00
|
|
|
// cutoff is compared with product of numblocks and numvalues,
|
|
|
|
|
// if product is smaller than cutoff, use old non-sparse method.
|
|
|
|
|
// cutoff == 0 implies all sparse.
|
|
|
|
|
// cutoff == -1 implies none sparse.
|
|
|
|
|
// Good cutoff values seem to be O(million) depending on constant factor cost of sparse.
|
|
|
|
|
// TODO: get this from a flag, not an environment variable
|
|
|
|
|
c.sparsePhiCutoff = 2500000 // 0 for testing. // 2500000 determined with crude experiments w/ make.bash
|
|
|
|
|
ev := os.Getenv("GO_SSA_PHI_LOC_CUTOFF")
|
|
|
|
|
if ev != "" {
|
|
|
|
|
v, err := strconv.ParseInt(ev, 10, 64)
|
|
|
|
|
if err != nil {
|
2017-03-16 22:42:10 -07:00
|
|
|
ctxt.Diag("Environment variable GO_SSA_PHI_LOC_CUTOFF (value '%s') did not parse as a number", ev)
|
cmd/compile: use sparse algorithm for phis in large program
This adds a sparse method for locating nearest ancestors
in a dominator tree, and checks blocks with more than one
predecessor for differences and inserts phi functions where
there are.
Uses reversed post order to cut number of passes, running
it from first def to last use ("last use" for paramout and
mem is end-of-program; last use for a phi input from a
backedge is the source of the back edge)
Includes a cutover from old algorithm to new to avoid paying
large constant factor for small programs. This keeps normal
builds running at about the same time, while not running
over-long on large machine-generated inputs.
Add "phase" flags for ssa/build -- ssa/build/stats prints
number of blocks, values (before and after linking references
and inserting phis, so expansion can be measured), and their
product; the product governs the cutover, where a good value
seems to be somewhere between 1 and 5 million.
Among the files compiled by make.bash, this is the shape of
the tail of the distribution for #blocks, #vars, and their
product:
#blocks #vars product
max 6171 28180 173,898,780
99.9% 1641 6548 10,401,878
99% 463 1909 873,721
95% 152 639 95,235
90% 84 359 30,021
The old algorithm is indeed usually fastest, for 99%ile
values of usually.
The fix to LookupVarOutgoing
( https://go-review.googlesource.com/#/c/22790/ )
deals with some of the same problems addressed by this CL,
but on at least one bug ( #15537 ) this change is still
a significant help.
With this CL:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 4m35.200s
user 13m16.644s
sys 0m36.712s
and pprof reports 3.4GB allocated in one of the larger profiles
With tip:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 10m36.569s
user 25m52.286s
sys 4m3.696s
and pprof reports 8.3GB allocated in the same larger profile
With this CL, most of the compilation time on the benchmarked
input is spent in register/stack allocation (cumulative 53%)
and in the sparse lookup algorithm itself (cumulative 20%).
Fixes #15537.
Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00
Reviewed-on: https://go-review.googlesource.com/22342
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-21 13:24:58 -04:00
|
|
|
}
|
|
|
|
|
c.sparsePhiCutoff = uint64(v) // convert -1 to maxint, for never use sparse
|
|
|
|
|
}
|
|
|
|
|
|
2015-04-15 15:51:25 -07:00
|
|
|
return c
|
|
|
|
|
}
|
|
|
|
|
|
2016-07-26 11:51:33 -07:00
|
|
|
func (c *Config) Set387(b bool) {
|
2016-08-10 11:44:57 -07:00
|
|
|
c.NeedsFpScratch = b
|
2016-07-26 11:51:33 -07:00
|
|
|
c.use387 = b
|
|
|
|
|
}
|
|
|
|
|
|
cmd/compile: use sparse algorithm for phis in large program
This adds a sparse method for locating nearest ancestors
in a dominator tree, and checks blocks with more than one
predecessor for differences and inserts phi functions where
there are.
Uses reversed post order to cut number of passes, running
it from first def to last use ("last use" for paramout and
mem is end-of-program; last use for a phi input from a
backedge is the source of the back edge)
Includes a cutover from old algorithm to new to avoid paying
large constant factor for small programs. This keeps normal
builds running at about the same time, while not running
over-long on large machine-generated inputs.
Add "phase" flags for ssa/build -- ssa/build/stats prints
number of blocks, values (before and after linking references
and inserting phis, so expansion can be measured), and their
product; the product governs the cutover, where a good value
seems to be somewhere between 1 and 5 million.
Among the files compiled by make.bash, this is the shape of
the tail of the distribution for #blocks, #vars, and their
product:
#blocks #vars product
max 6171 28180 173,898,780
99.9% 1641 6548 10,401,878
99% 463 1909 873,721
95% 152 639 95,235
90% 84 359 30,021
The old algorithm is indeed usually fastest, for 99%ile
values of usually.
The fix to LookupVarOutgoing
( https://go-review.googlesource.com/#/c/22790/ )
deals with some of the same problems addressed by this CL,
but on at least one bug ( #15537 ) this change is still
a significant help.
With this CL:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 4m35.200s
user 13m16.644s
sys 0m36.712s
and pprof reports 3.4GB allocated in one of the larger profiles
With tip:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
github.com/gogo/protobuf/test/theproto3/combos/...
...
real 10m36.569s
user 25m52.286s
sys 4m3.696s
and pprof reports 8.3GB allocated in the same larger profile
With this CL, most of the compilation time on the benchmarked
input is spent in register/stack allocation (cumulative 53%)
and in the sparse lookup algorithm itself (cumulative 20%).
Fixes #15537.
Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00
Reviewed-on: https://go-review.googlesource.com/22342
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-21 13:24:58 -04:00
|
|
|
func (c *Config) SparsePhiCutoff() uint64 { return c.sparsePhiCutoff }
|
2016-08-15 13:51:00 -07:00
|
|
|
func (c *Config) Ctxt() *obj.Link { return c.ctxt }
|