// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package ssa

import (
	"cmd/compile/internal/types"
	"cmd/internal/obj"
	"fmt"
	"io"
	"math"
	"os"
	"path/filepath"
)

func applyRewrite(f *Func, rb blockRewriter, rv valueRewriter) {
	// repeat rewrites until we find no more rewrites
	for {
		change := false
		for _, b := range f.Blocks {
			if b.Control != nil && b.Control.Op == OpCopy {
				for b.Control.Op == OpCopy {
					b.SetControl(b.Control.Args[0])
				}
			}
			if rb(b) {
				change = true
			}
			for _, v := range b.Values {
				change = phielimValue(v) || change

				// Eliminate copy inputs.
				// If any copy input becomes unused, mark it
				// as invalid and discard its argument. Repeat
				// recursively on the discarded argument.
				// This phase helps remove phantom "dead copy" uses
				// of a value so that a x.Uses==1 rule condition
				// fires reliably.
				for i, a := range v.Args {
					if a.Op != OpCopy {
						continue
					}
					v.SetArg(i, copySource(a))
					change = true
					for a.Uses == 0 {
						b := a.Args[0]
						a.reset(OpInvalid)
						a = b
					}
				}

				// apply rewrite function
				if rv(v) {
					change = true
				}
			}
		}
		if !change {
			break
		}
	}

	// remove clobbered values
	for _, b := range f.Blocks {
		j := 0
		for i, v := range b.Values {
			if v.Op == OpInvalid {
				f.freeValue(v)
				continue
			}
			if i != j {
				b.Values[j] = v
			}
			j++
		}
		if j != len(b.Values) {
			// Zero the freed tail so those Values can be garbage collected.
			tail := b.Values[j:]
			for j := range tail {
				tail[j] = nil
			}
			b.Values = b.Values[:j]
		}
	}
}
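
// The pass above is a standard fixed-point driver: apply local rewrites
// until a full sweep over the function makes no change, then compact the
// surviving values in place. As an illustration only (hypothetical names,
// not part of this package), the same idiom on a plain slice looks like:
//
//	changed := true
//	for changed {
//		changed = false
//		for i, x := range xs {
//			if y, ok := simplify(x); ok { // hypothetical local rewrite
//				xs[i] = y
//				changed = true
//			}
//		}
//	}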

// Common functions called from rewriting rules

func is64BitFloat(t *types.Type) bool {
	return t.Size() == 8 && t.IsFloat()
}

func is32BitFloat(t *types.Type) bool {
	return t.Size() == 4 && t.IsFloat()
}

func is64BitInt(t *types.Type) bool {
	return t.Size() == 8 && t.IsInteger()
}

func is32BitInt(t *types.Type) bool {
	return t.Size() == 4 && t.IsInteger()
}

func is16BitInt(t *types.Type) bool {
	return t.Size() == 2 && t.IsInteger()
}
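
// As an illustration only (hypothetical rule, not generated code): the
// machine-generated rewrite functions use these predicates as match guards,
// along the lines of:
//
//	// match: (SomeOp x) where the result is a 32-bit integer
//	if !is32BitInt(v.Type) {
//		break
//	}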

// is8BitInt reports whether t is an 8-bit integer type.
func is8BitInt(t *types.Type) bool {
	return t.Size() == 1 && t.IsInteger()
}

// isPtr reports whether t is a pointer-shaped type.
func isPtr(t *types.Type) bool {
	return t.IsPtrShaped()
}

// isSigned reports whether t is a signed type.
func isSigned(t *types.Type) bool {
	return t.IsSigned()
}

// mergeSym merges two symbolic offsets. There is no real merging of
// offsets; we just pick the non-nil one.
func mergeSym(x, y interface{}) interface{} {
	if x == nil {
		return y
	}
	if y == nil {
		return x
	}
	panic(fmt.Sprintf("mergeSym with two non-nil syms %s %s", x, y))
}

// canMergeSym reports whether two symbolic offsets can be merged
// (at least one of them must be nil).
func canMergeSym(x, y interface{}) bool {
	return x == nil || y == nil
}

// canMergeLoad reports whether the load can be merged into target without
// invalidating the schedule.
// It also checks that the other non-load argument x is something we
// are ok with clobbering (all our current load+op instructions clobber
// their input register).
func canMergeLoad(target, load, x *Value) bool {
	if target.Block.ID != load.Block.ID {
		// If the load is in a different block do not merge it.
		return false
	}

	// We can't merge the load into the target if the load
	// has more than one use.
	if load.Uses != 1 {
		return false
	}

	// The register containing x is going to get clobbered.
	// Don't merge if we still need the value of x.
	// We don't have liveness information here, but we can
	// approximate x dying with:
	//  1) target is x's only use.
	//  2) target is not in a deeper loop than x.
	if x.Uses != 1 {
		return false
	}
	loopnest := x.Block.Func.loopnest()
	loopnest.calculateDepths()
	if loopnest.depth(target.Block.ID) > loopnest.depth(x.Block.ID) {
		return false
	}

	mem := load.MemoryArg()

	// We need the load's memory arg to still be alive at target. That
	// can't be the case if one of target's args depends on a memory
	// state that is a successor of load's memory arg.
	//
	// For example, it would be invalid to merge load into target in
	// the following situation because newmem has killed oldmem
	// before target is reached:
	//     load = read ... oldmem
	//   newmem = write ... oldmem
	//     arg0 = read ... newmem
	//   target = add arg0 load
	//
	// If the argument comes from a different block then we can exclude
	// it immediately because it must dominate load (which is in the
	// same block as target).
	var args []*Value
	for _, a := range target.Args {
		if a != load && a.Block.ID == target.Block.ID {
			args = append(args, a)
		}
	}

	// memPreds contains memory states known to be predecessors of load's
	// memory state. It is lazily initialized.
	var memPreds map[*Value]bool
search:
	for i := 0; len(args) > 0; i++ {
		const limit = 100
		if i >= limit {
			// Give up if we have done a lot of iterations.
			return false
		}
		v := args[len(args)-1]
		args = args[:len(args)-1]
		if target.Block.ID != v.Block.ID {
			// Since target and load are in the same block
			// we can stop searching when we leave the block.
			continue search
		}
		if v.Op == OpPhi {
			// A Phi implies we have reached the top of the block.
			// The memory phi, if it exists, is always
			// the first logical store in the block.
			continue search
		}
		if v.Type.IsTuple() && v.Type.FieldType(1).IsMemory() {
			// We could handle this situation, but it is likely
			// to be very rare.
			return false
		}
		if v.Type.IsMemory() {
			if memPreds == nil {
				// Initialize a map containing memory states
				// known to be predecessors of load's memory
				// state.
				memPreds = make(map[*Value]bool)
				m := mem
				const limit = 50
				for i := 0; i < limit; i++ {
					if m.Op == OpPhi {
						// The memory phi, if it exists, is always
						// the first logical store in the block.
						break
					}
					if m.Block.ID != target.Block.ID {
						break
					}
					if !m.Type.IsMemory() {
						break
					}
					memPreds[m] = true
					if len(m.Args) == 0 {
						break
					}
					m = m.MemoryArg()
				}
			}

			// We can merge if v is a predecessor of mem.
			//
			// For example, we can merge load into target in the
			// following scenario:
			//      x = read ... v
			//    mem = write ... v
			//   load = read ... mem
			// target = add x load
			if memPreds[v] {
				continue search
			}
			return false
		}
		if len(v.Args) > 0 && v.Args[len(v.Args)-1] == mem {
			// If v takes mem as an input then we know mem
			// is valid at this point.
			continue search
		}
		for _, a := range v.Args {
			if target.Block.ID == a.Block.ID {
				args = append(args, a)
			}
		}
	}

	return true
}

// isSameSym reports whether sym is the same as the given named symbol.
func isSameSym(sym interface{}, name string) bool {
	s, ok := sym.(fmt.Stringer)
	return ok && s.String() == name
}

// nlz returns the number of leading zeros.
func nlz(x int64) int64 {
	// log2(0) == -1, so nlz(0) == 64
	return 63 - log2(x)
}

// ntz returns the number of trailing zeros.
func ntz(x int64) int64 {
	return 64 - nlz(^x&(x-1))
}

// oneBit reports whether x contains exactly one set bit.
func oneBit(x int64) bool {
	return nlz(x)+ntz(x) == 63
}
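
// A few worked examples of the bit helpers above (illustrative only,
// not used by any rewrite rule):
//	nlz(1) == 63          // log2(1) == 0, so 63-0 leading zeros
//	nlz(0) == 64          // log2(0) == -1
//	ntz(8) == 3           // 8 == 0b1000 has 3 trailing zeros
//	oneBit(8) == true     // nlz(8)+ntz(8) == 60+3 == 63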

// nlo returns the number of leading ones.
func nlo(x int64) int64 {
	return nlz(^x)
}

// nto returns the number of trailing ones.
func nto(x int64) int64 {
	return ntz(^x)
}

// log2 returns logarithm in base 2 of uint64(n), with log2(0) = -1.
// Rounds down.
func log2(n int64) (l int64) {
	l = -1
	x := uint64(n)
	for ; x >= 0x8000; x >>= 16 {
		l += 16
	}
	if x >= 0x80 {
		x >>= 8
		l += 8
	}
	if x >= 0x8 {
		x >>= 4
		l += 4
	}
	if x >= 0x2 {
		x >>= 2
		l += 2
	}
	if x >= 0x1 {
		l++
	}
	return
}
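
// For example (illustrative): log2(8) == 3 and log2(7) == 2, since the
// result rounds down; log2(0) == -1 because no bit is set.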

// log2uint32 returns logarithm in base 2 of uint32(n), with log2(0) = -1.
// Rounds down.
func log2uint32(n int64) (l int64) {
	return log2(int64(uint32(n)))
}

// isPowerOfTwo reports whether n is a power of 2.
func isPowerOfTwo(n int64) bool {
	return n > 0 && n&(n-1) == 0
}

// isUint64PowerOfTwo reports whether uint64(in) is a power of 2.
func isUint64PowerOfTwo(in int64) bool {
	n := uint64(in)
	return n > 0 && n&(n-1) == 0
}

// isUint32PowerOfTwo reports whether uint32(in) is a power of 2.
func isUint32PowerOfTwo(in int64) bool {
	n := uint64(uint32(in))
	return n > 0 && n&(n-1) == 0
}

// is32Bit reports whether n can be represented as a signed 32 bit integer.
func is32Bit(n int64) bool {
	return n == int64(int32(n))
}

// is16Bit reports whether n can be represented as a signed 16 bit integer.
func is16Bit(n int64) bool {
	return n == int64(int16(n))
}

// isU12Bit reports whether n can be represented as an unsigned 12 bit integer.
func isU12Bit(n int64) bool {
	return 0 <= n && n < (1<<12)
}

// isU16Bit reports whether n can be represented as an unsigned 16 bit integer.
func isU16Bit(n int64) bool {
	return n == int64(uint16(n))
}

// isU32Bit reports whether n can be represented as an unsigned 32 bit integer.
func isU32Bit(n int64) bool {
	return n == int64(uint32(n))
}

// is20Bit reports whether n can be represented as a signed 20 bit integer.
func is20Bit(n int64) bool {
	return -(1<<19) <= n && n < (1<<19)
}
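
// A few illustrative values for the range predicates above:
//	is32Bit(1<<31 - 1) == true,  is32Bit(1<<31) == false
//	is16Bit(32767)     == true,  is16Bit(32768) == false
//	isU12Bit(4095)     == true,  isU12Bit(4096) == false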

// b2i translates a boolean value to 0 or 1 for assigning to auxInt.
func b2i(b bool) int64 {
	if b {
		return 1
	}
	return 0
}

// shiftIsBounded reports whether (left/right) shift Value v is known to be bounded.
// A shift is bounded if it is shifting by less than the width of the shifted value.
func shiftIsBounded(v *Value) bool {
	return v.AuxInt != 0
}

// i2f is used in rules for converting from an AuxInt to a float.
func i2f(i int64) float64 {
	return math.Float64frombits(uint64(i))
}

// i2f32 is used in rules for converting from an AuxInt to a float32.
func i2f32(i int64) float32 {
	return float32(math.Float64frombits(uint64(i)))
}

// f2i is used in the rules for storing a float in AuxInt.
func f2i(f float64) int64 {
	return int64(math.Float64bits(f))
}
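
// These conversions are bit-preserving (illustrative):
//	f2i(1.0) == 0x3FF0000000000000
//	i2f(f2i(x)) restores x's exact bit pattern for any float64 x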

// uaddOvf reports whether unsigned a+b would overflow.
func uaddOvf(a, b int64) bool {
	return uint64(a)+uint64(b) < uint64(a)
}

// devirt de-virtualizes an InterCall.
// 'sym' is the symbol for the itab.
//
// With this, code like
//	h := sha1.New()
//	h.Write(buf)
//	sum := h.Sum()
// gets compiled into static calls rather than interface calls, because
// the compiler is able to prove that 'h' is really a *sha1.digest
// (see CL 38139, "cmd/compile: de-virtualize interface calls").
func devirt(v *Value, sym interface{}, offset int64) *obj.LSym {
	f := v.Block.Func
	n, ok := sym.(*obj.LSym)
	if !ok {
		return nil
	}
	lsym := f.fe.DerefItab(n, offset)
	if f.pass.debug > 0 {
		if lsym != nil {
			f.Warnl(v.Pos, "de-virtualizing call")
		} else {
			f.Warnl(v.Pos, "couldn't de-virtualize call")
		}
	}
	return lsym
}

// isSamePtr reports whether p1 and p2 point to the same address.
func isSamePtr(p1, p2 *Value) bool {
	if p1 == p2 {
		return true
	}
	if p1.Op != p2.Op {
		return false
	}
	switch p1.Op {
	case OpOffPtr:
		return p1.AuxInt == p2.AuxInt && isSamePtr(p1.Args[0], p2.Args[0])
	case OpAddr:
		// OpAddr's 0th arg is either OpSP or OpSB, which means that it is uniquely identified by its Op.
		// Checking for value equality only works after [z]cse has run.
		return p1.Aux == p2.Aux && p1.Args[0].Op == p2.Args[0].Op
	case OpAddPtr:
		return p1.Args[1] == p2.Args[1] && isSamePtr(p1.Args[0], p2.Args[0])
	}
	return false
}

// disjoint reports whether the memory region specified by [p1:p1+n1)
// does not overlap with [p2:p2+n2).
// A return value of false does not imply the regions overlap.
func disjoint(p1 *Value, n1 int64, p2 *Value, n2 int64) bool {
	if n1 == 0 || n2 == 0 {
		return true
	}
	if p1 == p2 {
		return false
	}
	baseAndOffset := func(ptr *Value) (base *Value, offset int64) {
		base, offset = ptr, 0
		if base.Op == OpOffPtr {
			offset += base.AuxInt
			base = base.Args[0]
		}
		return base, offset
	}
	p1, off1 := baseAndOffset(p1)
	p2, off2 := baseAndOffset(p2)
	if isSamePtr(p1, p2) {
		return !overlap(off1, n1, off2, n2)
	}
	// p1 and p2 are not the same, so if they are both OpAddrs then
	// they point to different variables.
	// If one pointer is on the stack and the other is an argument
	// then they can't overlap.
	switch p1.Op {
	case OpAddr:
		if p2.Op == OpAddr || p2.Op == OpSP {
			return true
		}
		return p2.Op == OpArg && p1.Args[0].Op == OpSP
	case OpArg:
		if p2.Op == OpSP {
			return true
		}
		return p2.Op == OpAddr && p2.Args[0].Op == OpSP
	case OpSP:
		return p2.Op == OpAddr || p2.Op == OpArg || p2.Op == OpSP
	}
	return false
}

// moveSize returns the number of bytes an aligned MOV instruction moves.
func moveSize(align int64, c *Config) int64 {
	switch {
	case align%8 == 0 && c.PtrSize == 8:
		return 8
	case align%4 == 0:
		return 4
	case align%2 == 0:
		return 2
	}
	return 1
}

// mergePoint finds a block among a's blocks which dominates b and is itself
// dominated by all of a's blocks. Returns nil if it can't find one.
// Might return nil even if one does exist.
func mergePoint(b *Block, a ...*Value) *Block {
	// Walk backward from b looking for one of the a's blocks.

	// Max distance
	d := 100

	for d > 0 {
		for _, x := range a {
			if b == x.Block {
				goto found
			}
		}
		if len(b.Preds) > 1 {
			// Don't know which way to go back. Abort.
			return nil
		}
		b = b.Preds[0].b
		d--
	}
	return nil // too far away
found:
	// At this point, r is the first value in a that we find by walking backwards.
	// If we return anything, r will be it.
	r := b

	// Keep going, counting the other a's that we find. They must all dominate r.
	na := 0
	for d > 0 {
		for _, x := range a {
			if b == x.Block {
				na++
			}
		}
		if na == len(a) {
			// Found all of a in a backwards walk. We can return r.
			return r
		}
		if len(b.Preds) > 1 {
			return nil
		}
		b = b.Preds[0].b
		d--
	}
	return nil // too far away
}

// clobber invalidates v. Returns true.
// clobber is used by rewrite rules to:
// A) make sure v is really dead and never used again.
// B) decrement use counts of v's args.
func clobber(v *Value) bool {
	v.reset(OpInvalid)
	// Note: leave v.Block intact. The Block field is used after clobber.
	return true
}

// clobberIfDead resets v when use count is 1. Returns true.
// clobberIfDead is used by rewrite rules to decrement
// use counts of v's args when v is dead and never used.
func clobberIfDead(v *Value) bool {
	if v.Uses == 1 {
		v.reset(OpInvalid)
	}
	// Note: leave v.Block intact. The Block field is used after clobberIfDead.
	return true
}

// noteRule is an easy way to track if a rule is matched when writing
// new ones. Make the rule of interest also conditional on
// noteRule("note to self: rule of interest matched")
// and that message will print when the rule matches.
func noteRule(s string) bool {
	fmt.Println(s)
	return true
}

// warnRule generates compiler debug output with string s when
// cond is true and the rule fires.
func warnRule(cond bool, v *Value, s string) bool {
	if cond {
		v.Block.Func.Warnl(v.Pos, s)
	}
	return true
}

// flagArg extracts x from a pseudo-op like (LessThan x).
func flagArg(v *Value) *Value {
	if len(v.Args) != 1 || !v.Args[0].Type.IsFlags() {
		return nil
	}
	return v.Args[0]
}

// arm64Negate finds the complement to an ARM64 condition code,
// for example Equal -> NotEqual or LessThan -> GreaterEqual.
//
// TODO: add floating-point conditions
func arm64Negate(op Op) Op {
	switch op {
	case OpARM64LessThan:
		return OpARM64GreaterEqual
	case OpARM64LessThanU:
		return OpARM64GreaterEqualU
	case OpARM64GreaterThan:
		return OpARM64LessEqual
	case OpARM64GreaterThanU:
		return OpARM64LessEqualU
	case OpARM64LessEqual:
		return OpARM64GreaterThan
	case OpARM64LessEqualU:
		return OpARM64GreaterThanU
	case OpARM64GreaterEqual:
		return OpARM64LessThan
	case OpARM64GreaterEqualU:
		return OpARM64LessThanU
	case OpARM64Equal:
		return OpARM64NotEqual
	case OpARM64NotEqual:
		return OpARM64Equal
	default:
		panic("unreachable")
	}
}

// arm64Invert evaluates (InvertFlags op), which
// is the same as altering the condition codes such
// that the same result would be produced if the arguments
// to the flag-generating instruction were reversed, e.g.
// (InvertFlags (CMP x y)) -> (CMP y x)
//
// TODO: add floating-point conditions
func arm64Invert(op Op) Op {
	switch op {
	case OpARM64LessThan:
		return OpARM64GreaterThan
	case OpARM64LessThanU:
		return OpARM64GreaterThanU
	case OpARM64GreaterThan:
		return OpARM64LessThan
	case OpARM64GreaterThanU:
		return OpARM64LessThanU
	case OpARM64LessEqual:
		return OpARM64GreaterEqual
	case OpARM64LessEqualU:
		return OpARM64GreaterEqualU
	case OpARM64GreaterEqual:
		return OpARM64LessEqual
	case OpARM64GreaterEqualU:
		return OpARM64LessEqualU
	case OpARM64Equal, OpARM64NotEqual:
		return op
	default:
		panic("unreachable")
	}
}

// ccARM64Eval evaluates an ARM64 condition code op against a flags value
// that is potentially constant. It returns 1 for true, -1 for false,
// and 0 for not constant.
func ccARM64Eval(cc interface{}, flags *Value) int {
	op := cc.(Op)
	fop := flags.Op
	switch fop {
	case OpARM64InvertFlags:
		return -ccARM64Eval(op, flags.Args[0])
	case OpARM64FlagEQ:
		switch op {
		case OpARM64Equal, OpARM64GreaterEqual, OpARM64LessEqual,
			OpARM64GreaterEqualU, OpARM64LessEqualU:
			return 1
		default:
			return -1
		}
	case OpARM64FlagLT_ULT:
		switch op {
		case OpARM64LessThan, OpARM64LessThanU,
			OpARM64LessEqual, OpARM64LessEqualU:
			return 1
		default:
			return -1
		}
	case OpARM64FlagLT_UGT:
		switch op {
		case OpARM64LessThan, OpARM64GreaterThanU,
			OpARM64LessEqual, OpARM64GreaterEqualU:
			return 1
		default:
			return -1
		}
	case OpARM64FlagGT_ULT:
		switch op {
		case OpARM64GreaterThan, OpARM64LessThanU,
			OpARM64GreaterEqual, OpARM64LessEqualU:
			return 1
		default:
			return -1
		}
	case OpARM64FlagGT_UGT:
		switch op {
		case OpARM64GreaterThan, OpARM64GreaterThanU,
			OpARM64GreaterEqual, OpARM64GreaterEqualU:
			return 1
		default:
			return -1
		}
	default:
		return 0
	}
}

// logRule logs the use of the rule s. This will only be enabled if
// rewrite rules were generated with the -log option, see gen/rulegen.go.
func logRule(s string) {
	if ruleFile == nil {
		// Open a log file to write log to. We open in append
		// mode because all.bash runs the compiler lots of times,
		// and we want the concatenation of all of those logs.
		// This means, of course, that users need to rm the old log
		// to get fresh data.
		// TODO: all.bash runs compilers in parallel. Need to synchronize logging somehow?
		w, err := os.OpenFile(filepath.Join(os.Getenv("GOROOT"), "src", "rulelog"),
			os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0666)
		if err != nil {
			panic(err)
		}
		ruleFile = w
	}
	_, err := fmt.Fprintf(ruleFile, "rewrite %s\n", s)
	if err != nil {
		panic(err)
	}
}

var ruleFile io.Writer

// min returns the smaller of x and y.
func min(x, y int64) int64 {
	if x < y {
		return x
	}
	return y
}

// isConstZero reports whether v is a constant zero.
func isConstZero(v *Value) bool {
	switch v.Op {
	case OpConstNil:
		return true
	case OpConst64, OpConst32, OpConst16, OpConst8, OpConstBool, OpConst32F, OpConst64F:
		return v.AuxInt == 0
	}
	return false
}

// reciprocalExact64 reports whether 1/c is exactly representable.
func reciprocalExact64(c float64) bool {
	b := math.Float64bits(c)
	man := b & (1<<52 - 1)
	if man != 0 {
		return false // not a power of 2, denormal, or NaN
	}
	exp := b >> 52 & (1<<11 - 1)
	// exponent bias is 0x3ff. So taking the reciprocal of a number
	// changes the exponent to 0x7fe-exp.
	switch exp {
	case 0:
		return false // ±0
	case 0x7ff:
		return false // ±inf
	case 0x7fe:
		return false // exponent is not representable
	default:
		return true
	}
}

// reciprocalExact32 reports whether 1/c is exactly representable.
func reciprocalExact32(c float32) bool {
	b := math.Float32bits(c)
	man := b & (1<<23 - 1)
	if man != 0 {
		return false // not a power of 2, denormal, or NaN
	}
	exp := b >> 23 & (1<<8 - 1)
	// exponent bias is 0x7f. So taking the reciprocal of a number
	// changes the exponent to 0xfe-exp.
	switch exp {
	case 0:
		return false // ±0
	case 0xff:
		return false // ±inf
	case 0xfe:
		return false // exponent is not representable
	default:
		return true
	}
}

// isARMImmRot reports whether the immediate v can be directly encoded in
// an ARM instruction.
func isARMImmRot(v uint32) bool {
	for i := 0; i < 16; i++ {
		if v&^0xff == 0 {
			return true
		}
		v = v<<2 | v>>30
	}

	return false
}

// overlap reports whether the ranges given by the given offset and
// size pairs overlap.
func overlap(offset1, size1, offset2, size2 int64) bool {
	if offset1 >= offset2 && offset2+size2 > offset1 {
		return true
	}
	if offset2 >= offset1 && offset1+size1 > offset2 {
		return true
	}
	return false
}

// areAdjacentOffsets reports whether the regions [off1,off1+size) and
// [off2,off2+size) are adjacent (one immediately follows the other).
func areAdjacentOffsets(off1, off2, size int64) bool {
	return off1+size == off2 || off1 == off2+size
}

// zeroUpper32Bits reports whether x zeroes out the upper 32 bits of a
// 64-bit register. depth limits the recursion depth. AMD64.rules uses
// 3 as the limit, because it catches the same number of cases as 4.
func zeroUpper32Bits(x *Value, depth int) bool {
	switch x.Op {
	case OpAMD64MOVLconst, OpAMD64MOVLload, OpAMD64MOVLQZX, OpAMD64MOVLloadidx1,
		OpAMD64MOVWload, OpAMD64MOVWloadidx1, OpAMD64MOVBload, OpAMD64MOVBloadidx1,
		OpAMD64MOVLloadidx4, OpAMD64ADDLload, OpAMD64SUBLload, OpAMD64ANDLload,
		OpAMD64ORLload, OpAMD64XORLload, OpAMD64CVTTSD2SL,
		OpAMD64ADDL, OpAMD64ADDLconst, OpAMD64SUBL, OpAMD64SUBLconst,
		OpAMD64ANDL, OpAMD64ANDLconst, OpAMD64ORL, OpAMD64ORLconst,
		OpAMD64XORL, OpAMD64XORLconst, OpAMD64NEGL, OpAMD64NOTL:
		return true
	case OpArg:
		return x.Type.Width == 4
	case OpPhi, OpSelect0, OpSelect1:
		// Phis can use each other as arguments; instead of tracking visited values,
		// just limit the recursion depth.
		if depth <= 0 {
			return false
		}
		for i := range x.Args {
			if !zeroUpper32Bits(x.Args[i], depth-1) {
				return false
			}
		}
		return true
	}
	return false
}

// isInlinableMemmove reports whether the given arch performs a Move of the given size
// faster than memmove. It will only return true if replacing the memmove with a Move is
// safe, either because Move is small or because the arguments are disjoint.
// This is used as a check for replacing memmove with Move ops.
func isInlinableMemmove(dst, src *Value, sz int64, c *Config) bool {
	// It is always safe to convert memmove into Move when its arguments are disjoint.
	// Move ops may or may not be faster for large sizes depending on how the platform
	// lowers them, so we only perform this optimization on platforms that we know to
	// have fast Move ops.
	switch c.arch {
	case "amd64", "amd64p32":
		return sz <= 16
	case "386", "ppc64", "ppc64le", "arm64":
		return sz <= 8
	case "s390x":
		return sz <= 8 || disjoint(dst, sz, src, sz)
	case "arm", "mips", "mips64", "mipsle", "mips64le":
		return sz <= 4
	}
	return false
}

// arm64BFAuxInt encodes the lsb and width for arm64 bitfield ops into the
// expected auxInt format.
func arm64BFAuxInt(lsb, width int64) int64 {
	if lsb < 0 || lsb > 63 {
		panic("ARM64 bit field lsb constant out of range")
	}
	if width < 1 || width > 64 {
		panic("ARM64 bit field width constant out of range")
	}
	return width | lsb<<8
}

// getARM64BFlsb returns the lsb part of the auxInt field of arm64 bitfield ops.
func getARM64BFlsb(bfc int64) int64 {
	return int64(uint64(bfc) >> 8)
}

// getARM64BFwidth returns the width part of the auxInt field of arm64 bitfield ops.
func getARM64BFwidth(bfc int64) int64 {
	return bfc & 0xff
}

// isARM64BFMask reports whether mask >> rshift applied at lsb is a valid
// arm64 bitfield op mask.
func isARM64BFMask(lsb, mask, rshift int64) bool {
	shiftedMask := int64(uint64(mask) >> uint64(rshift))
	return shiftedMask != 0 && isPowerOfTwo(shiftedMask+1) && nto(shiftedMask)+lsb < 64
}

// arm64BFWidth returns the bitfield width of mask >> rshift for arm64 bitfield ops.
func arm64BFWidth(mask, rshift int64) int64 {
	shiftedMask := int64(uint64(mask) >> uint64(rshift))
	if shiftedMask == 0 {
		panic("ARM64 BF mask is zero")
	}
	return nto(shiftedMask)
}

// sizeof returns the size of t in bytes.
// It will panic if t is not a *types.Type.
func sizeof(t interface{}) int64 {
	return t.(*types.Type).Size()
}

// alignof returns the alignment of t in bytes.
// It will panic if t is not a *types.Type.
func alignof(t interface{}) int64 {
	return t.(*types.Type).Alignment()
}

// registerizable reports whether t is a primitive type that fits in
// a register. It assumes float64 values will always fit into registers
// even if that isn't strictly true.
// It will panic if t is not a *types.Type.
func registerizable(b *Block, t interface{}) bool {
	typ := t.(*types.Type)
	if typ.IsPtrShaped() || typ.IsFloat() {
		return true
	}
	if typ.IsInteger() {
		return typ.Size() <= b.Func.Config.RegSize
	}
	return false
}