// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package ssa

import (
	"cmd/compile/internal/types"
	"cmd/internal/obj"
	"cmd/internal/objabi"
	"cmd/internal/src"
	"encoding/binary"
	"fmt"
	"io"
	"math"
	"math/bits"
	"os"
	"path/filepath"
)

func applyRewrite(f *Func, rb blockRewriter, rv valueRewriter) {
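	// In addition to applying rb (the block rewriter) and rv (the value
	// rewriter), e.g. the generated rewriteBlockAMD64/rewriteValueAMD64,
	// this pass tries to preserve statement boundary (is_stmt) marks:
	// when a value carrying a boundary is rewritten away, the mark is moved
	// to a nearby value on the same line, or recorded in pendingLines and
	// reassigned during the cleanup loop at the end.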
	// repeat rewrites until we find no more rewrites
	pendingLines := f.cachedLineStarts // Holds statement boundaries that need to be moved to a new value/block
	pendingLines.clear()
	for {
		change := false
		for _, b := range f.Blocks {
			if b.Control != nil && b.Control.Op == OpCopy {
				for b.Control.Op == OpCopy {
					b.SetControl(b.Control.Args[0])
				}
			}
			if rb(b) {
				change = true
			}
			for j, v := range b.Values {
				change = phielimValue(v) || change

				// Eliminate copy inputs.
				// If any copy input becomes unused, mark it
				// as invalid and discard its argument. Repeat
				// recursively on the discarded argument.
				// This phase helps remove phantom "dead copy" uses
				// of a value so that a x.Uses==1 rule condition
				// fires reliably.
				for i, a := range v.Args {
					if a.Op != OpCopy {
						continue
					}
					aa := copySource(a)
					v.SetArg(i, aa)
					// If a, a copy, has a line boundary indicator, attempt to find a new value
					// to hold it. The first candidate is the value that will replace a (aa),
					// if it shares the same block and line and is eligible.
					// The second option is v, which has a as an input. Because aa is earlier in
					// the data flow, it is the better choice.
					if a.Pos.IsStmt() == src.PosIsStmt {
						if aa.Block == a.Block && aa.Pos.Line() == a.Pos.Line() && aa.Pos.IsStmt() != src.PosNotStmt {
							aa.Pos = aa.Pos.WithIsStmt()
						} else if v.Block == a.Block && v.Pos.Line() == a.Pos.Line() && v.Pos.IsStmt() != src.PosNotStmt {
							v.Pos = v.Pos.WithIsStmt()
						} else {
							// Record the lost line and look for a new home after all rewrites are complete.
							// TODO: it's possible (in FOR loops, in particular) for statement boundaries for the same
							// line to appear in more than one block, but only one block is stored, so if both end
							// up here, then one will be lost.
							pendingLines.set(a.Pos.Line(), int32(a.Block.ID))
						}
						a.Pos = a.Pos.WithNotStmt()
					}
					change = true
					for a.Uses == 0 {
						b := a.Args[0]
						a.reset(OpInvalid)
						a = b
					}
				}

				// apply rewrite function
				if rv(v) {
					change = true
					// If value changed to a poor choice for a statement boundary, move the boundary
					if v.Pos.IsStmt() == src.PosIsStmt {
						if k := nextGoodStatementIndex(v, j, b); k != j {
							v.Pos = v.Pos.WithNotStmt()
							b.Values[k].Pos = b.Values[k].Pos.WithIsStmt()
						}
					}
				}
			}
		}
		if !change {
			break
		}
	}
	// remove clobbered values
	for _, b := range f.Blocks {
		j := 0
		for i, v := range b.Values {
			vl := v.Pos.Line()
			if v.Op == OpInvalid {
				if v.Pos.IsStmt() == src.PosIsStmt {
					pendingLines.set(vl, int32(b.ID))
				}
				f.freeValue(v)
				continue
			}
			if v.Pos.IsStmt() != src.PosNotStmt && pendingLines.get(vl) == int32(b.ID) {
				pendingLines.remove(vl)
				v.Pos = v.Pos.WithIsStmt()
			}
			if i != j {
				b.Values[j] = v
			}
			j++
		}
		if pendingLines.get(b.Pos.Line()) == int32(b.ID) {
			b.Pos = b.Pos.WithIsStmt()
			pendingLines.remove(b.Pos.Line())
		}
		if j != len(b.Values) {
			tail := b.Values[j:]
			for j := range tail {
				tail[j] = nil
			}
			b.Values = b.Values[:j]
		}
	}
}

// Common functions called from rewriting rules
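
// is64BitFloat reports whether t is a 64-bit floating point type.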
func is64BitFloat(t *types.Type) bool {
	return t.Size() == 8 && t.IsFloat()
}
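
// is32BitFloat reports whether t is a 32-bit floating point type.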
func is32BitFloat(t *types.Type) bool {
	return t.Size() == 4 && t.IsFloat()
}
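
// is64BitInt reports whether t is a 64-bit integer type.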
func is64BitInt(t *types.Type) bool {
	return t.Size() == 8 && t.IsInteger()
}
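
// is32BitInt reports whether t is a 32-bit integer type.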
func is32BitInt(t *types.Type) bool {
	return t.Size() == 4 && t.IsInteger()
}
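
// is16BitInt reports whether t is a 16-bit integer type.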
func is16BitInt(t *types.Type) bool {
	return t.Size() == 2 && t.IsInteger()
}
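
// is8BitInt reports whether t is an 8-bit integer type.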
func is8BitInt(t *types.Type) bool {
	return t.Size() == 1 && t.IsInteger()
}
|
|
|
|
|
|
cmd/compile: change ssa.Type into *types.Type
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.
In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.
Though this is a big CL, most of the changes are
mechanical and uninteresting.
Interesting bits:
* Add new singleton globals to package types for the special
SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
and TTUPLE, for SSA tuple types.
ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
to package types.
* We had picked the name "types" in our rules for the handy
list of types provided by ssa.Config. That conflicted with
the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
and probably also some other duplicated Type methods
designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
and they were not particularly careful about types in general.
Of necessity, this CL switches them to use *types.Type;
it does not make them more type-accurate.
Unfortunately, using types.Type means initializing a bit
of the types universe.
This is prime for refactoring and improvement.
This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.
name old alloc/op new alloc/op delta
Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8)
Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10)
GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10)
Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10)
GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9)
Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8)
Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10)
XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10)
[Geo mean] 40.5MB 40.3MB -0.68%
name old allocs/op new allocs/op delta
Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9)
Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10)
GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10)
Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10)
GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9)
Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8)
Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10)
XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10)
[Geo mean] 428k 428k -0.01%
Removing all the interface calls helps non-trivially with CPU, though.
name old time/op new time/op delta
Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96)
Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96)
GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96)
Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99)
GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97)
Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99)
Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94)
XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95)
[Geo mean] 178ms 173ms -2.65%
name old user-time/op new user-time/op delta
Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99)
Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95)
GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99)
Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96)
GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100)
Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92)
Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100)
XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97)
[Geo mean] 220ms 213ms -2.76%
Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
|
|
|
func isPtr(t *types.Type) bool {
|
2016-03-28 10:55:44 -07:00
|
|
|
return t.IsPtrShaped()
|
2015-03-23 17:02:11 -07:00
|
|
|
}
|
|
|
|
|
|
cmd/compile: change ssa.Type into *types.Type
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.
In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.
Though this is a big CL, most of the changes are
mechanical and uninteresting.
Interesting bits:
* Add new singleton globals to package types for the special
SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
and TTUPLE, for SSA tuple types.
ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
to package types.
* We had picked the name "types" in our rules for the handy
list of types provided by ssa.Config. That conflicted with
the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
and probably also some other duplicated Type methods
designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
and they were not particularly careful about types in general.
Of necessity, this CL switches them to use *types.Type;
it does not make them more type-accurate.
Unfortunately, using types.Type means initializing a bit
of the types universe.
This is prime for refactoring and improvement.
This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.
name old alloc/op new alloc/op delta
Template 37.9MB ± 0% 37.7MB ± 0% -0.57% (p=0.000 n=10+8)
Unicode 28.9MB ± 0% 28.7MB ± 0% -0.52% (p=0.000 n=10+10)
GoTypes 110MB ± 0% 109MB ± 0% -0.88% (p=0.000 n=10+10)
Flate 24.7MB ± 0% 24.6MB ± 0% -0.66% (p=0.000 n=10+10)
GoParser 31.1MB ± 0% 30.9MB ± 0% -0.61% (p=0.000 n=10+9)
Reflect 73.9MB ± 0% 73.4MB ± 0% -0.62% (p=0.000 n=10+8)
Tar 25.8MB ± 0% 25.6MB ± 0% -0.77% (p=0.000 n=9+10)
XML 41.2MB ± 0% 40.9MB ± 0% -0.80% (p=0.000 n=10+10)
[Geo mean] 40.5MB 40.3MB -0.68%
name old allocs/op new allocs/op delta
Template 385k ± 0% 386k ± 0% ~ (p=0.356 n=10+9)
Unicode 343k ± 1% 344k ± 0% ~ (p=0.481 n=10+10)
GoTypes 1.16M ± 0% 1.16M ± 0% -0.16% (p=0.004 n=10+10)
Flate 238k ± 1% 238k ± 1% ~ (p=0.853 n=10+10)
GoParser 320k ± 0% 320k ± 0% ~ (p=0.720 n=10+9)
Reflect 957k ± 0% 957k ± 0% ~ (p=0.460 n=10+8)
Tar 252k ± 0% 252k ± 0% ~ (p=0.133 n=9+10)
XML 400k ± 0% 400k ± 0% ~ (p=0.796 n=10+10)
[Geo mean] 428k 428k -0.01%
Removing all the interface calls helps non-trivially with CPU, though.
name old time/op new time/op delta
Template 178ms ± 4% 173ms ± 3% -2.90% (p=0.000 n=94+96)
Unicode 85.0ms ± 4% 83.9ms ± 4% -1.23% (p=0.000 n=96+96)
GoTypes 543ms ± 3% 528ms ± 3% -2.73% (p=0.000 n=98+96)
Flate 116ms ± 3% 113ms ± 4% -2.34% (p=0.000 n=96+99)
GoParser 144ms ± 3% 140ms ± 4% -2.80% (p=0.000 n=99+97)
Reflect 344ms ± 3% 334ms ± 4% -3.02% (p=0.000 n=100+99)
Tar 106ms ± 5% 103ms ± 4% -3.30% (p=0.000 n=98+94)
XML 198ms ± 5% 192ms ± 4% -2.88% (p=0.000 n=92+95)
[Geo mean] 178ms 173ms -2.65%
name old user-time/op new user-time/op delta
Template 229ms ± 5% 224ms ± 5% -2.36% (p=0.000 n=95+99)
Unicode 107ms ± 6% 106ms ± 5% -1.13% (p=0.001 n=93+95)
GoTypes 696ms ± 4% 679ms ± 4% -2.45% (p=0.000 n=97+99)
Flate 137ms ± 4% 134ms ± 5% -2.66% (p=0.000 n=99+96)
GoParser 176ms ± 5% 172ms ± 8% -2.27% (p=0.000 n=98+100)
Reflect 430ms ± 6% 411ms ± 5% -4.46% (p=0.000 n=100+92)
Tar 128ms ±13% 123ms ±13% -4.21% (p=0.000 n=100+100)
XML 239ms ± 6% 233ms ± 6% -2.50% (p=0.000 n=95+97)
[Geo mean] 220ms 213ms -2.76%
Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-04-28 14:12:28 -07:00
|
|
|
func isSigned(t *types.Type) bool {
|
2015-04-15 15:51:25 -07:00
|
|
|
return t.IsSigned()
|
2015-03-26 10:49:03 -07:00
|
|
|
}
|
|
|
|
|
|
2016-03-01 23:21:55 +00:00
|
|
|
// mergeSym merges two symbolic offsets. There is no real merging of
|
2015-08-23 21:14:25 -07:00
|
|
|
// offsets, we just pick the non-nil one.
|
2015-06-19 21:02:28 -07:00
|
|
|
func mergeSym(x, y interface{}) interface{} {
|
|
|
|
|
if x == nil {
|
|
|
|
|
return y
|
|
|
|
|
}
|
|
|
|
|
if y == nil {
|
|
|
|
|
return x
|
|
|
|
|
}
|
|
|
|
|
panic(fmt.Sprintf("mergeSym with two non-nil syms %s %s", x, y))
|
|
|
|
|
}
|
2015-08-23 21:14:25 -07:00
|
|
|
func canMergeSym(x, y interface{}) bool {
|
|
|
|
|
return x == nil || y == nil
|
|
|
|
|
}
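
// For illustration: mergeSym(nil, s) and mergeSym(s, nil) both yield s, while
// merging two non-nil symbols panics, so rewrite rules are expected to guard a
// merge with canMergeSym before applying it.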

// canMergeLoadClobber reports whether the load can be merged into target without
// invalidating the schedule.
// It also checks that the other non-load argument x is something we
// are ok with clobbering.
func canMergeLoadClobber(target, load, x *Value) bool {
    // The register containing x is going to get clobbered.
    // Don't merge if we still need the value of x.
    // We don't have liveness information here, but we can
    // approximate x dying with:
    //  1) target is x's only use.
    //  2) target is not in a deeper loop than x.
    if x.Uses != 1 {
        return false
    }
    loopnest := x.Block.Func.loopnest()
    loopnest.calculateDepths()
    if loopnest.depth(target.Block.ID) > loopnest.depth(x.Block.ID) {
        return false
    }
    return canMergeLoad(target, load)
}

// canMergeLoad reports whether the load can be merged into target without
// invalidating the schedule.
func canMergeLoad(target, load *Value) bool {
    if target.Block.ID != load.Block.ID {
        // If the load is in a different block do not merge it.
        return false
    }

    // We can't merge the load into the target if the load
    // has more than one use.
    if load.Uses != 1 {
        return false
    }

    mem := load.MemoryArg()

    // We need the load's memory arg to still be alive at target. That
    // can't be the case if one of target's args depends on a memory
    // state that is a successor of load's memory arg.
    //
    // For example, it would be invalid to merge load into target in
    // the following situation because newmem has killed oldmem
    // before target is reached:
    //     load = read ... oldmem
    //   newmem = write ... oldmem
    //     arg0 = read ... newmem
    //   target = add arg0 load
    //
    // If the argument comes from a different block then we can exclude
    // it immediately because it must dominate load (which is in the
    // same block as target).
    var args []*Value
    for _, a := range target.Args {
        if a != load && a.Block.ID == target.Block.ID {
            args = append(args, a)
        }
    }

    // memPreds contains memory states known to be predecessors of load's
    // memory state. It is lazily initialized.
    var memPreds map[*Value]bool
    for i := 0; len(args) > 0; i++ {
        const limit = 100
        if i >= limit {
            // Give up if we have done a lot of iterations.
            return false
        }
        v := args[len(args)-1]
        args = args[:len(args)-1]
        if target.Block.ID != v.Block.ID {
            // Since target and load are in the same block
            // we can stop searching when we leave the block.
            continue
        }
        if v.Op == OpPhi {
            // A Phi implies we have reached the top of the block.
            // The memory phi, if it exists, is always
            // the first logical store in the block.
            continue
        }
        if v.Type.IsTuple() && v.Type.FieldType(1).IsMemory() {
            // We could handle this situation however it is likely
            // to be very rare.
            return false
        }
        if v.Op.SymEffect()&SymAddr != 0 {
            // This case prevents an operation that calculates the
            // address of a local variable from being forced to schedule
            // before its corresponding VarDef.
            // See issue 28445.
            //   v1 = LOAD ...
            //   v2 = VARDEF
            //   v3 = LEAQ
            //   v4 = CMPQ v1 v3
            // We don't want to combine the CMPQ with the load, because
            // that would force the CMPQ to schedule before the VARDEF, which
            // in turn requires the LEAQ to schedule before the VARDEF.
            return false
        }
        if v.Type.IsMemory() {
            if memPreds == nil {
                // Initialise a map containing memory states
                // known to be predecessors of load's memory
                // state.
                memPreds = make(map[*Value]bool)
                m := mem
                const limit = 50
                for i := 0; i < limit; i++ {
                    if m.Op == OpPhi {
                        // The memory phi, if it exists, is always
                        // the first logical store in the block.
                        break
                    }
                    if m.Block.ID != target.Block.ID {
                        break
                    }
                    if !m.Type.IsMemory() {
                        break
                    }
                    memPreds[m] = true
                    if len(m.Args) == 0 {
                        break
                    }
                    m = m.MemoryArg()
                }
            }

            // We can merge if v is a predecessor of mem.
            //
            // For example, we can merge load into target in the
            // following scenario:
            //      x = read ... v
            //    mem = write ... v
            //   load = read ... mem
            // target = add x load
            if memPreds[v] {
                continue
            }
            return false
        }
        if len(v.Args) > 0 && v.Args[len(v.Args)-1] == mem {
            // If v takes mem as an input then we know mem
            // is valid at this point.
            continue
        }
        for _, a := range v.Args {
            if target.Block.ID == a.Block.ID {
                args = append(args, a)
            }
        }
    }

    return true
}

// isSameSym reports whether sym is the same as the given named symbol
func isSameSym(sym interface{}, name string) bool {
    s, ok := sym.(fmt.Stringer)
    return ok && s.String() == name
}

// nlz returns the number of leading zeros.
func nlz(x int64) int64 {
    return int64(bits.LeadingZeros64(uint64(x)))
}

// ntz returns the number of trailing zeros.
func ntz(x int64) int64 {
    return int64(bits.TrailingZeros64(uint64(x)))
}

// oneBit reports whether exactly one bit is set in x.
func oneBit(x int64) bool {
    return bits.OnesCount64(uint64(x)) == 1
}

// nlo returns the number of leading ones.
func nlo(x int64) int64 {
    return nlz(^x)
}

// nto returns the number of trailing ones.
func nto(x int64) int64 {
    return ntz(^x)
}
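
// For example, nlz(1) == 63, ntz(8) == 3, and oneBit(8) is true, since these
// helpers treat their argument as a 64-bit pattern.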

// log2 returns logarithm in base 2 of uint64(n), with log2(0) = -1.
// Rounds down.
func log2(n int64) int64 {
    return int64(bits.Len64(uint64(n))) - 1
}

// log2uint32 returns logarithm in base 2 of uint32(n), with log2(0) = -1.
// Rounds down.
func log2uint32(n int64) int64 {
    return int64(bits.Len32(uint32(n))) - 1
}

// isPowerOfTwo reports whether n is a power of 2.
func isPowerOfTwo(n int64) bool {
    return n > 0 && n&(n-1) == 0
}
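
// For example, log2(8) == 3, log2(7) == 2, and log2(0) == -1, while
// isPowerOfTwo(8) is true and isPowerOfTwo(0) is false.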

// isUint64PowerOfTwo reports whether uint64(n) is a power of 2.
func isUint64PowerOfTwo(in int64) bool {
    n := uint64(in)
    return n > 0 && n&(n-1) == 0
}

// isUint32PowerOfTwo reports whether uint32(n) is a power of 2.
func isUint32PowerOfTwo(in int64) bool {
    n := uint64(uint32(in))
    return n > 0 && n&(n-1) == 0
}

// is32Bit reports whether n can be represented as a signed 32 bit integer.
func is32Bit(n int64) bool {
    return n == int64(int32(n))
}

// is16Bit reports whether n can be represented as a signed 16 bit integer.
func is16Bit(n int64) bool {
    return n == int64(int16(n))
}

// isU12Bit reports whether n can be represented as an unsigned 12 bit integer.
func isU12Bit(n int64) bool {
    return 0 <= n && n < (1<<12)
}

// isU16Bit reports whether n can be represented as an unsigned 16 bit integer.
func isU16Bit(n int64) bool {
    return n == int64(uint16(n))
}

// isU32Bit reports whether n can be represented as an unsigned 32 bit integer.
func isU32Bit(n int64) bool {
    return n == int64(uint32(n))
}

// is20Bit reports whether n can be represented as a signed 20 bit integer.
func is20Bit(n int64) bool {
    return -(1<<19) <= n && n < (1<<19)
}
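
// For example, is20Bit(1<<19-1) is true while is20Bit(1<<19) is false; these
// predicates let rewrite rules check that a constant fits an instruction's
// immediate field before folding it in.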

// b2i translates a boolean value to 0 or 1 for assigning to auxInt.
func b2i(b bool) int64 {
    if b {
        return 1
    }
    return 0
}

// shiftIsBounded reports whether (left/right) shift Value v is known to be bounded.
// A shift is bounded if it is shifting by less than the width of the shifted value.
func shiftIsBounded(v *Value) bool {
    return v.AuxInt != 0
}

// truncate64Fto32F converts a float64 value to a float32 preserving the bit pattern
// of the mantissa. It will panic if the truncation results in lost information.
func truncate64Fto32F(f float64) float32 {
    if !isExactFloat32(f) {
        panic("truncate64Fto32F: truncation is not exact")
    }
    if !math.IsNaN(f) {
        return float32(f)
    }
    // NaN bit patterns aren't necessarily preserved across conversion
    // instructions so we need to do the conversion manually.
    b := math.Float64bits(f)
    m := b & ((1 << 52) - 1) // mantissa (a.k.a. significand)
    //          | sign | exponent   | mantissa     |
    r := uint32(((b >> 32) & (1 << 31)) | 0x7f800000 | (m >> (52 - 23)))
    return math.Float32frombits(r)
}

// extend32Fto64F converts a float32 value to a float64 value preserving the bit
// pattern of the mantissa.
func extend32Fto64F(f float32) float64 {
    if !math.IsNaN(float64(f)) {
        return float64(f)
    }
    // NaN bit patterns aren't necessarily preserved across conversion
    // instructions so we need to do the conversion manually.
    b := uint64(math.Float32bits(f))
    //   | sign | exponent      | mantissa                    |
    r := ((b << 32) & (1 << 63)) | (0x7ff << 52) | ((b & 0x7fffff) << (52 - 23))
    return math.Float64frombits(r)
}
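
// These two helpers are what allow a float32 constant to round-trip through an
// int64 AuxInt (see auxFrom32F/auxTo32F below) without losing its bit pattern,
// including NaN payloads.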

// NeedsFixUp reports whether the division needs fix-up code.
func NeedsFixUp(v *Value) bool {
    return v.AuxInt == 0
}

// i2f is used in rules for converting from an AuxInt to a float.
func i2f(i int64) float64 {
    return math.Float64frombits(uint64(i))
}

// auxFrom64F encodes a float64 value so it can be stored in an AuxInt.
func auxFrom64F(f float64) int64 {
    return int64(math.Float64bits(f))
}

// auxFrom32F encodes a float32 value so it can be stored in an AuxInt.
func auxFrom32F(f float32) int64 {
    return int64(math.Float64bits(extend32Fto64F(f)))
}

// auxTo32F decodes a float32 from the AuxInt value provided.
func auxTo32F(i int64) float32 {
    return truncate64Fto32F(math.Float64frombits(uint64(i)))
}

// auxTo64F decodes a float64 from the AuxInt value provided.
func auxTo64F(i int64) float64 {
    return math.Float64frombits(uint64(i))
}

// uaddOvf reports whether unsigned a+b would overflow.
func uaddOvf(a, b int64) bool {
    return uint64(a)+uint64(b) < uint64(a)
}
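
// For example, uaddOvf(-1, 1) is true: both operands are reinterpreted as
// unsigned, and the sum wraps around to zero.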

// de-virtualize an InterCall
// 'sym' is the symbol for the itab
func devirt(v *Value, sym interface{}, offset int64) *obj.LSym {
    f := v.Block.Func
    n, ok := sym.(*obj.LSym)
    if !ok {
        return nil
    }
    lsym := f.fe.DerefItab(n, offset)
    if f.pass.debug > 0 {
        if lsym != nil {
            f.Warnl(v.Pos, "de-virtualizing call")
        } else {
            f.Warnl(v.Pos, "couldn't de-virtualize call")
        }
    }
    return lsym
}
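
// As a sketch of when this fires: a constructor such as sha1.New that returns
// an interface but is inlined into its caller lets the compiler prove the
// concrete type, so the rewrite can replace the interface call with a static
// call to that concrete type's method.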

// isSamePtr reports whether p1 and p2 point to the same address.
func isSamePtr(p1, p2 *Value) bool {
    if p1 == p2 {
        return true
    }
    if p1.Op != p2.Op {
        return false
    }
    switch p1.Op {
    case OpOffPtr:
        return p1.AuxInt == p2.AuxInt && isSamePtr(p1.Args[0], p2.Args[0])
    case OpAddr, OpLocalAddr:
        // OpAddr's 0th arg is either OpSP or OpSB, which means that it is uniquely identified by its Op.
        // Checking for value equality only works after [z]cse has run.
        return p1.Aux == p2.Aux && p1.Args[0].Op == p2.Args[0].Op
    case OpAddPtr:
        return p1.Args[1] == p2.Args[1] && isSamePtr(p1.Args[0], p2.Args[0])
    }
    return false
}

// isStackPtr reports whether v is a pointer into the stack, that is, an
// OpSP- or OpLocalAddr-based pointer, possibly offset by OpOffPtr/OpAddPtr.
func isStackPtr(v *Value) bool {
    for v.Op == OpOffPtr || v.Op == OpAddPtr {
        v = v.Args[0]
    }
    return v.Op == OpSP || v.Op == OpLocalAddr
}

// disjoint reports whether the memory region specified by [p1:p1+n1)
// does not overlap with [p2:p2+n2).
// A return value of false does not imply the regions overlap.
func disjoint(p1 *Value, n1 int64, p2 *Value, n2 int64) bool {
    if n1 == 0 || n2 == 0 {
        return true
    }
    if p1 == p2 {
        return false
    }
    baseAndOffset := func(ptr *Value) (base *Value, offset int64) {
        base, offset = ptr, 0
        for base.Op == OpOffPtr {
            offset += base.AuxInt
            base = base.Args[0]
        }
        return base, offset
    }
    p1, off1 := baseAndOffset(p1)
    p2, off2 := baseAndOffset(p2)
    if isSamePtr(p1, p2) {
        return !overlap(off1, n1, off2, n2)
    }
    // p1 and p2 are not the same, so if they are both OpAddrs then
    // they point to different variables.
    // If one pointer is on the stack and the other is an argument
    // then they can't overlap.
    switch p1.Op {
    case OpAddr, OpLocalAddr:
        if p2.Op == OpAddr || p2.Op == OpLocalAddr || p2.Op == OpSP {
            return true
        }
        return p2.Op == OpArg && p1.Args[0].Op == OpSP
    case OpArg:
        if p2.Op == OpSP || p2.Op == OpLocalAddr {
            return true
        }
    case OpSP:
        return p2.Op == OpAddr || p2.Op == OpLocalAddr || p2.Op == OpArg || p2.Op == OpSP
    }
    return false
}
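
// For example, two OpOffPtr values off the same base covering [0,8) and
// [8,16) are reported as disjoint, while any two ranges that share a byte are
// not; when nothing can be proven, disjoint conservatively returns false.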

// moveSize returns the number of bytes an aligned MOV instruction moves
func moveSize(align int64, c *Config) int64 {
    switch {
    case align%8 == 0 && c.PtrSize == 8:
        return 8
    case align%4 == 0:
        return 4
    case align%2 == 0:
        return 2
    }
    return 1
}

// mergePoint finds a block among a's blocks which dominates b and is itself
// dominated by all of a's blocks. Returns nil if it can't find one.
// Might return nil even if one does exist.
func mergePoint(b *Block, a ...*Value) *Block {
    // Walk backward from b looking for one of the a's blocks.

    // Max distance
    d := 100

    for d > 0 {
        for _, x := range a {
            if b == x.Block {
                goto found
            }
        }
        if len(b.Preds) > 1 {
            // Don't know which way to go back. Abort.
            return nil
        }
        b = b.Preds[0].b
        d--
    }
    return nil // too far away
found:
    // At this point, r is the first value in a that we find by walking backwards.
    // if we return anything, r will be it.
    r := b

    // Keep going, counting the other a's that we find. They must all dominate r.
    na := 0
    for d > 0 {
        for _, x := range a {
            if b == x.Block {
                na++
            }
        }
        if na == len(a) {
            // Found all of a in a backwards walk. We can return r.
            return r
        }
        if len(b.Preds) > 1 {
            return nil
        }
        b = b.Preds[0].b
        d--
    }
    return nil // too far away
}

// clobber invalidates v. Returns true.
// clobber is used by rewrite rules to:
//   A) make sure v is really dead and never used again.
//   B) decrement use counts of v's args.
func clobber(v *Value) bool {
    v.reset(OpInvalid)
    // Note: leave v.Block intact. The Block field is used after clobber.
    return true
}

// clobberIfDead resets v when use count is 1. Returns true.
// clobberIfDead is used by rewrite rules to decrement
// use counts of v's args when v is dead and never used.
func clobberIfDead(v *Value) bool {
    if v.Uses == 1 {
        v.reset(OpInvalid)
    }
    // Note: leave v.Block intact. The Block field is used after clobberIfDead.
    return true
}

// noteRule is an easy way to track if a rule is matched when writing
// new ones. Make the rule of interest also conditional on
//     noteRule("note to self: rule of interest matched")
// and that message will print when the rule matches.
func noteRule(s string) bool {
    fmt.Println(s)
    return true
}

// warnRule generates compiler debug output with string s when
// v is not in autogenerated code, cond is true and the rule has fired.
func warnRule(cond bool, v *Value, s string) bool {
    if pos := v.Pos; pos.Line() > 1 && cond {
        v.Block.Func.Warnl(pos, s)
    }
    return true
}

// for a pseudo-op like (LessThan x), extract x
func flagArg(v *Value) *Value {
    if len(v.Args) != 1 || !v.Args[0].Type.IsFlags() {
        return nil
    }
    return v.Args[0]
}

// arm64Negate finds the complement to an ARM64 condition code,
// for example Equal -> NotEqual or LessThan -> GreaterEqual
//
// TODO: add floating-point conditions
func arm64Negate(op Op) Op {
    switch op {
    case OpARM64LessThan:
        return OpARM64GreaterEqual
    case OpARM64LessThanU:
        return OpARM64GreaterEqualU
    case OpARM64GreaterThan:
        return OpARM64LessEqual
    case OpARM64GreaterThanU:
        return OpARM64LessEqualU
    case OpARM64LessEqual:
        return OpARM64GreaterThan
    case OpARM64LessEqualU:
        return OpARM64GreaterThanU
    case OpARM64GreaterEqual:
        return OpARM64LessThan
    case OpARM64GreaterEqualU:
        return OpARM64LessThanU
    case OpARM64Equal:
        return OpARM64NotEqual
    case OpARM64NotEqual:
        return OpARM64Equal
    case OpARM64LessThanF:
        return OpARM64GreaterEqualF
    case OpARM64GreaterThanF:
        return OpARM64LessEqualF
    case OpARM64LessEqualF:
        return OpARM64GreaterThanF
    case OpARM64GreaterEqualF:
        return OpARM64LessThanF
    default:
        panic("unreachable")
    }
}

// arm64Invert evaluates (InvertFlags op), which
// is the same as altering the condition codes such
// that the same result would be produced if the arguments
// to the flag-generating instruction were reversed, e.g.
// (InvertFlags (CMP x y)) -> (CMP y x)
//
// TODO: add floating-point conditions
func arm64Invert(op Op) Op {
    switch op {
    case OpARM64LessThan:
        return OpARM64GreaterThan
    case OpARM64LessThanU:
        return OpARM64GreaterThanU
    case OpARM64GreaterThan:
        return OpARM64LessThan
    case OpARM64GreaterThanU:
        return OpARM64LessThanU
    case OpARM64LessEqual:
        return OpARM64GreaterEqual
    case OpARM64LessEqualU:
        return OpARM64GreaterEqualU
    case OpARM64GreaterEqual:
        return OpARM64LessEqual
    case OpARM64GreaterEqualU:
        return OpARM64LessEqualU
    case OpARM64Equal, OpARM64NotEqual:
        return op
    case OpARM64LessThanF:
        return OpARM64GreaterThanF
    case OpARM64GreaterThanF:
        return OpARM64LessThanF
    case OpARM64LessEqualF:
        return OpARM64GreaterEqualF
    case OpARM64GreaterEqualF:
        return OpARM64LessEqualF
    default:
        panic("unreachable")
    }
}

// evaluate an ARM64 op against a flags value
// that is potentially constant; return 1 for true,
// -1 for false, and 0 for not constant.
func ccARM64Eval(cc interface{}, flags *Value) int {
    op := cc.(Op)
    fop := flags.Op
    switch fop {
    case OpARM64InvertFlags:
        return -ccARM64Eval(op, flags.Args[0])
    case OpARM64FlagEQ:
        switch op {
        case OpARM64Equal, OpARM64GreaterEqual, OpARM64LessEqual,
            OpARM64GreaterEqualU, OpARM64LessEqualU:
            return 1
        default:
            return -1
        }
    case OpARM64FlagLT_ULT:
        switch op {
        case OpARM64LessThan, OpARM64LessThanU,
            OpARM64LessEqual, OpARM64LessEqualU:
            return 1
        default:
            return -1
        }
    case OpARM64FlagLT_UGT:
        switch op {
        case OpARM64LessThan, OpARM64GreaterThanU,
            OpARM64LessEqual, OpARM64GreaterEqualU:
            return 1
        default:
            return -1
        }
    case OpARM64FlagGT_ULT:
        switch op {
        case OpARM64GreaterThan, OpARM64LessThanU,
            OpARM64GreaterEqual, OpARM64LessEqualU:
            return 1
        default:
            return -1
        }
    case OpARM64FlagGT_UGT:
        switch op {
        case OpARM64GreaterThan, OpARM64GreaterThanU,
            OpARM64GreaterEqual, OpARM64GreaterEqualU:
            return 1
        default:
            return -1
        }
    default:
        return 0
    }
}
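
// For example, ccARM64Eval(OpARM64Equal, flags) yields 1 when flags is the
// constant OpARM64FlagEQ, -1 when flags is one of the other constant flag
// values, and 0 when flags is not a constant.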

// logRule logs the use of the rule s. This will only be enabled if
// rewrite rules were generated with the -log option, see gen/rulegen.go.
func logRule(s string) {
    if ruleFile == nil {
        // Open a log file to write log to. We open in append
        // mode because all.bash runs the compiler lots of times,
        // and we want the concatenation of all of those logs.
        // This means, of course, that users need to rm the old log
        // to get fresh data.
        // TODO: all.bash runs compilers in parallel. Need to synchronize logging somehow?
        w, err := os.OpenFile(filepath.Join(os.Getenv("GOROOT"), "src", "rulelog"),
            os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0666)
        if err != nil {
            panic(err)
        }
        ruleFile = w
    }
    _, err := fmt.Fprintf(ruleFile, "rewrite %s\n", s)
    if err != nil {
        panic(err)
    }
}

var ruleFile io.Writer

// min returns the smaller of x and y.
func min(x, y int64) int64 {
    if x < y {
        return x
    }
    return y
}

// isConstZero reports whether v is a constant zero (nil, numeric 0, or false).
func isConstZero(v *Value) bool {
    switch v.Op {
    case OpConstNil:
        return true
    case OpConst64, OpConst32, OpConst16, OpConst8, OpConstBool, OpConst32F, OpConst64F:
        return v.AuxInt == 0
    }
    return false
}

// reciprocalExact64 reports whether 1/c is exactly representable.
func reciprocalExact64(c float64) bool {
    b := math.Float64bits(c)
    man := b & (1<<52 - 1)
    if man != 0 {
        return false // not a power of 2, denormal, or NaN
    }
    exp := b >> 52 & (1<<11 - 1)
    // exponent bias is 0x3ff. So taking the reciprocal of a number
    // changes the exponent to 0x7fe-exp.
    switch exp {
    case 0:
        return false // ±0
    case 0x7ff:
        return false // ±inf
    case 0x7fe:
        return false // exponent is not representable
    default:
        return true
    }
}

// reciprocalExact32 reports whether 1/c is exactly representable.
func reciprocalExact32(c float32) bool {
    b := math.Float32bits(c)
    man := b & (1<<23 - 1)
    if man != 0 {
        return false // not a power of 2, denormal, or NaN
    }
    exp := b >> 23 & (1<<8 - 1)
    // exponent bias is 0x7f. So taking the reciprocal of a number
    // changes the exponent to 0xfe-exp.
    switch exp {
    case 0:
        return false // ±0
    case 0xff:
        return false // ±inf
    case 0xfe:
        return false // exponent is not representable
    default:
        return true
    }
}
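
// For example, 1/4 is exactly representable (the reciprocal of a power of two
// only flips the exponent's sign), while 1/3 is not; rules that rewrite a
// floating-point division by a constant into a multiplication can use these
// checks to stay bit-exact.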

// check if an immediate can be directly encoded into an ARM's instruction
func isARMImmRot(v uint32) bool {
    for i := 0; i < 16; i++ {
        if v&^0xff == 0 {
            return true
        }
        v = v<<2 | v>>30
    }

    return false
}
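
// For example, isARMImmRot(0xff000000) is true (an 8-bit value under an even
// rotation), while isARMImmRot(0x101) is false because its set bits cannot fit
// in any rotated 8-bit window.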

// overlap reports whether the ranges given by the given offset and
// size pairs overlap.
func overlap(offset1, size1, offset2, size2 int64) bool {
    if offset1 >= offset2 && offset2+size2 > offset1 {
        return true
    }
    if offset2 >= offset1 && offset1+size1 > offset2 {
        return true
    }
    return false
}

// areAdjacentOffsets reports whether the size-byte regions at off1 and off2
// are adjacent.
func areAdjacentOffsets(off1, off2, size int64) bool {
    return off1+size == off2 || off1 == off2+size
}
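
// For example, overlap(0, 8, 4, 4) is true and overlap(0, 4, 4, 4) is false,
// while areAdjacentOffsets(0, 4, 4) is true.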

// check if value zeroes out upper 32-bit of 64-bit register.
// depth limits recursion depth. In AMD64.rules 3 is used as limit,
// because it catches same amount of cases as 4.
func zeroUpper32Bits(x *Value, depth int) bool {
    switch x.Op {
    case OpAMD64MOVLconst, OpAMD64MOVLload, OpAMD64MOVLQZX, OpAMD64MOVLloadidx1,
        OpAMD64MOVWload, OpAMD64MOVWloadidx1, OpAMD64MOVBload, OpAMD64MOVBloadidx1,
        OpAMD64MOVLloadidx4, OpAMD64ADDLload, OpAMD64SUBLload, OpAMD64ANDLload,
        OpAMD64ORLload, OpAMD64XORLload, OpAMD64CVTTSD2SL,
        OpAMD64ADDL, OpAMD64ADDLconst, OpAMD64SUBL, OpAMD64SUBLconst,
        OpAMD64ANDL, OpAMD64ANDLconst, OpAMD64ORL, OpAMD64ORLconst,
        OpAMD64XORL, OpAMD64XORLconst, OpAMD64NEGL, OpAMD64NOTL:
        return true
    case OpArg:
        return x.Type.Width == 4
    case OpPhi, OpSelect0, OpSelect1:
        // Phis can use each-other as an arguments, instead of tracking visited values,
        // just limit recursion depth.
        if depth <= 0 {
            return false
        }
        for i := range x.Args {
            if !zeroUpper32Bits(x.Args[i], depth-1) {
                return false
            }
        }
        return true
    }
    return false
}

// zeroUpper48Bits is similar to zeroUpper32Bits, but for upper 48 bits
func zeroUpper48Bits(x *Value, depth int) bool {
    switch x.Op {
    case OpAMD64MOVWQZX, OpAMD64MOVWload, OpAMD64MOVWloadidx1, OpAMD64MOVWloadidx2:
        return true
    case OpArg:
        return x.Type.Width == 2
    case OpPhi, OpSelect0, OpSelect1:
        // Phis can use each-other as an arguments, instead of tracking visited values,
        // just limit recursion depth.
        if depth <= 0 {
            return false
        }
        for i := range x.Args {
            if !zeroUpper48Bits(x.Args[i], depth-1) {
                return false
            }
        }
        return true
    }
    return false
}

// zeroUpper56Bits is similar to zeroUpper32Bits, but for upper 56 bits
func zeroUpper56Bits(x *Value, depth int) bool {
    switch x.Op {
    case OpAMD64MOVBQZX, OpAMD64MOVBload, OpAMD64MOVBloadidx1:
        return true
    case OpArg:
        return x.Type.Width == 1
    case OpPhi, OpSelect0, OpSelect1:
        // Phis can use each-other as an arguments, instead of tracking visited values,
        // just limit recursion depth.
        if depth <= 0 {
            return false
        }
        for i := range x.Args {
            if !zeroUpper56Bits(x.Args[i], depth-1) {
                return false
            }
        }
        return true
    }
    return false
}

// isInlinableMemmove reports whether the given arch performs a Move of the given size
// faster than memmove. It will only return true if replacing the memmove with a Move is
// safe, either because Move is small or because the arguments are disjoint.
// This is used as a check for replacing memmove with Move ops.
func isInlinableMemmove(dst, src *Value, sz int64, c *Config) bool {
    // It is always safe to convert memmove into Move when its arguments are disjoint.
    // Move ops may or may not be faster for large sizes depending on how the platform
    // lowers them, so we only perform this optimization on platforms that we know to
    // have fast Move ops.
    switch c.arch {
    case "amd64", "amd64p32":
        return sz <= 16 || (sz < 1024 && disjoint(dst, sz, src, sz))
    case "386", "ppc64", "ppc64le", "arm64":
        return sz <= 8
    case "s390x":
        return sz <= 8 || disjoint(dst, sz, src, sz)
    case "arm", "mips", "mips64", "mipsle", "mips64le":
        return sz <= 4
    }
    return false
}

// encodes the lsb and width for arm(64) bitfield ops into the expected auxInt format.
func armBFAuxInt(lsb, width int64) int64 {
    if lsb < 0 || lsb > 63 {
        panic("ARM(64) bit field lsb constant out of range")
|
cmd/compile/internal/ssa: add patterns for arm64 bitfield opcodes
Add patterns to match common idioms for EXTR, BFI, BFXIL, SBFIZ, SBFX,
UBFIZ and UBFX opcodes.
go1 benchmarks results on Amberwing:
name old time/op new time/op delta
FmtManyArgs 786ns ± 2% 714ns ± 1% -9.20% (p=0.000 n=10+10)
Gzip 437ms ± 0% 402ms ± 0% -7.99% (p=0.000 n=10+10)
FmtFprintfIntInt 196ns ± 0% 182ns ± 0% -7.28% (p=0.000 n=10+9)
FmtFprintfPrefixedInt 207ns ± 0% 199ns ± 0% -3.86% (p=0.000 n=10+10)
FmtFprintfFloat 324ns ± 0% 316ns ± 0% -2.47% (p=0.000 n=10+8)
FmtFprintfInt 119ns ± 0% 117ns ± 0% -1.68% (p=0.000 n=10+9)
GobDecode 12.8ms ± 2% 12.6ms ± 1% -1.62% (p=0.002 n=10+10)
JSONDecode 94.4ms ± 1% 93.4ms ± 0% -1.10% (p=0.000 n=10+10)
RegexpMatchEasy0_32 247ns ± 0% 245ns ± 0% -0.65% (p=0.000 n=10+10)
RegexpMatchMedium_32 314ns ± 0% 312ns ± 0% -0.64% (p=0.000 n=10+10)
RegexpMatchEasy0_1K 541ns ± 0% 538ns ± 0% -0.55% (p=0.000 n=10+9)
TimeParse 450ns ± 1% 448ns ± 1% -0.42% (p=0.035 n=9+9)
RegexpMatchEasy1_32 244ns ± 0% 243ns ± 0% -0.41% (p=0.000 n=10+10)
GoParse 6.03ms ± 0% 6.00ms ± 0% -0.40% (p=0.002 n=10+10)
RegexpMatchEasy1_1K 779ns ± 0% 777ns ± 0% -0.26% (p=0.000 n=10+10)
RegexpMatchHard_32 2.75µs ± 0% 2.74µs ± 1% -0.06% (p=0.026 n=9+9)
BinaryTree17 11.7s ± 0% 11.6s ± 0% ~ (p=0.089 n=10+10)
HTTPClientServer 89.1µs ± 1% 89.5µs ± 2% ~ (p=0.436 n=10+10)
RegexpMatchHard_1K 78.9µs ± 0% 79.5µs ± 2% ~ (p=0.469 n=10+10)
FmtFprintfEmpty 58.5ns ± 0% 58.5ns ± 0% ~ (all equal)
GobEncode 12.0ms ± 1% 12.1ms ± 0% ~ (p=0.075 n=10+10)
Revcomp 669ms ± 0% 668ms ± 0% ~ (p=0.091 n=7+9)
Mandelbrot200 5.35ms ± 0% 5.36ms ± 0% +0.07% (p=0.000 n=9+9)
RegexpMatchMedium_1K 52.1µs ± 0% 52.1µs ± 0% +0.10% (p=0.000 n=9+9)
Fannkuch11 3.25s ± 0% 3.26s ± 0% +0.36% (p=0.000 n=9+10)
FmtFprintfString 114ns ± 1% 115ns ± 0% +0.52% (p=0.011 n=10+10)
JSONEncode 20.2ms ± 0% 20.3ms ± 0% +0.65% (p=0.000 n=10+10)
Template 91.3ms ± 0% 92.3ms ± 0% +1.08% (p=0.000 n=10+10)
TimeFormat 484ns ± 0% 495ns ± 1% +2.30% (p=0.000 n=9+10)
There are some opportunities to improve this change further by adding
patterns to match the "extended register" versions of ADD/SUB/CMP, but I
think that should be evaluated on its own. The regressions in Template
and TimeFormat would likely be recovered by this, as they seem to be due
to generating:
ubfiz x0, x0, #3, #8
add x1, x2, x0
instead of
add x1, x2, x0, lsl #3
Change-Id: I5644a8d70ac7a98e784a377a2b76ab47a3415a4b
Reviewed-on: https://go-review.googlesource.com/88355
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-21 16:15:39 -05:00
|
|
|
}
|
|
|
|
|
if width < 1 || width > 64 {
|
2019-02-11 09:40:02 +00:00
|
|
|
panic("ARM(64) bit field width constant out of range")
|
cmd/compile/internal/ssa: add patterns for arm64 bitfield opcodes
Add patterns to match common idioms for EXTR, BFI, BFXIL, SBFIZ, SBFX,
UBFIZ and UBFX opcodes.
go1 benchmarks results on Amberwing:
name old time/op new time/op delta
FmtManyArgs 786ns ± 2% 714ns ± 1% -9.20% (p=0.000 n=10+10)
Gzip 437ms ± 0% 402ms ± 0% -7.99% (p=0.000 n=10+10)
FmtFprintfIntInt 196ns ± 0% 182ns ± 0% -7.28% (p=0.000 n=10+9)
FmtFprintfPrefixedInt 207ns ± 0% 199ns ± 0% -3.86% (p=0.000 n=10+10)
FmtFprintfFloat 324ns ± 0% 316ns ± 0% -2.47% (p=0.000 n=10+8)
FmtFprintfInt 119ns ± 0% 117ns ± 0% -1.68% (p=0.000 n=10+9)
GobDecode 12.8ms ± 2% 12.6ms ± 1% -1.62% (p=0.002 n=10+10)
JSONDecode 94.4ms ± 1% 93.4ms ± 0% -1.10% (p=0.000 n=10+10)
RegexpMatchEasy0_32 247ns ± 0% 245ns ± 0% -0.65% (p=0.000 n=10+10)
RegexpMatchMedium_32 314ns ± 0% 312ns ± 0% -0.64% (p=0.000 n=10+10)
RegexpMatchEasy0_1K 541ns ± 0% 538ns ± 0% -0.55% (p=0.000 n=10+9)
TimeParse 450ns ± 1% 448ns ± 1% -0.42% (p=0.035 n=9+9)
RegexpMatchEasy1_32 244ns ± 0% 243ns ± 0% -0.41% (p=0.000 n=10+10)
GoParse 6.03ms ± 0% 6.00ms ± 0% -0.40% (p=0.002 n=10+10)
RegexpMatchEasy1_1K 779ns ± 0% 777ns ± 0% -0.26% (p=0.000 n=10+10)
RegexpMatchHard_32 2.75µs ± 0% 2.74µs ± 1% -0.06% (p=0.026 n=9+9)
BinaryTree17 11.7s ± 0% 11.6s ± 0% ~ (p=0.089 n=10+10)
HTTPClientServer 89.1µs ± 1% 89.5µs ± 2% ~ (p=0.436 n=10+10)
RegexpMatchHard_1K 78.9µs ± 0% 79.5µs ± 2% ~ (p=0.469 n=10+10)
FmtFprintfEmpty 58.5ns ± 0% 58.5ns ± 0% ~ (all equal)
GobEncode 12.0ms ± 1% 12.1ms ± 0% ~ (p=0.075 n=10+10)
Revcomp 669ms ± 0% 668ms ± 0% ~ (p=0.091 n=7+9)
Mandelbrot200 5.35ms ± 0% 5.36ms ± 0% +0.07% (p=0.000 n=9+9)
RegexpMatchMedium_1K 52.1µs ± 0% 52.1µs ± 0% +0.10% (p=0.000 n=9+9)
Fannkuch11 3.25s ± 0% 3.26s ± 0% +0.36% (p=0.000 n=9+10)
FmtFprintfString 114ns ± 1% 115ns ± 0% +0.52% (p=0.011 n=10+10)
JSONEncode 20.2ms ± 0% 20.3ms ± 0% +0.65% (p=0.000 n=10+10)
Template 91.3ms ± 0% 92.3ms ± 0% +1.08% (p=0.000 n=10+10)
TimeFormat 484ns ± 0% 495ns ± 1% +2.30% (p=0.000 n=9+10)
There are some opportunities to improve this change further by adding
patterns to match the "extended register" versions of ADD/SUB/CMP, but I
think that should be evaluated on its own. The regressions in Template
and TimeFormat would likely be recovered by this, as they seem to be due
to generating:
ubfiz x0, x0, #3, #8
add x1, x2, x0
instead of
add x1, x2, x0, lsl #3
Change-Id: I5644a8d70ac7a98e784a377a2b76ab47a3415a4b
Reviewed-on: https://go-review.googlesource.com/88355
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2018-02-21 16:15:39 -05:00
|
|
|
}
|
|
|
|
|
return width | lsb<<8
|
|
|
|
|
}
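
// Worked example of the layout above: the width occupies the low 8 bits and the
// lsb the next 8 bits, so armBFAuxInt(8, 16) == 16 | 8<<8 == 0x810. The getters
// below recover the fields: getARM64BFlsb(0x810) == 8, getARM64BFwidth(0x810) == 16.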

// getARM64BFlsb returns the lsb part of the auxInt field of arm64 bitfield ops.
func getARM64BFlsb(bfc int64) int64 {
	return int64(uint64(bfc) >> 8)
}

// getARM64BFwidth returns the width part of the auxInt field of arm64 bitfield ops.
func getARM64BFwidth(bfc int64) int64 {
	return bfc & 0xff
}

// isARM64BFMask reports whether mask >> rshift applied at lsb is a valid arm64 bitfield op mask.
func isARM64BFMask(lsb, mask, rshift int64) bool {
	shiftedMask := int64(uint64(mask) >> uint64(rshift))
	return shiftedMask != 0 && isPowerOfTwo(shiftedMask+1) && nto(shiftedMask)+lsb < 64
}
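
// Worked example (assuming nto counts trailing one bits, as defined earlier in
// this file): isARM64BFMask(4, 0xff0, 4) shifts the mask down to 0xff, which is
// a contiguous run of ones starting at bit 0 (0xff+1 is a power of two), and
// nto(0xff)+4 == 12 < 64, so the mask is valid for a bitfield op at lsb 4.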

// arm64BFWidth returns the bitfield width of mask >> rshift for arm64 bitfield ops.
func arm64BFWidth(mask, rshift int64) int64 {
	shiftedMask := int64(uint64(mask) >> uint64(rshift))
	if shiftedMask == 0 {
		panic("ARM64 BF mask is zero")
	}
	return nto(shiftedMask)
}
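
// Continuing the example above, arm64BFWidth(0xff0, 4) == nto(0xff) == 8: the
// width of the contiguous run of ones once the mask has been shifted right by rshift.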

// sizeof returns the size of t in bytes.
// It will panic if t is not a *types.Type.
func sizeof(t interface{}) int64 {
	return t.(*types.Type).Size()
}

// alignof returns the alignment of t in bytes.
// It will panic if t is not a *types.Type.
func alignof(t interface{}) int64 {
	return t.(*types.Type).Alignment()
}

// registerizable reports whether t is a primitive type that fits in
// a register. It assumes float64 values will always fit into registers
// even if that isn't strictly true.
// It will panic if t is not a *types.Type.
func registerizable(b *Block, t interface{}) bool {
	typ := t.(*types.Type)
	if typ.IsPtrShaped() || typ.IsFloat() {
		return true
	}
	if typ.IsInteger() {
		return typ.Size() <= b.Func.Config.RegSize
	}
	return false
}
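
// For example, with RegSize == 8 (64-bit targets) every Go integer type
// registerizes, while on a 32-bit target (RegSize == 4) int64 and uint64 do not;
// pointer-shaped and floating-point types are always treated as registerizable here.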

// needRaceCleanup reports whether this call to racefuncenter/exit isn't needed.
func needRaceCleanup(sym interface{}, v *Value) bool {
	f := v.Block.Func
	if !f.Config.Race {
		return false
	}
	if !isSameSym(sym, "runtime.racefuncenter") && !isSameSym(sym, "runtime.racefuncexit") {
		return false
	}
	for _, b := range f.Blocks {
		for _, v := range b.Values {
			switch v.Op {
			case OpStaticCall:
				// Scanning for racefuncenter will also encounter racefuncexit
				// (and vice versa), so both are allowed here.
				// Calls to the panic* helpers are allowed as well.
				s := v.Aux.(fmt.Stringer).String()
				switch s {
				case "runtime.racefuncenter", "runtime.racefuncexit",
					"runtime.panicdivide", "runtime.panicwrap",
					"runtime.panicshift":
					continue
				}
				// If we encountered any other call, we need to keep racefunc*,
				// for accurate stacktraces.
				return false
			case OpPanicBounds, OpPanicExtend:
				// Note: these are panic generators that are ok (like the static calls above).
			case OpClosureCall, OpInterCall:
				// We must keep the race functions if there are any other call types.
				return false
			}
		}
	}
	return true
}
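
// In other words, the racefuncenter/racefuncexit pair is only removable when the
// function body contains no calls other than the race instrumentation itself and
// the allowed panic helpers; any other static, closure, or interface call forces
// the instrumentation to stay so stack traces remain accurate.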

// symIsRO reports whether sym is a read-only global.
func symIsRO(sym interface{}) bool {
	lsym := sym.(*obj.LSym)
	return lsym.Type == objabi.SRODATA && len(lsym.R) == 0
}

// read8 reads one byte from the read-only global sym at offset off.
func read8(sym interface{}, off int64) uint8 {
	lsym := sym.(*obj.LSym)
	if off >= int64(len(lsym.P)) || off < 0 {
		// Invalid index into the global sym.
		// This can happen in dead code, so we don't want to panic.
		// Just return any value, it will eventually get ignored.
		// See issue 29215.
		return 0
	}
	return lsym.P[off]
}

// read16 reads two bytes from the read-only global sym at offset off.
func read16(sym interface{}, off int64, bigEndian bool) uint16 {
	lsym := sym.(*obj.LSym)
	if off >= int64(len(lsym.P))-1 || off < 0 {
		return 0
	}
	if bigEndian {
		return binary.BigEndian.Uint16(lsym.P[off:])
	} else {
		return binary.LittleEndian.Uint16(lsym.P[off:])
	}
}

// read32 reads four bytes from the read-only global sym at offset off.
func read32(sym interface{}, off int64, bigEndian bool) uint32 {
	lsym := sym.(*obj.LSym)
	if off >= int64(len(lsym.P))-3 || off < 0 {
		return 0
	}
	if bigEndian {
		return binary.BigEndian.Uint32(lsym.P[off:])
	} else {
		return binary.LittleEndian.Uint32(lsym.P[off:])
	}
}

// read64 reads eight bytes from the read-only global sym at offset off.
func read64(sym interface{}, off int64, bigEndian bool) uint64 {
	lsym := sym.(*obj.LSym)
	if off >= int64(len(lsym.P))-7 || off < 0 {
		return 0
	}
	if bigEndian {
		return binary.BigEndian.Uint64(lsym.P[off:])
	} else {
		return binary.LittleEndian.Uint64(lsym.P[off:])
	}
}
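
// These helpers are intended to let rewrite rules constant-fold loads from
// read-only data: a rule can first verify symIsRO(sym) and then, for example,
// replace a 4-byte load at offset off with a constant built from
// read32(sym, off, bigEndian). Out-of-range offsets return 0, mirroring read8's
// handling of dead code.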