// Copyright 2016 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package amd64

import (
	"fmt"
	"math"

	"cmd/compile/internal/base"
	"cmd/compile/internal/ir"
	"cmd/compile/internal/logopt"
	"cmd/compile/internal/objw"
	"cmd/compile/internal/ssa"
	"cmd/compile/internal/ssagen"
	"cmd/compile/internal/types"
	"cmd/internal/obj"
	"cmd/internal/obj/x86"
	"internal/abi"
)

// ssaMarkMoves marks any MOVXconst ops that need to avoid clobbering flags.
func ssaMarkMoves(s *ssagen.State, b *ssa.Block) {
	flive := b.FlagsLiveAtEnd
	for _, c := range b.ControlValues() {
		flive = c.Type.IsFlags() || flive
	}
	for i := len(b.Values) - 1; i >= 0; i-- {
		v := b.Values[i]
		if flive && (v.Op == ssa.OpAMD64MOVLconst || v.Op == ssa.OpAMD64MOVQconst) {
			// The "mark" is any non-nil Aux value.
			v.Aux = ssa.AuxMark
		}
		if v.Type.IsFlags() {
			flive = false
		}
		for _, a := range v.Args {
			if a.Type.IsFlags() {
				flive = true
			}
		}
	}
}
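
// For intuition: materializing $0 is cheapest as a self-XOR, but XOR clobbers the
// flags register. The mark placed above is what is assumed to let the constant
// cases in ssaGenValue fall back to a real MOV when flags are still live; the
// actual constant lowering lives further down in this file.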

// loadByType returns the load instruction of the given type.
func loadByType(t *types.Type) obj.As {
	// Avoid partial register write
	if !t.IsFloat() {
		switch t.Size() {
		case 1:
			return x86.AMOVBLZX
		case 2:
			return x86.AMOVWLZX
		}
	}
	// Otherwise, there's no difference between load and store opcodes.
	return storeByType(t)
}
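
// For example, loadByType on a 1-byte integer yields MOVBLZX: the byte is
// zero-extended into the full 32-bit register, so later full-width uses don't
// have to merge with stale upper bits (the partial-register write the comment
// above refers to).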

// storeByType returns the store instruction of the given type.
func storeByType(t *types.Type) obj.As {
	width := t.Size()
	if t.IsFloat() {
		switch width {
		case 4:
			return x86.AMOVSS
		case 8:
			return x86.AMOVSD
		}
	} else {
		switch width {
		case 1:
			return x86.AMOVB
		case 2:
			return x86.AMOVW
		case 4:
			return x86.AMOVL
		case 8:
			return x86.AMOVQ
		case 16:
			return x86.AMOVUPS
		}
	}
	panic(fmt.Sprintf("bad store type %v", t))
}

// moveByType returns the reg->reg move instruction of the given type.
func moveByType(t *types.Type) obj.As {
	if t.IsFloat() {
		// Moving the whole sse2 register is faster
		// than moving just the correct low portion of it.
		// There is no xmm->xmm move with 1 byte opcode,
		// so use movups, which has 2 byte opcode.
		return x86.AMOVUPS
	} else {
		switch t.Size() {
		case 1:
			// Avoids partial register write
			return x86.AMOVL
		case 2:
			return x86.AMOVL
		case 4:
			return x86.AMOVL
		case 8:
			return x86.AMOVQ
		case 16:
			return x86.AMOVUPS // int128s are in SSE registers
		default:
			panic(fmt.Sprintf("bad int register width %d:%v", t.Size(), t))
		}
	}
}

// opregreg emits instructions for
//
//	dest := dest(To) op src(From)
//
// and also returns the created obj.Prog so it
// may be further adjusted (offset, scale, etc).
func opregreg(s *ssagen.State, op obj.As, dest, src int16) *obj.Prog {
	p := s.Prog(op)
	p.From.Type = obj.TYPE_REG
	p.To.Type = obj.TYPE_REG
	p.To.Reg = dest
	p.From.Reg = src
	return p
}
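
// Typical usage elsewhere in this file: opregreg(s, x86.AXORL, x86.REG_DX, x86.REG_DX)
// emits "XORL DX, DX" to zero DX; the returned *obj.Prog can be ignored or further
// edited by the caller.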

// memIdx fills out a as an indexed memory reference for v.
// It assumes that the base register and the index register
// are v.Args[0].Reg() and v.Args[1].Reg(), respectively.
// The caller must still use gc.AddAux/gc.AddAux2 to handle v.Aux as necessary.
func memIdx(a *obj.Addr, v *ssa.Value) {
	r, i := v.Args[0].Reg(), v.Args[1].Reg()
	a.Type = obj.TYPE_MEM
	a.Scale = v.Op.Scale()
	if a.Scale == 1 && i == x86.REG_SP {
		r, i = i, r
	}
	a.Reg = r
	a.Index = i
}
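
// The swap above exists because SP cannot be used as an index register in x86
// addressing; with a scale of 1 the base and index are interchangeable, so the
// reference is rewritten as (SP)(r*1) instead of (r)(SP*1).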

// DUFFZERO consists of repeated blocks of 4 MOVUPSs + LEAQ,
// See runtime/mkduff.go.
const (
	dzBlocks    = 16 // number of MOV/ADD blocks
	dzBlockLen  = 4  // number of clears per block
	dzBlockSize = 23 // size of instructions in a single block
	dzMovSize   = 5  // size of single MOV instruction w/ offset
	dzLeaqSize  = 4  // size of single LEAQ instruction
	dzClearStep = 16 // number of bytes cleared by each MOV instruction
)

func duffStart(size int64) int64 {
	x, _ := duff(size)
	return x
}

func duffAdj(size int64) int64 {
	_, x := duff(size)
	return x
}

// duff returns the offset (from duffzero, in bytes) and pointer adjust (in bytes)
// required to use the duffzero mechanism for a block of the given size.
func duff(size int64) (int64, int64) {
	if size < 32 || size > 1024 || size%dzClearStep != 0 {
		panic("bad duffzero size")
	}
	steps := size / dzClearStep
	blocks := steps / dzBlockLen
	steps %= dzBlockLen
	off := dzBlockSize * (dzBlocks - blocks)
	var adj int64
	if steps != 0 {
		off -= dzLeaqSize
		off -= dzMovSize * steps
		adj -= dzClearStep * (dzBlockLen - steps)
	}
	return off, adj
}
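
// Worked example (just tracing the arithmetic above): for size = 208,
// steps = 208/16 = 13, blocks = 13/4 = 3 and steps%4 = 1, so
// off = 23*(16-3) - 4 - 5*1 = 290 and adj = -16*(4-1) = -48,
// i.e. duff(208) = (290, -48).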

func getgFromTLS(s *ssagen.State, r int16) {
	// See the comments in cmd/internal/obj/x86/obj6.go
	// near CanUse1InsnTLS for a detailed explanation of these instructions.
	if x86.CanUse1InsnTLS(base.Ctxt) {
		// MOVQ (TLS), r
		p := s.Prog(x86.AMOVQ)
		p.From.Type = obj.TYPE_MEM
		p.From.Reg = x86.REG_TLS
		p.To.Type = obj.TYPE_REG
		p.To.Reg = r
	} else {
		// MOVQ TLS, r
		// MOVQ (r)(TLS*1), r
		p := s.Prog(x86.AMOVQ)
		p.From.Type = obj.TYPE_REG
		p.From.Reg = x86.REG_TLS
		p.To.Type = obj.TYPE_REG
		p.To.Reg = r
		q := s.Prog(x86.AMOVQ)
		q.From.Type = obj.TYPE_MEM
		q.From.Reg = r
		q.From.Index = x86.REG_TLS
		q.From.Scale = 1
		q.To.Type = obj.TYPE_REG
		q.To.Reg = r
	}
}

func ssaGenValue(s *ssagen.State, v *ssa.Value) {
	switch v.Op {
	case ssa.OpAMD64VFMADD231SD, ssa.OpAMD64VFMADD231SS:
		p := s.Prog(v.Op.Asm())
		p.From = obj.Addr{Type: obj.TYPE_REG, Reg: v.Args[2].Reg()}
		p.To = obj.Addr{Type: obj.TYPE_REG, Reg: v.Reg()}
		p.AddRestSourceReg(v.Args[1].Reg())
	case ssa.OpAMD64ADDQ, ssa.OpAMD64ADDL:
		r := v.Reg()
		r1 := v.Args[0].Reg()
		r2 := v.Args[1].Reg()
		switch {
		case r == r1:
			p := s.Prog(v.Op.Asm())
			p.From.Type = obj.TYPE_REG
			p.From.Reg = r2
			p.To.Type = obj.TYPE_REG
			p.To.Reg = r
		case r == r2:
			p := s.Prog(v.Op.Asm())
			p.From.Type = obj.TYPE_REG
			p.From.Reg = r1
			p.To.Type = obj.TYPE_REG
			p.To.Reg = r
		default:
			var asm obj.As
			if v.Op == ssa.OpAMD64ADDQ {
				asm = x86.ALEAQ
			} else {
				asm = x86.ALEAL
			}
			p := s.Prog(asm)
			p.From.Type = obj.TYPE_MEM
			p.From.Reg = r1
			p.From.Scale = 1
			p.From.Index = r2
			p.To.Type = obj.TYPE_REG
			p.To.Reg = r
		}
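
		// When the destination differs from both inputs, ADD cannot express the
		// three-operand form, so the default branch above leans on LEA's
		// (base)(index*1) addressing to compute r1+r2 directly into the destination.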

	// 2-address opcode arithmetic
	case ssa.OpAMD64SUBQ, ssa.OpAMD64SUBL,
		ssa.OpAMD64MULQ, ssa.OpAMD64MULL,
		ssa.OpAMD64ANDQ, ssa.OpAMD64ANDL,
		ssa.OpAMD64ORQ, ssa.OpAMD64ORL,
		ssa.OpAMD64XORQ, ssa.OpAMD64XORL,
		ssa.OpAMD64SHLQ, ssa.OpAMD64SHLL,
		ssa.OpAMD64SHRQ, ssa.OpAMD64SHRL, ssa.OpAMD64SHRW, ssa.OpAMD64SHRB,
		ssa.OpAMD64SARQ, ssa.OpAMD64SARL, ssa.OpAMD64SARW, ssa.OpAMD64SARB,
		ssa.OpAMD64ROLQ, ssa.OpAMD64ROLL, ssa.OpAMD64ROLW, ssa.OpAMD64ROLB,
		ssa.OpAMD64RORQ, ssa.OpAMD64RORL, ssa.OpAMD64RORW, ssa.OpAMD64RORB,
		ssa.OpAMD64ADDSS, ssa.OpAMD64ADDSD, ssa.OpAMD64SUBSS, ssa.OpAMD64SUBSD,
		ssa.OpAMD64MULSS, ssa.OpAMD64MULSD, ssa.OpAMD64DIVSS, ssa.OpAMD64DIVSD,
		ssa.OpAMD64MINSS, ssa.OpAMD64MINSD,
		ssa.OpAMD64POR, ssa.OpAMD64PXOR,
		ssa.OpAMD64BTSL, ssa.OpAMD64BTSQ,
		ssa.OpAMD64BTCL, ssa.OpAMD64BTCQ,
		ssa.OpAMD64BTRL, ssa.OpAMD64BTRQ,
		ssa.OpAMD64PCMPEQB, ssa.OpAMD64PSIGNB,
		ssa.OpAMD64PUNPCKLBW:
		opregreg(s, v.Op.Asm(), v.Reg(), v.Args[1].Reg())

	case ssa.OpAMD64PSHUFLW:
		p := s.Prog(v.Op.Asm())
		imm := v.AuxInt
		if imm < 0 || imm > 255 {
			v.Fatalf("Invalid source selection immediate")
		}
		p.From.Offset = imm
		p.From.Type = obj.TYPE_CONST
		p.AddRestSourceReg(v.Args[0].Reg())
		p.To.Type = obj.TYPE_REG
		p.To.Reg = v.Reg()

	case ssa.OpAMD64PSHUFBbroadcast:
		// PSHUFB with a control mask of zero copies byte 0 to all
		// bytes in the register.
		//
		// X15 is always zero with ABIInternal.
		if s.ABI != obj.ABIInternal {
			// zero X15 manually
			opregreg(s, x86.AXORPS, x86.REG_X15, x86.REG_X15)
		}

		p := s.Prog(v.Op.Asm())
		p.From.Type = obj.TYPE_REG
		p.To.Type = obj.TYPE_REG
		p.To.Reg = v.Reg()
		p.From.Reg = x86.REG_X15

	case ssa.OpAMD64SHRDQ, ssa.OpAMD64SHLDQ:
		p := s.Prog(v.Op.Asm())
		lo, hi, bits := v.Args[0].Reg(), v.Args[1].Reg(), v.Args[2].Reg()
		p.From.Type = obj.TYPE_REG
		p.From.Reg = bits
		p.To.Type = obj.TYPE_REG
		p.To.Reg = lo
		p.AddRestSourceReg(hi)

	case ssa.OpAMD64BLSIQ, ssa.OpAMD64BLSIL,
		ssa.OpAMD64BLSMSKQ, ssa.OpAMD64BLSMSKL,
		ssa.OpAMD64BLSRQ, ssa.OpAMD64BLSRL:
		p := s.Prog(v.Op.Asm())
		p.From.Type = obj.TYPE_REG
		p.From.Reg = v.Args[0].Reg()
		p.To.Type = obj.TYPE_REG
		switch v.Op {
		case ssa.OpAMD64BLSRQ, ssa.OpAMD64BLSRL:
			p.To.Reg = v.Reg0()
		default:
			p.To.Reg = v.Reg()
		}

	case ssa.OpAMD64ANDNQ, ssa.OpAMD64ANDNL:
		p := s.Prog(v.Op.Asm())
		p.From.Type = obj.TYPE_REG
		p.From.Reg = v.Args[0].Reg()
		p.To.Type = obj.TYPE_REG
		p.To.Reg = v.Reg()
		p.AddRestSourceReg(v.Args[1].Reg())

	case ssa.OpAMD64SARXL, ssa.OpAMD64SARXQ,
		ssa.OpAMD64SHLXL, ssa.OpAMD64SHLXQ,
		ssa.OpAMD64SHRXL, ssa.OpAMD64SHRXQ:
		p := opregreg(s, v.Op.Asm(), v.Reg(), v.Args[1].Reg())
		p.AddRestSourceReg(v.Args[0].Reg())

	case ssa.OpAMD64SHLXLload, ssa.OpAMD64SHLXQload,
		ssa.OpAMD64SHRXLload, ssa.OpAMD64SHRXQload,
		ssa.OpAMD64SARXLload, ssa.OpAMD64SARXQload:
		p := opregreg(s, v.Op.Asm(), v.Reg(), v.Args[1].Reg())
		m := obj.Addr{Type: obj.TYPE_MEM, Reg: v.Args[0].Reg()}
		ssagen.AddAux(&m, v)
		p.AddRestSource(m)

	case ssa.OpAMD64SHLXLloadidx1, ssa.OpAMD64SHLXLloadidx4, ssa.OpAMD64SHLXLloadidx8,
		ssa.OpAMD64SHRXLloadidx1, ssa.OpAMD64SHRXLloadidx4, ssa.OpAMD64SHRXLloadidx8,
		ssa.OpAMD64SARXLloadidx1, ssa.OpAMD64SARXLloadidx4, ssa.OpAMD64SARXLloadidx8,
		ssa.OpAMD64SHLXQloadidx1, ssa.OpAMD64SHLXQloadidx8,
		ssa.OpAMD64SHRXQloadidx1, ssa.OpAMD64SHRXQloadidx8,
		ssa.OpAMD64SARXQloadidx1, ssa.OpAMD64SARXQloadidx8:
		p := opregreg(s, v.Op.Asm(), v.Reg(), v.Args[2].Reg())
		m := obj.Addr{Type: obj.TYPE_MEM}
		memIdx(&m, v)
		ssagen.AddAux(&m, v)
		p.AddRestSource(m)

	case ssa.OpAMD64DIVQU, ssa.OpAMD64DIVLU, ssa.OpAMD64DIVWU:
		// Arg[0] (the dividend) is in AX.
		// Arg[1] (the divisor) can be in any other register.
		// Result[0] (the quotient) is in AX.
		// Result[1] (the remainder) is in DX.
		r := v.Args[1].Reg()

		// Zero extend dividend.
		opregreg(s, x86.AXORL, x86.REG_DX, x86.REG_DX)

		// Issue divide.
		p := s.Prog(v.Op.Asm())
		p.From.Type = obj.TYPE_REG
		p.From.Reg = r

	case ssa.OpAMD64DIVQ, ssa.OpAMD64DIVL, ssa.OpAMD64DIVW:
		// Arg[0] (the dividend) is in AX.
		// Arg[1] (the divisor) can be in any other register.
		// Result[0] (the quotient) is in AX.
		// Result[1] (the remainder) is in DX.
		r := v.Args[1].Reg()

		var opCMP, opNEG, opSXD obj.As
		switch v.Op {
		case ssa.OpAMD64DIVQ:
			opCMP, opNEG, opSXD = x86.ACMPQ, x86.ANEGQ, x86.ACQO
		case ssa.OpAMD64DIVL:
			opCMP, opNEG, opSXD = x86.ACMPL, x86.ANEGL, x86.ACDQ
		case ssa.OpAMD64DIVW:
			opCMP, opNEG, opSXD = x86.ACMPW, x86.ANEGW, x86.ACWD
		}

		// CPU faults upon signed overflow, which occurs when the most
		// negative int is divided by -1. Handle divide by -1 as a special case.
		var j1, j2 *obj.Prog
		if ssa.DivisionNeedsFixUp(v) {
			c := s.Prog(opCMP)
			c.From.Type = obj.TYPE_REG
			c.From.Reg = r
			c.To.Type = obj.TYPE_CONST
			c.To.Offset = -1

			// Divisor is not -1, proceed with normal division.
			j1 = s.Prog(x86.AJNE)
			j1.To.Type = obj.TYPE_BRANCH

			// Divisor is -1, manually compute quotient and remainder via fixup code.
			// n / -1 = -n
			n1 := s.Prog(opNEG)
			n1.To.Type = obj.TYPE_REG
			n1.To.Reg = x86.REG_AX

			// n % -1 == 0
			opregreg(s, x86.AXORL, x86.REG_DX, x86.REG_DX)

			// TODO(khr): issue only the -1 fixup code we need.
			// For instance, if only the quotient is used, no point in zeroing the remainder.

			// Skip over normal division.
			j2 = s.Prog(obj.AJMP)
			j2.To.Type = obj.TYPE_BRANCH
		}

		// Sign extend dividend and perform division.
		p := s.Prog(opSXD)
		if j1 != nil {
			j1.To.SetTarget(p)
		}
		p = s.Prog(v.Op.Asm())
		p.From.Type = obj.TYPE_REG
		p.From.Reg = r

		if j2 != nil {
			j2.To.SetTarget(s.Pc())
		}
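
		// Concretely: math.MinInt64 / -1 must evaluate to math.MinInt64 with a
		// remainder of 0 (two's-complement wraparound, per the Go spec), but IDIV
		// would fault on it, hence the NEG/XOR fixup path above.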

	case ssa.OpAMD64HMULQ, ssa.OpAMD64HMULL, ssa.OpAMD64HMULQU, ssa.OpAMD64HMULLU:
		// the frontend rewrites constant division by 8/16/32 bit integers into
		// HMUL by a constant
		// SSA rewrites generate the 64 bit versions

		// Arg[0] is already in AX as it's the only register we allow
		// and DX is the only output we care about (the high bits)
		p := s.Prog(v.Op.Asm())
		p.From.Type = obj.TYPE_REG
		p.From.Reg = v.Args[1].Reg()

		// IMULB puts the high portion in AH instead of DL,
		// so move it to DL for consistency
		if v.Type.Size() == 1 {
			m := s.Prog(x86.AMOVB)
			m.From.Type = obj.TYPE_REG
			m.From.Reg = x86.REG_AH
			m.To.Type = obj.TYPE_REG
			m.To.Reg = x86.REG_DX
		}

	case ssa.OpAMD64MULQU, ssa.OpAMD64MULLU:
		// Arg[0] is already in AX as it's the only register we allow
		// results lo in AX
		p := s.Prog(v.Op.Asm())
		p.From.Type = obj.TYPE_REG
		p.From.Reg = v.Args[1].Reg()

	case ssa.OpAMD64MULQU2:
		// Arg[0] is already in AX as it's the only register we allow
		// results hi in DX, lo in AX
		p := s.Prog(v.Op.Asm())
		p.From.Type = obj.TYPE_REG
		p.From.Reg = v.Args[1].Reg()

	case ssa.OpAMD64DIVQU2:
		// Arg[0], Arg[1] are already in DX, AX, as they're the only registers we allow
		// results q in AX, r in DX
		p := s.Prog(v.Op.Asm())
		p.From.Type = obj.TYPE_REG
		p.From.Reg = v.Args[2].Reg()

	case ssa.OpAMD64AVGQU:
		// compute (x+y)/2 unsigned.
		// Do a 64-bit add, the overflow goes into the carry.
		// Shift right once and pull the carry back into the 63rd bit.
		p := s.Prog(x86.AADDQ)
		p.From.Type = obj.TYPE_REG
		p.To.Type = obj.TYPE_REG
		p.To.Reg = v.Reg()
		p.From.Reg = v.Args[1].Reg()
		p = s.Prog(x86.ARCRQ)
		p.From.Type = obj.TYPE_CONST
		p.From.Offset = 1
		p.To.Type = obj.TYPE_REG
		p.To.Reg = v.Reg()

	case ssa.OpAMD64ADDQcarry, ssa.OpAMD64ADCQ:
		r := v.Reg0()
		r0 := v.Args[0].Reg()
		r1 := v.Args[1].Reg()
		switch r {
		case r0:
			p := s.Prog(v.Op.Asm())
			p.From.Type = obj.TYPE_REG
			p.From.Reg = r1
			p.To.Type = obj.TYPE_REG
			p.To.Reg = r
		case r1:
			p := s.Prog(v.Op.Asm())
			p.From.Type = obj.TYPE_REG
			p.From.Reg = r0
			p.To.Type = obj.TYPE_REG
			p.To.Reg = r
		default:
			v.Fatalf("output not in same register as an input %s", v.LongString())
		}

	case ssa.OpAMD64SUBQborrow, ssa.OpAMD64SBBQ:
		p := s.Prog(v.Op.Asm())
		p.From.Type = obj.TYPE_REG
		p.From.Reg = v.Args[1].Reg()
		p.To.Type = obj.TYPE_REG
		p.To.Reg = v.Reg0()

	case ssa.OpAMD64ADDQconstcarry, ssa.OpAMD64ADCQconst, ssa.OpAMD64SUBQconstborrow, ssa.OpAMD64SBBQconst:
		p := s.Prog(v.Op.Asm())
		p.From.Type = obj.TYPE_CONST
		p.From.Offset = v.AuxInt
		p.To.Type = obj.TYPE_REG
		p.To.Reg = v.Reg0()

	case ssa.OpAMD64ADDQconst, ssa.OpAMD64ADDLconst:
		r := v.Reg()
		a := v.Args[0].Reg()
		if r == a {
			switch v.AuxInt {
			case 1:
				var asm obj.As
				// Software optimization manual recommends add $1,reg.
				// But inc/dec is 1 byte smaller. ICC always uses inc.
				// Clang/GCC choose depending on flags, but prefer add.
				// Experiments show that inc/dec is both a little faster
				// and makes a binary a little smaller.
				if v.Op == ssa.OpAMD64ADDQconst {
					asm = x86.AINCQ
				} else {
					asm = x86.AINCL
				}
				p := s.Prog(asm)
				p.To.Type = obj.TYPE_REG
				p.To.Reg = r
				return
			case -1:
				var asm obj.As
				if v.Op == ssa.OpAMD64ADDQconst {
					asm = x86.ADECQ
				} else {
					asm = x86.ADECL
				}
				p := s.Prog(asm)
				p.To.Type = obj.TYPE_REG
				p.To.Reg = r
				return
			case 0x80:
				// 'SUBQ $-0x80, r' is shorter to encode than
				// and functionally equivalent to 'ADDQ $0x80, r'.
				asm := x86.ASUBL
				if v.Op == ssa.OpAMD64ADDQconst {
					asm = x86.ASUBQ
				}
				p := s.Prog(asm)
				p.From.Type = obj.TYPE_CONST
				p.From.Offset = -0x80
				p.To.Type = obj.TYPE_REG
				p.To.Reg = r
				return
			}
			p := s.Prog(v.Op.Asm())
			p.From.Type = obj.TYPE_CONST
			p.From.Offset = v.AuxInt
			p.To.Type = obj.TYPE_REG
			p.To.Reg = r
			return
		}
		var asm obj.As
		if v.Op == ssa.OpAMD64ADDQconst {
			asm = x86.ALEAQ
		} else {
			asm = x86.ALEAL
		}
		p := s.Prog(asm)
		p.From.Type = obj.TYPE_MEM
		p.From.Reg = a
		p.From.Offset = v.AuxInt
		p.To.Type = obj.TYPE_REG
		p.To.Reg = r

	case ssa.OpAMD64CMOVQEQ, ssa.OpAMD64CMOVLEQ, ssa.OpAMD64CMOVWEQ,
		ssa.OpAMD64CMOVQLT, ssa.OpAMD64CMOVLLT, ssa.OpAMD64CMOVWLT,
		ssa.OpAMD64CMOVQNE, ssa.OpAMD64CMOVLNE, ssa.OpAMD64CMOVWNE,
		ssa.OpAMD64CMOVQGT, ssa.OpAMD64CMOVLGT, ssa.OpAMD64CMOVWGT,
		ssa.OpAMD64CMOVQLE, ssa.OpAMD64CMOVLLE, ssa.OpAMD64CMOVWLE,
		ssa.OpAMD64CMOVQGE, ssa.OpAMD64CMOVLGE, ssa.OpAMD64CMOVWGE,
		ssa.OpAMD64CMOVQHI, ssa.OpAMD64CMOVLHI, ssa.OpAMD64CMOVWHI,
		ssa.OpAMD64CMOVQLS, ssa.OpAMD64CMOVLLS, ssa.OpAMD64CMOVWLS,
		ssa.OpAMD64CMOVQCC, ssa.OpAMD64CMOVLCC, ssa.OpAMD64CMOVWCC,
		ssa.OpAMD64CMOVQCS, ssa.OpAMD64CMOVLCS, ssa.OpAMD64CMOVWCS,
		ssa.OpAMD64CMOVQGTF, ssa.OpAMD64CMOVLGTF, ssa.OpAMD64CMOVWGTF,
		ssa.OpAMD64CMOVQGEF, ssa.OpAMD64CMOVLGEF, ssa.OpAMD64CMOVWGEF:
		p := s.Prog(v.Op.Asm())
		p.From.Type = obj.TYPE_REG
		p.From.Reg = v.Args[1].Reg()
		p.To.Type = obj.TYPE_REG
		p.To.Reg = v.Reg()

	case ssa.OpAMD64CMOVQNEF, ssa.OpAMD64CMOVLNEF, ssa.OpAMD64CMOVWNEF:
		// Flag condition: ^ZERO || PARITY
		// Generate:
		//   CMOV*NE SRC,DST
		//   CMOV*PS SRC,DST
		p := s.Prog(v.Op.Asm())
		p.From.Type = obj.TYPE_REG
		p.From.Reg = v.Args[1].Reg()
		p.To.Type = obj.TYPE_REG
		p.To.Reg = v.Reg()
		var q *obj.Prog
		if v.Op == ssa.OpAMD64CMOVQNEF {
			q = s.Prog(x86.ACMOVQPS)
		} else if v.Op == ssa.OpAMD64CMOVLNEF {
			q = s.Prog(x86.ACMOVLPS)
		} else {
			q = s.Prog(x86.ACMOVWPS)
		}
		q.From.Type = obj.TYPE_REG
		q.From.Reg = v.Args[1].Reg()
		q.To.Type = obj.TYPE_REG
		q.To.Reg = v.Reg()

	case ssa.OpAMD64CMOVQEQF, ssa.OpAMD64CMOVLEQF, ssa.OpAMD64CMOVWEQF:
		// Flag condition: ZERO && !PARITY
		// Generate:
		//   MOV     SRC,TMP
		//   CMOV*NE DST,TMP
		//   CMOV*PC TMP,DST
		//
		// TODO(rasky): we could generate:
		//   CMOV*NE DST,SRC
		//   CMOV*PC SRC,DST
		// But this requires a way for regalloc to know that SRC might be
		// clobbered by this instruction.
		t := v.RegTmp()
		opregreg(s, moveByType(v.Type), t, v.Args[1].Reg())

		p := s.Prog(v.Op.Asm())
		p.From.Type = obj.TYPE_REG
		p.From.Reg = v.Reg()
		p.To.Type = obj.TYPE_REG
		p.To.Reg = t
		var q *obj.Prog
		if v.Op == ssa.OpAMD64CMOVQEQF {
			q = s.Prog(x86.ACMOVQPC)
		} else if v.Op == ssa.OpAMD64CMOVLEQF {
			q = s.Prog(x86.ACMOVLPC)
		} else {
			q = s.Prog(x86.ACMOVWPC)
		}
		q.From.Type = obj.TYPE_REG
		q.From.Reg = t
		q.To.Type = obj.TYPE_REG
		q.To.Reg = v.Reg()
|
2018-03-05 20:59:40 +01:00
|
|
|
|
2016-04-22 13:09:18 -07:00
|
|
|
case ssa.OpAMD64MULQconst, ssa.OpAMD64MULLconst:
|
2016-09-16 09:36:00 -07:00
|
|
|
r := v.Reg()
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-03-12 14:07:40 -08:00
|
|
|
p.From.Type = obj.TYPE_CONST
|
2016-03-29 16:39:53 -07:00
|
|
|
p.From.Offset = v.AuxInt
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = r
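// The extra source register attached below selects the three-operand
// immediate multiply form (dst = src * const).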
|
2023-04-12 11:23:13 +08:00
|
|
|
p.AddRestSourceReg(v.Args[0].Reg())
|
2016-03-12 14:07:40 -08:00
|
|
|
|
2021-10-10 17:56:16 +02:00
|
|
|
case ssa.OpAMD64ANDQconst:
|
|
|
|
|
asm := v.Op.Asm()
|
|
|
|
|
// If the constant is positive and fits into 32 bits, use ANDL.
|
|
|
|
|
// This saves a few bytes of encoding.
|
|
|
|
|
if 0 <= v.AuxInt && v.AuxInt <= (1<<32-1) {
|
|
|
|
|
asm = x86.AANDL
|
|
|
|
|
}
|
|
|
|
|
p := s.Prog(asm)
|
|
|
|
|
p.From.Type = obj.TYPE_CONST
|
|
|
|
|
p.From.Offset = v.AuxInt
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = v.Reg()
|
|
|
|
|
|
2016-04-22 13:09:18 -07:00
|
|
|
case ssa.OpAMD64SUBQconst, ssa.OpAMD64SUBLconst,
|
2021-10-10 17:56:16 +02:00
|
|
|
ssa.OpAMD64ANDLconst,
|
2016-04-22 13:09:18 -07:00
|
|
|
ssa.OpAMD64ORQconst, ssa.OpAMD64ORLconst,
|
|
|
|
|
ssa.OpAMD64XORQconst, ssa.OpAMD64XORLconst,
|
|
|
|
|
ssa.OpAMD64SHLQconst, ssa.OpAMD64SHLLconst,
|
2016-04-10 08:26:43 -07:00
|
|
|
ssa.OpAMD64SHRQconst, ssa.OpAMD64SHRLconst, ssa.OpAMD64SHRWconst, ssa.OpAMD64SHRBconst,
|
|
|
|
|
ssa.OpAMD64SARQconst, ssa.OpAMD64SARLconst, ssa.OpAMD64SARWconst, ssa.OpAMD64SARBconst,
|
|
|
|
|
ssa.OpAMD64ROLQconst, ssa.OpAMD64ROLLconst, ssa.OpAMD64ROLWconst, ssa.OpAMD64ROLBconst:
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-03-12 14:07:40 -08:00
|
|
|
p.From.Type = obj.TYPE_CONST
|
2016-03-29 16:39:53 -07:00
|
|
|
p.From.Offset = v.AuxInt
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_REG
|
2021-01-07 19:08:37 -08:00
|
|
|
p.To.Reg = v.Reg()
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.OpAMD64SBBQcarrymask, ssa.OpAMD64SBBLcarrymask:
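// SBB of a register with itself yields 0 or -1 depending on the carry flag,
// materializing an all-zeros or all-ones mask.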
|
2016-09-16 09:36:00 -07:00
|
|
|
r := v.Reg()
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-03-12 14:07:40 -08:00
|
|
|
p.From.Type = obj.TYPE_REG
|
|
|
|
|
p.From.Reg = r
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = r
|
2018-02-26 07:04:32 -08:00
|
|
|
case ssa.OpAMD64LEAQ1, ssa.OpAMD64LEAQ2, ssa.OpAMD64LEAQ4, ssa.OpAMD64LEAQ8,
|
|
|
|
|
ssa.OpAMD64LEAL1, ssa.OpAMD64LEAL2, ssa.OpAMD64LEAL4, ssa.OpAMD64LEAL8,
|
|
|
|
|
ssa.OpAMD64LEAW1, ssa.OpAMD64LEAW2, ssa.OpAMD64LEAW4, ssa.OpAMD64LEAW8:
|
|
|
|
|
p := s.Prog(v.Op.Asm())
|
2019-03-09 12:41:34 -08:00
|
|
|
memIdx(&p.From, v)
|
|
|
|
|
o := v.Reg()
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_REG
|
2018-05-11 08:01:31 +02:00
|
|
|
p.To.Reg = o
|
|
|
|
|
if v.AuxInt != 0 && v.Aux == nil {
|
|
|
|
|
// Emit an additional LEA to add the displacement instead of creating a slow 3 operand LEA.
|
|
|
|
|
switch v.Op {
|
|
|
|
|
case ssa.OpAMD64LEAQ1, ssa.OpAMD64LEAQ2, ssa.OpAMD64LEAQ4, ssa.OpAMD64LEAQ8:
|
|
|
|
|
p = s.Prog(x86.ALEAQ)
|
|
|
|
|
case ssa.OpAMD64LEAL1, ssa.OpAMD64LEAL2, ssa.OpAMD64LEAL4, ssa.OpAMD64LEAL8:
|
|
|
|
|
p = s.Prog(x86.ALEAL)
|
|
|
|
|
case ssa.OpAMD64LEAW1, ssa.OpAMD64LEAW2, ssa.OpAMD64LEAW4, ssa.OpAMD64LEAW8:
|
|
|
|
|
p = s.Prog(x86.ALEAW)
|
|
|
|
|
}
|
|
|
|
|
p.From.Type = obj.TYPE_MEM
|
|
|
|
|
p.From.Reg = o
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = o
|
|
|
|
|
}
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.From, v)
|
2018-02-26 07:04:32 -08:00
|
|
|
case ssa.OpAMD64LEAQ, ssa.OpAMD64LEAL, ssa.OpAMD64LEAW:
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-03-12 14:07:40 -08:00
|
|
|
p.From.Type = obj.TYPE_MEM
|
2016-09-16 09:36:00 -07:00
|
|
|
p.From.Reg = v.Args[0].Reg()
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.From, v)
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_REG
|
2016-09-16 09:36:00 -07:00
|
|
|
p.To.Reg = v.Reg()
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.OpAMD64CMPQ, ssa.OpAMD64CMPL, ssa.OpAMD64CMPW, ssa.OpAMD64CMPB,
|
2017-02-06 10:55:39 -08:00
|
|
|
ssa.OpAMD64TESTQ, ssa.OpAMD64TESTL, ssa.OpAMD64TESTW, ssa.OpAMD64TESTB,
|
|
|
|
|
ssa.OpAMD64BTL, ssa.OpAMD64BTQ:
|
2017-03-20 08:01:28 -07:00
|
|
|
opregreg(s, v.Op.Asm(), v.Args[1].Reg(), v.Args[0].Reg())
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.OpAMD64UCOMISS, ssa.OpAMD64UCOMISD:
|
|
|
|
|
// Go assembler has swapped operands for UCOMISx relative to CMP,
|
|
|
|
|
// so we must account for that right here.
|
2017-03-20 08:01:28 -07:00
|
|
|
opregreg(s, v.Op.Asm(), v.Args[0].Reg(), v.Args[1].Reg())
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.OpAMD64CMPQconst, ssa.OpAMD64CMPLconst, ssa.OpAMD64CMPWconst, ssa.OpAMD64CMPBconst:
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-03-12 14:07:40 -08:00
|
|
|
p.From.Type = obj.TYPE_REG
|
2016-09-16 09:36:00 -07:00
|
|
|
p.From.Reg = v.Args[0].Reg()
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_CONST
|
2016-03-29 16:39:53 -07:00
|
|
|
p.To.Offset = v.AuxInt
|
2018-09-08 14:23:14 +00:00
|
|
|
case ssa.OpAMD64BTLconst, ssa.OpAMD64BTQconst,
|
|
|
|
|
ssa.OpAMD64TESTQconst, ssa.OpAMD64TESTLconst, ssa.OpAMD64TESTWconst, ssa.OpAMD64TESTBconst,
|
2023-08-01 14:32:56 -07:00
|
|
|
ssa.OpAMD64BTSQconst,
|
|
|
|
|
ssa.OpAMD64BTCQconst,
|
|
|
|
|
ssa.OpAMD64BTRQconst:
|
2018-03-10 11:17:05 +01:00
|
|
|
op := v.Op
|
|
|
|
|
if op == ssa.OpAMD64BTQconst && v.AuxInt < 32 {
|
|
|
|
|
// Emit 32-bit version because it's shorter
|
|
|
|
|
op = ssa.OpAMD64BTLconst
|
|
|
|
|
}
|
|
|
|
|
p := s.Prog(op.Asm())
|
|
|
|
|
p.From.Type = obj.TYPE_CONST
|
|
|
|
|
p.From.Offset = v.AuxInt
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = v.Args[0].Reg()
|
2018-05-08 09:11:00 -07:00
|
|
|
case ssa.OpAMD64CMPQload, ssa.OpAMD64CMPLload, ssa.OpAMD64CMPWload, ssa.OpAMD64CMPBload:
|
2018-01-03 14:38:55 -08:00
|
|
|
p := s.Prog(v.Op.Asm())
|
|
|
|
|
p.From.Type = obj.TYPE_MEM
|
|
|
|
|
p.From.Reg = v.Args[0].Reg()
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.From, v)
|
2018-01-03 14:38:55 -08:00
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = v.Args[1].Reg()
|
2018-05-08 09:11:00 -07:00
|
|
|
case ssa.OpAMD64CMPQconstload, ssa.OpAMD64CMPLconstload, ssa.OpAMD64CMPWconstload, ssa.OpAMD64CMPBconstload:
|
2018-01-03 14:38:55 -08:00
|
|
|
sc := v.AuxValAndOff()
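// AuxInt packs both the comparison constant and the memory offset (ValAndOff);
// the offset is folded into the address and the value becomes the constant operand below.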
|
|
|
|
|
p := s.Prog(v.Op.Asm())
|
|
|
|
|
p.From.Type = obj.TYPE_MEM
|
|
|
|
|
p.From.Reg = v.Args[0].Reg()
|
2021-03-05 11:22:13 +01:00
|
|
|
ssagen.AddAux2(&p.From, v, sc.Off64())
|
2018-01-03 14:38:55 -08:00
|
|
|
p.To.Type = obj.TYPE_CONST
|
2021-03-05 11:22:13 +01:00
|
|
|
p.To.Offset = sc.Val64()
|
2020-03-19 17:48:42 -07:00
|
|
|
case ssa.OpAMD64CMPQloadidx8, ssa.OpAMD64CMPQloadidx1, ssa.OpAMD64CMPLloadidx4, ssa.OpAMD64CMPLloadidx1, ssa.OpAMD64CMPWloadidx2, ssa.OpAMD64CMPWloadidx1, ssa.OpAMD64CMPBloadidx1:
|
|
|
|
|
p := s.Prog(v.Op.Asm())
|
|
|
|
|
memIdx(&p.From, v)
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.From, v)
|
2020-03-19 17:48:42 -07:00
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = v.Args[2].Reg()
|
|
|
|
|
case ssa.OpAMD64CMPQconstloadidx8, ssa.OpAMD64CMPQconstloadidx1, ssa.OpAMD64CMPLconstloadidx4, ssa.OpAMD64CMPLconstloadidx1, ssa.OpAMD64CMPWconstloadidx2, ssa.OpAMD64CMPWconstloadidx1, ssa.OpAMD64CMPBconstloadidx1:
|
|
|
|
|
sc := v.AuxValAndOff()
|
|
|
|
|
p := s.Prog(v.Op.Asm())
|
|
|
|
|
memIdx(&p.From, v)
|
2021-03-05 11:22:13 +01:00
|
|
|
ssagen.AddAux2(&p.From, v, sc.Off64())
|
2020-03-19 17:48:42 -07:00
|
|
|
p.To.Type = obj.TYPE_CONST
|
2021-03-05 11:22:13 +01:00
|
|
|
p.To.Offset = sc.Val64()
|
2016-04-22 13:09:18 -07:00
|
|
|
case ssa.OpAMD64MOVLconst, ssa.OpAMD64MOVQconst:
|
2016-09-16 09:36:00 -07:00
|
|
|
x := v.Reg()
|
2017-10-24 13:24:14 -07:00
|
|
|
|
|
|
|
|
// If flags aren't live (indicated by v.Aux == nil),
|
|
|
|
|
// then we can rewrite MOV $0, AX into XOR AX, AX.
|
|
|
|
|
if v.AuxInt == 0 && v.Aux == nil {
|
2021-12-06 11:46:57 +03:00
|
|
|
opregreg(s, x86.AXORL, x, x)
|
2017-10-24 13:24:14 -07:00
|
|
|
break
|
|
|
|
|
}
|
|
|
|
|
|
2017-03-24 08:13:17 +01:00
|
|
|
asm := v.Op.Asm()
|
|
|
|
|
// Use MOVL to move a small constant into a register
|
|
|
|
|
// when the constant is positive and fits into 32 bits.
|
|
|
|
|
if 0 <= v.AuxInt && v.AuxInt <= (1<<32-1) {
|
|
|
|
|
// The upper 32 bits are zeroed automatically when using MOVL.
|
|
|
|
|
asm = x86.AMOVL
|
|
|
|
|
}
|
|
|
|
|
p := s.Prog(asm)
|
2016-03-12 14:07:40 -08:00
|
|
|
p.From.Type = obj.TYPE_CONST
|
2016-03-29 16:39:53 -07:00
|
|
|
p.From.Offset = v.AuxInt
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = x
|
|
|
|
|
case ssa.OpAMD64MOVSSconst, ssa.OpAMD64MOVSDconst:
|
2016-09-16 09:36:00 -07:00
|
|
|
x := v.Reg()
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-03-12 14:07:40 -08:00
|
|
|
p.From.Type = obj.TYPE_FCONST
|
|
|
|
|
p.From.Val = math.Float64frombits(uint64(v.AuxInt))
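// The constant's bit pattern lives in AuxInt; it is handed to the assembler as a
// float64 immediate (TYPE_FCONST), which the assembler narrows to float32 for MOVSS.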
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = x
|
2021-10-19 19:39:21 +08:00
|
|
|
case ssa.OpAMD64MOVQload, ssa.OpAMD64MOVLload, ssa.OpAMD64MOVWload, ssa.OpAMD64MOVBload, ssa.OpAMD64MOVOload,
|
|
|
|
|
ssa.OpAMD64MOVSSload, ssa.OpAMD64MOVSDload, ssa.OpAMD64MOVBQSXload, ssa.OpAMD64MOVWQSXload, ssa.OpAMD64MOVLQSXload,
|
|
|
|
|
ssa.OpAMD64MOVBEQload, ssa.OpAMD64MOVBELload:
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-03-12 14:07:40 -08:00
|
|
|
p.From.Type = obj.TYPE_MEM
|
2016-09-16 09:36:00 -07:00
|
|
|
p.From.Reg = v.Args[0].Reg()
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.From, v)
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_REG
|
2016-09-16 09:36:00 -07:00
|
|
|
p.To.Reg = v.Reg()
|
2018-10-06 03:35:17 +00:00
|
|
|
case ssa.OpAMD64MOVBloadidx1, ssa.OpAMD64MOVWloadidx1, ssa.OpAMD64MOVLloadidx1, ssa.OpAMD64MOVQloadidx1, ssa.OpAMD64MOVSSloadidx1, ssa.OpAMD64MOVSDloadidx1,
|
2022-03-24 22:53:41 +08:00
|
|
|
ssa.OpAMD64MOVQloadidx8, ssa.OpAMD64MOVSDloadidx8, ssa.OpAMD64MOVLloadidx8, ssa.OpAMD64MOVLloadidx4, ssa.OpAMD64MOVSSloadidx4, ssa.OpAMD64MOVWloadidx2,
|
|
|
|
|
ssa.OpAMD64MOVBELloadidx1, ssa.OpAMD64MOVBELloadidx4, ssa.OpAMD64MOVBELloadidx8, ssa.OpAMD64MOVBEQloadidx1, ssa.OpAMD64MOVBEQloadidx8:
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2019-03-09 12:41:34 -08:00
|
|
|
memIdx(&p.From, v)
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.From, v)
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_REG
|
2016-09-16 09:36:00 -07:00
|
|
|
p.To.Reg = v.Reg()
|
2018-06-29 02:11:53 +00:00
|
|
|
case ssa.OpAMD64MOVQstore, ssa.OpAMD64MOVSSstore, ssa.OpAMD64MOVSDstore, ssa.OpAMD64MOVLstore, ssa.OpAMD64MOVWstore, ssa.OpAMD64MOVBstore, ssa.OpAMD64MOVOstore,
|
|
|
|
|
ssa.OpAMD64ADDQmodify, ssa.OpAMD64SUBQmodify, ssa.OpAMD64ANDQmodify, ssa.OpAMD64ORQmodify, ssa.OpAMD64XORQmodify,
|
2021-10-19 19:39:21 +08:00
|
|
|
ssa.OpAMD64ADDLmodify, ssa.OpAMD64SUBLmodify, ssa.OpAMD64ANDLmodify, ssa.OpAMD64ORLmodify, ssa.OpAMD64XORLmodify,
|
2022-03-30 19:27:21 +08:00
|
|
|
ssa.OpAMD64MOVBEQstore, ssa.OpAMD64MOVBELstore, ssa.OpAMD64MOVBEWstore:
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-03-12 14:07:40 -08:00
|
|
|
p.From.Type = obj.TYPE_REG
|
2016-09-16 09:36:00 -07:00
|
|
|
p.From.Reg = v.Args[1].Reg()
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_MEM
|
2016-09-16 09:36:00 -07:00
|
|
|
p.To.Reg = v.Args[0].Reg()
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.To, v)
|
2018-10-06 03:35:17 +00:00
|
|
|
case ssa.OpAMD64MOVBstoreidx1, ssa.OpAMD64MOVWstoreidx1, ssa.OpAMD64MOVLstoreidx1, ssa.OpAMD64MOVQstoreidx1, ssa.OpAMD64MOVSSstoreidx1, ssa.OpAMD64MOVSDstoreidx1,
|
2020-04-13 09:59:20 -07:00
|
|
|
ssa.OpAMD64MOVQstoreidx8, ssa.OpAMD64MOVSDstoreidx8, ssa.OpAMD64MOVLstoreidx8, ssa.OpAMD64MOVSSstoreidx4, ssa.OpAMD64MOVLstoreidx4, ssa.OpAMD64MOVWstoreidx2,
|
|
|
|
|
ssa.OpAMD64ADDLmodifyidx1, ssa.OpAMD64ADDLmodifyidx4, ssa.OpAMD64ADDLmodifyidx8, ssa.OpAMD64ADDQmodifyidx1, ssa.OpAMD64ADDQmodifyidx8,
|
|
|
|
|
ssa.OpAMD64SUBLmodifyidx1, ssa.OpAMD64SUBLmodifyidx4, ssa.OpAMD64SUBLmodifyidx8, ssa.OpAMD64SUBQmodifyidx1, ssa.OpAMD64SUBQmodifyidx8,
|
|
|
|
|
ssa.OpAMD64ANDLmodifyidx1, ssa.OpAMD64ANDLmodifyidx4, ssa.OpAMD64ANDLmodifyidx8, ssa.OpAMD64ANDQmodifyidx1, ssa.OpAMD64ANDQmodifyidx8,
|
|
|
|
|
ssa.OpAMD64ORLmodifyidx1, ssa.OpAMD64ORLmodifyidx4, ssa.OpAMD64ORLmodifyidx8, ssa.OpAMD64ORQmodifyidx1, ssa.OpAMD64ORQmodifyidx8,
|
2022-03-24 22:53:41 +08:00
|
|
|
ssa.OpAMD64XORLmodifyidx1, ssa.OpAMD64XORLmodifyidx4, ssa.OpAMD64XORLmodifyidx8, ssa.OpAMD64XORQmodifyidx1, ssa.OpAMD64XORQmodifyidx8,
|
|
|
|
|
ssa.OpAMD64MOVBEWstoreidx1, ssa.OpAMD64MOVBEWstoreidx2, ssa.OpAMD64MOVBELstoreidx1, ssa.OpAMD64MOVBELstoreidx4, ssa.OpAMD64MOVBELstoreidx8, ssa.OpAMD64MOVBEQstoreidx1, ssa.OpAMD64MOVBEQstoreidx8:
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-03-12 14:07:40 -08:00
|
|
|
p.From.Type = obj.TYPE_REG
|
2016-09-16 09:36:00 -07:00
|
|
|
p.From.Reg = v.Args[2].Reg()
|
2019-03-09 12:41:34 -08:00
|
|
|
memIdx(&p.To, v)
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.To, v)
|
2018-05-08 09:11:00 -07:00
|
|
|
case ssa.OpAMD64ADDQconstmodify, ssa.OpAMD64ADDLconstmodify:
|
2017-08-09 15:43:25 -05:00
|
|
|
sc := v.AuxValAndOff()
|
2021-03-05 11:22:13 +01:00
|
|
|
off := sc.Off64()
|
2017-08-09 15:43:25 -05:00
|
|
|
val := sc.Val()
|
2018-10-29 08:34:42 +00:00
|
|
|
if val == 1 || val == -1 {
|
2017-08-09 15:43:25 -05:00
|
|
|
var asm obj.As
|
2018-05-08 09:11:00 -07:00
|
|
|
if v.Op == ssa.OpAMD64ADDQconstmodify {
|
2018-10-29 08:34:42 +00:00
|
|
|
if val == 1 {
|
|
|
|
|
asm = x86.AINCQ
|
|
|
|
|
} else {
|
|
|
|
|
asm = x86.ADECQ
|
|
|
|
|
}
|
2017-08-09 15:43:25 -05:00
|
|
|
} else {
|
2018-10-29 08:34:42 +00:00
|
|
|
if val == 1 {
|
|
|
|
|
asm = x86.AINCL
|
|
|
|
|
} else {
|
|
|
|
|
asm = x86.ADECL
|
|
|
|
|
}
|
2017-08-09 15:43:25 -05:00
|
|
|
}
|
|
|
|
|
p := s.Prog(asm)
|
|
|
|
|
p.To.Type = obj.TYPE_MEM
|
|
|
|
|
p.To.Reg = v.Args[0].Reg()
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux2(&p.To, v, off)
|
2018-09-18 01:53:42 +00:00
|
|
|
break
|
2017-08-09 15:43:25 -05:00
|
|
|
}
|
2018-09-18 01:53:42 +00:00
|
|
|
fallthrough
|
2018-06-27 02:46:17 +00:00
|
|
|
case ssa.OpAMD64ANDQconstmodify, ssa.OpAMD64ANDLconstmodify, ssa.OpAMD64ORQconstmodify, ssa.OpAMD64ORLconstmodify,
|
2023-08-01 14:32:56 -07:00
|
|
|
ssa.OpAMD64XORQconstmodify, ssa.OpAMD64XORLconstmodify,
|
|
|
|
|
ssa.OpAMD64BTSQconstmodify, ssa.OpAMD64BTRQconstmodify, ssa.OpAMD64BTCQconstmodify:
|
2018-06-27 02:46:17 +00:00
|
|
|
sc := v.AuxValAndOff()
|
2021-03-05 11:22:13 +01:00
|
|
|
off := sc.Off64()
|
|
|
|
|
val := sc.Val64()
|
2018-06-27 02:46:17 +00:00
|
|
|
p := s.Prog(v.Op.Asm())
|
|
|
|
|
p.From.Type = obj.TYPE_CONST
|
|
|
|
|
p.From.Offset = val
|
|
|
|
|
p.To.Type = obj.TYPE_MEM
|
|
|
|
|
p.To.Reg = v.Args[0].Reg()
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux2(&p.To, v, off)
|
2020-04-13 09:59:20 -07:00
|
|
|
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.OpAMD64MOVQstoreconst, ssa.OpAMD64MOVLstoreconst, ssa.OpAMD64MOVWstoreconst, ssa.OpAMD64MOVBstoreconst:
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-03-12 14:07:40 -08:00
|
|
|
p.From.Type = obj.TYPE_CONST
|
|
|
|
|
sc := v.AuxValAndOff()
|
2021-03-05 11:22:13 +01:00
|
|
|
p.From.Offset = sc.Val64()
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_MEM
|
2016-09-16 09:36:00 -07:00
|
|
|
p.To.Reg = v.Args[0].Reg()
|
2021-03-05 11:22:13 +01:00
|
|
|
ssagen.AddAux2(&p.To, v, sc.Off64())
|
2021-08-30 22:42:17 +02:00
|
|
|
case ssa.OpAMD64MOVOstoreconst:
|
|
|
|
|
sc := v.AuxValAndOff()
|
|
|
|
|
if sc.Val() != 0 {
|
|
|
|
|
v.Fatalf("MOVO for non zero constants not implemented: %s", v.LongString())
|
|
|
|
|
}
|
|
|
|
|
|
2021-06-09 14:29:20 -04:00
|
|
|
if s.ABI != obj.ABIInternal {
|
2021-03-15 16:48:54 -04:00
|
|
|
// zero X15 manually
|
2021-01-29 13:46:34 -05:00
|
|
|
opregreg(s, x86.AXORPS, x86.REG_X15, x86.REG_X15)
|
|
|
|
|
}
|
|
|
|
|
p := s.Prog(v.Op.Asm())
|
|
|
|
|
p.From.Type = obj.TYPE_REG
|
|
|
|
|
p.From.Reg = x86.REG_X15
|
|
|
|
|
p.To.Type = obj.TYPE_MEM
|
|
|
|
|
p.To.Reg = v.Args[0].Reg()
|
2021-08-30 22:42:17 +02:00
|
|
|
ssagen.AddAux2(&p.To, v, sc.Off64())
|
|
|
|
|
|
2020-04-13 09:59:20 -07:00
|
|
|
case ssa.OpAMD64MOVQstoreconstidx1, ssa.OpAMD64MOVQstoreconstidx8, ssa.OpAMD64MOVLstoreconstidx1, ssa.OpAMD64MOVLstoreconstidx4, ssa.OpAMD64MOVWstoreconstidx1, ssa.OpAMD64MOVWstoreconstidx2, ssa.OpAMD64MOVBstoreconstidx1,
|
|
|
|
|
ssa.OpAMD64ADDLconstmodifyidx1, ssa.OpAMD64ADDLconstmodifyidx4, ssa.OpAMD64ADDLconstmodifyidx8, ssa.OpAMD64ADDQconstmodifyidx1, ssa.OpAMD64ADDQconstmodifyidx8,
|
|
|
|
|
ssa.OpAMD64ANDLconstmodifyidx1, ssa.OpAMD64ANDLconstmodifyidx4, ssa.OpAMD64ANDLconstmodifyidx8, ssa.OpAMD64ANDQconstmodifyidx1, ssa.OpAMD64ANDQconstmodifyidx8,
|
|
|
|
|
ssa.OpAMD64ORLconstmodifyidx1, ssa.OpAMD64ORLconstmodifyidx4, ssa.OpAMD64ORLconstmodifyidx8, ssa.OpAMD64ORQconstmodifyidx1, ssa.OpAMD64ORQconstmodifyidx8,
|
|
|
|
|
ssa.OpAMD64XORLconstmodifyidx1, ssa.OpAMD64XORLconstmodifyidx4, ssa.OpAMD64XORLconstmodifyidx8, ssa.OpAMD64XORQconstmodifyidx1, ssa.OpAMD64XORQconstmodifyidx8:
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-03-12 14:07:40 -08:00
|
|
|
p.From.Type = obj.TYPE_CONST
|
|
|
|
|
sc := v.AuxValAndOff()
|
2021-03-05 11:22:13 +01:00
|
|
|
p.From.Offset = sc.Val64()
|
2020-04-13 09:59:20 -07:00
|
|
|
switch {
|
|
|
|
|
case p.As == x86.AADDQ && p.From.Offset == 1:
|
|
|
|
|
p.As = x86.AINCQ
|
|
|
|
|
p.From.Type = obj.TYPE_NONE
|
|
|
|
|
case p.As == x86.AADDQ && p.From.Offset == -1:
|
|
|
|
|
p.As = x86.ADECQ
|
|
|
|
|
p.From.Type = obj.TYPE_NONE
|
|
|
|
|
case p.As == x86.AADDL && p.From.Offset == 1:
|
|
|
|
|
p.As = x86.AINCL
|
|
|
|
|
p.From.Type = obj.TYPE_NONE
|
|
|
|
|
case p.As == x86.AADDL && p.From.Offset == -1:
|
|
|
|
|
p.As = x86.ADECL
|
|
|
|
|
p.From.Type = obj.TYPE_NONE
|
|
|
|
|
}
|
2019-03-09 12:41:34 -08:00
|
|
|
memIdx(&p.To, v)
|
2021-03-05 11:22:13 +01:00
|
|
|
ssagen.AddAux2(&p.To, v, sc.Off64())
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.OpAMD64MOVLQSX, ssa.OpAMD64MOVWQSX, ssa.OpAMD64MOVBQSX, ssa.OpAMD64MOVLQZX, ssa.OpAMD64MOVWQZX, ssa.OpAMD64MOVBQZX,
|
|
|
|
|
ssa.OpAMD64CVTTSS2SL, ssa.OpAMD64CVTTSD2SL, ssa.OpAMD64CVTTSS2SQ, ssa.OpAMD64CVTTSD2SQ,
|
2024-11-04 12:41:33 -05:00
|
|
|
ssa.OpAMD64CVTSS2SD, ssa.OpAMD64CVTSD2SS, ssa.OpAMD64VPBROADCASTB, ssa.OpAMD64PMOVMSKB:
|
2017-03-20 08:01:28 -07:00
|
|
|
opregreg(s, v.Op.Asm(), v.Reg(), v.Args[0].Reg())
|
2016-10-19 20:21:42 +03:00
|
|
|
case ssa.OpAMD64CVTSL2SD, ssa.OpAMD64CVTSQ2SD, ssa.OpAMD64CVTSQ2SS, ssa.OpAMD64CVTSL2SS:
|
|
|
|
|
r := v.Reg()
|
|
|
|
|
// Break false dependency on destination register.
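// (CVTSI2SS/CVTSI2SD write only part of the XMM register, so without this XORPS
// the conversion would carry a dependency on the register's previous contents.)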
|
2017-03-20 08:01:28 -07:00
|
|
|
opregreg(s, x86.AXORPS, r, r)
|
|
|
|
|
opregreg(s, v.Op.Asm(), r, v.Args[0].Reg())
|
2018-10-08 02:20:03 +00:00
|
|
|
case ssa.OpAMD64MOVQi2f, ssa.OpAMD64MOVQf2i, ssa.OpAMD64MOVLi2f, ssa.OpAMD64MOVLf2i:
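// These ops move raw bits between general-purpose and floating-point registers
// (e.g. for math.Float64bits/Float64frombits); a MOVQ/MOVL with one GP and one
// XMM operand does this directly.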
|
|
|
|
|
var p *obj.Prog
|
|
|
|
|
switch v.Op {
|
|
|
|
|
case ssa.OpAMD64MOVQi2f, ssa.OpAMD64MOVQf2i:
|
|
|
|
|
p = s.Prog(x86.AMOVQ)
|
|
|
|
|
case ssa.OpAMD64MOVLi2f, ssa.OpAMD64MOVLf2i:
|
|
|
|
|
p = s.Prog(x86.AMOVL)
|
|
|
|
|
}
|
2017-08-24 13:19:40 -07:00
|
|
|
p.From.Type = obj.TYPE_REG
|
|
|
|
|
p.From.Reg = v.Args[0].Reg()
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = v.Reg()
|
2018-05-08 09:11:00 -07:00
|
|
|
case ssa.OpAMD64ADDQload, ssa.OpAMD64ADDLload, ssa.OpAMD64SUBQload, ssa.OpAMD64SUBLload,
|
|
|
|
|
ssa.OpAMD64ANDQload, ssa.OpAMD64ANDLload, ssa.OpAMD64ORQload, ssa.OpAMD64ORLload,
|
|
|
|
|
ssa.OpAMD64XORQload, ssa.OpAMD64XORLload, ssa.OpAMD64ADDSDload, ssa.OpAMD64ADDSSload,
|
2018-06-21 10:14:18 +00:00
|
|
|
ssa.OpAMD64SUBSDload, ssa.OpAMD64SUBSSload, ssa.OpAMD64MULSDload, ssa.OpAMD64MULSSload,
|
|
|
|
|
ssa.OpAMD64DIVSDload, ssa.OpAMD64DIVSSload:
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2017-02-10 13:17:20 -06:00
|
|
|
p.From.Type = obj.TYPE_MEM
|
|
|
|
|
p.From.Reg = v.Args[1].Reg()
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.From, v)
|
2017-02-10 13:17:20 -06:00
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = v.Reg()
|
2020-04-11 22:15:58 -07:00
|
|
|
case ssa.OpAMD64ADDLloadidx1, ssa.OpAMD64ADDLloadidx4, ssa.OpAMD64ADDLloadidx8, ssa.OpAMD64ADDQloadidx1, ssa.OpAMD64ADDQloadidx8,
|
|
|
|
|
ssa.OpAMD64SUBLloadidx1, ssa.OpAMD64SUBLloadidx4, ssa.OpAMD64SUBLloadidx8, ssa.OpAMD64SUBQloadidx1, ssa.OpAMD64SUBQloadidx8,
|
|
|
|
|
ssa.OpAMD64ANDLloadidx1, ssa.OpAMD64ANDLloadidx4, ssa.OpAMD64ANDLloadidx8, ssa.OpAMD64ANDQloadidx1, ssa.OpAMD64ANDQloadidx8,
|
|
|
|
|
ssa.OpAMD64ORLloadidx1, ssa.OpAMD64ORLloadidx4, ssa.OpAMD64ORLloadidx8, ssa.OpAMD64ORQloadidx1, ssa.OpAMD64ORQloadidx8,
|
2020-07-27 09:45:21 -07:00
|
|
|
ssa.OpAMD64XORLloadidx1, ssa.OpAMD64XORLloadidx4, ssa.OpAMD64XORLloadidx8, ssa.OpAMD64XORQloadidx1, ssa.OpAMD64XORQloadidx8,
|
|
|
|
|
ssa.OpAMD64ADDSSloadidx1, ssa.OpAMD64ADDSSloadidx4, ssa.OpAMD64ADDSDloadidx1, ssa.OpAMD64ADDSDloadidx8,
|
|
|
|
|
ssa.OpAMD64SUBSSloadidx1, ssa.OpAMD64SUBSSloadidx4, ssa.OpAMD64SUBSDloadidx1, ssa.OpAMD64SUBSDloadidx8,
|
|
|
|
|
ssa.OpAMD64MULSSloadidx1, ssa.OpAMD64MULSSloadidx4, ssa.OpAMD64MULSDloadidx1, ssa.OpAMD64MULSDloadidx8,
|
|
|
|
|
ssa.OpAMD64DIVSSloadidx1, ssa.OpAMD64DIVSSloadidx4, ssa.OpAMD64DIVSDloadidx1, ssa.OpAMD64DIVSDloadidx8:
|
2020-04-11 22:15:58 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
|
|
|
|
|
|
|
|
|
r, i := v.Args[1].Reg(), v.Args[2].Reg()
|
|
|
|
|
p.From.Type = obj.TYPE_MEM
|
|
|
|
|
p.From.Scale = v.Op.Scale()
|
|
|
|
|
if p.From.Scale == 1 && i == x86.REG_SP {
|
|
|
|
|
r, i = i, r
|
|
|
|
|
}
|
|
|
|
|
p.From.Reg = r
|
|
|
|
|
p.From.Index = i
|
|
|
|
|
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.From, v)
|
2020-04-11 22:15:58 -07:00
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = v.Reg()
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.OpAMD64DUFFZERO:
|
2021-06-09 14:29:20 -04:00
|
|
|
if s.ABI != obj.ABIInternal {
|
2021-03-15 16:48:54 -04:00
|
|
|
// zero X15 manually
|
2021-01-29 13:46:34 -05:00
|
|
|
opregreg(s, x86.AXORPS, x86.REG_X15, x86.REG_X15)
|
|
|
|
|
}
|
2016-07-28 12:22:49 -04:00
|
|
|
off := duffStart(v.AuxInt)
|
|
|
|
|
adj := duffAdj(v.AuxInt)
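// AuxInt encodes how many bytes to zero; duffStart gives the offset to jump to
// inside runtime.duffzero and duffAdj the amount DI must be advanced beforehand.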
|
|
|
|
|
var p *obj.Prog
|
|
|
|
|
if adj != 0 {
|
2017-08-09 14:50:58 -05:00
|
|
|
p = s.Prog(x86.ALEAQ)
|
|
|
|
|
p.From.Type = obj.TYPE_MEM
|
2016-07-28 12:22:49 -04:00
|
|
|
p.From.Offset = adj
|
2017-08-09 14:50:58 -05:00
|
|
|
p.From.Reg = x86.REG_DI
|
2016-07-28 12:22:49 -04:00
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = x86.REG_DI
|
|
|
|
|
}
|
2017-03-20 08:01:28 -07:00
|
|
|
p = s.Prog(obj.ADUFFZERO)
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_ADDR
|
2020-12-23 00:10:25 -05:00
|
|
|
p.To.Sym = ir.Syms.Duffzero
|
2016-07-28 12:22:49 -04:00
|
|
|
p.To.Offset = off
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.OpAMD64DUFFCOPY:
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(obj.ADUFFCOPY)
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_ADDR
|
2020-12-23 00:10:25 -05:00
|
|
|
p.To.Sym = ir.Syms.Duffcopy
|
2020-04-23 13:11:00 -07:00
|
|
|
if v.AuxInt%16 != 0 {
|
|
|
|
|
v.Fatalf("bad DUFFCOPY AuxInt %v", v.AuxInt)
|
|
|
|
|
}
|
|
|
|
|
p.To.Offset = 14 * (64 - v.AuxInt/16)
|
|
|
|
|
// 14 and 64 are magic constants. 14 is the number of bytes to encode:
|
|
|
|
|
// MOVUPS (SI), X0
|
|
|
|
|
// ADDQ $16, SI
|
|
|
|
|
// MOVUPS X0, (DI)
|
|
|
|
|
// ADDQ $16, DI
|
|
|
|
|
// and 64 is the number of such blocks. See src/runtime/duff_amd64.s:duffcopy.
|
2016-03-12 14:07:40 -08:00
|
|
|
|
2017-08-24 11:31:58 -07:00
|
|
|
case ssa.OpCopy: // TODO: use MOVQreg for reg->reg copies instead of OpCopy?
|
2016-04-21 10:02:36 -07:00
|
|
|
if v.Type.IsMemory() {
|
|
|
|
|
return
|
|
|
|
|
}
|
2016-09-16 09:36:00 -07:00
|
|
|
x := v.Args[0].Reg()
|
|
|
|
|
y := v.Reg()
|
2016-03-12 14:07:40 -08:00
|
|
|
if x != y {
|
2017-03-20 08:01:28 -07:00
|
|
|
opregreg(s, moveByType(v.Type), y, x)
|
2016-03-12 14:07:40 -08:00
|
|
|
}
|
|
|
|
|
case ssa.OpLoadReg:
|
|
|
|
|
if v.Type.IsFlags() {
|
2016-09-14 10:01:05 -07:00
|
|
|
v.Fatalf("load flags not implemented: %v", v.LongString())
|
2016-03-12 14:07:40 -08:00
|
|
|
return
|
|
|
|
|
}
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(loadByType(v.Type))
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddrAuto(&p.From, v.Args[0])
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_REG
|
2016-09-16 09:36:00 -07:00
|
|
|
p.To.Reg = v.Reg()
|
2016-03-12 14:07:40 -08:00
|
|
|
|
|
|
|
|
case ssa.OpStoreReg:
|
|
|
|
|
if v.Type.IsFlags() {
|
2016-09-14 10:01:05 -07:00
|
|
|
v.Fatalf("store flags not implemented: %v", v.LongString())
|
2016-03-12 14:07:40 -08:00
|
|
|
return
|
|
|
|
|
}
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(storeByType(v.Type))
|
2016-03-12 14:07:40 -08:00
|
|
|
p.From.Type = obj.TYPE_REG
|
2016-09-16 09:36:00 -07:00
|
|
|
p.From.Reg = v.Args[0].Reg()
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddrAuto(&p.To, v)
|
2019-12-19 10:58:28 -08:00
|
|
|
case ssa.OpAMD64LoweredHasCPUFeature:
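// The CPU feature flag is a byte-sized runtime variable named by v.Aux;
// MOVBLZX loads it zero-extended into the result register.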
|
2022-09-11 14:47:36 +02:00
|
|
|
p := s.Prog(x86.AMOVBLZX)
|
2019-12-19 10:58:28 -08:00
|
|
|
p.From.Type = obj.TYPE_MEM
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.From, v)
|
2019-12-19 10:58:28 -08:00
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = v.Reg()
|
2021-02-19 17:11:40 -05:00
|
|
|
case ssa.OpArgIntReg, ssa.OpArgFloatReg:
|
|
|
|
|
// The assembler needs to wrap the entry safepoint/stack growth code with spill/unspill
|
|
|
|
|
// The loop only runs once.
|
|
|
|
|
for _, ap := range v.Block.Func.RegArgs {
|
|
|
|
|
// Pass the spill/unspill information along to the assembler, offset by size of return PC pushed on stack.
|
2021-04-16 00:15:31 -04:00
|
|
|
addr := ssagen.SpillSlotAddr(ap, x86.REG_SP, v.Block.Func.Config.PtrSize)
|
2021-02-19 17:11:40 -05:00
|
|
|
s.FuncInfo().AddSpill(
|
2021-04-16 00:15:31 -04:00
|
|
|
obj.RegSpill{Reg: ap.Reg, Addr: addr, Unspill: loadByType(ap.Type), Spill: storeByType(ap.Type)})
|
2021-02-19 17:11:40 -05:00
|
|
|
}
|
|
|
|
|
v.Block.Func.RegArgs = nil
|
|
|
|
|
ssagen.CheckArgReg(v)
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.OpAMD64LoweredGetClosurePtr:
|
2016-07-03 13:40:03 -07:00
|
|
|
// Closure pointer is DX.
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.CheckLoweredGetClosurePtr(v)
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.OpAMD64LoweredGetG:
|
2021-06-09 14:29:20 -04:00
|
|
|
if s.ABI == obj.ABIInternal {
|
2021-04-01 11:11:04 -04:00
|
|
|
v.Fatalf("LoweredGetG should not appear in ABIInternal")
|
2016-03-12 14:07:40 -08:00
|
|
|
}
|
2021-02-02 18:20:16 -05:00
|
|
|
r := v.Reg()
|
|
|
|
|
getgFromTLS(s, r)
|
2021-09-10 22:05:55 -04:00
|
|
|
case ssa.OpAMD64CALLstatic, ssa.OpAMD64CALLtail:
|
2021-06-09 14:29:20 -04:00
|
|
|
if s.ABI == obj.ABI0 && v.Aux.(*ssa.AuxCall).Fn.ABI() == obj.ABIInternal {
|
2021-01-29 13:46:34 -05:00
|
|
|
// zeroing X15 when entering ABIInternal from ABI0
|
2025-03-04 15:01:54 -05:00
|
|
|
opregreg(s, x86.AXORPS, x86.REG_X15, x86.REG_X15)
|
2021-02-02 18:20:16 -05:00
|
|
|
// set G register from TLS
|
|
|
|
|
getgFromTLS(s, x86.REG_R14)
|
2021-01-29 13:46:34 -05:00
|
|
|
}
|
2021-09-10 22:05:55 -04:00
|
|
|
if v.Op == ssa.OpAMD64CALLtail {
|
|
|
|
|
s.TailCall(v)
|
|
|
|
|
break
|
|
|
|
|
}
|
2021-01-29 13:46:34 -05:00
|
|
|
s.Call(v)
|
2021-06-09 14:29:20 -04:00
|
|
|
if s.ABI == obj.ABIInternal && v.Aux.(*ssa.AuxCall).Fn.ABI() == obj.ABI0 {
|
2021-01-29 13:46:34 -05:00
|
|
|
// zeroing X15 when entering ABIInternal from ABI0
|
2025-03-04 15:01:54 -05:00
|
|
|
opregreg(s, x86.AXORPS, x86.REG_X15, x86.REG_X15)
|
2021-02-02 18:20:16 -05:00
|
|
|
// set G register from TLS
|
|
|
|
|
getgFromTLS(s, x86.REG_R14)
|
2021-01-29 13:46:34 -05:00
|
|
|
}
|
|
|
|
|
case ssa.OpAMD64CALLclosure, ssa.OpAMD64CALLinter:
|
2017-03-10 18:34:41 -08:00
|
|
|
s.Call(v)
|
2016-10-24 10:25:05 -04:00
|
|
|
|
|
|
|
|
case ssa.OpAMD64LoweredGetCallerPC:
|
|
|
|
|
p := s.Prog(x86.AMOVQ)
|
|
|
|
|
p.From.Type = obj.TYPE_MEM
|
|
|
|
|
p.From.Offset = -8 // PC is stored 8 bytes below first parameter.
|
|
|
|
|
p.From.Name = obj.NAME_PARAM
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = v.Reg()
|
|
|
|
|
|
2017-10-09 15:33:29 -04:00
|
|
|
case ssa.OpAMD64LoweredGetCallerSP:
|
|
|
|
|
// caller's SP is the address of the first arg
|
|
|
|
|
mov := x86.AMOVQ
|
2020-12-23 00:39:45 -05:00
|
|
|
if types.PtrSize == 4 {
|
2017-10-09 15:33:29 -04:00
|
|
|
mov = x86.AMOVL
|
|
|
|
|
}
|
|
|
|
|
p := s.Prog(mov)
|
|
|
|
|
p.From.Type = obj.TYPE_ADDR
|
2022-04-18 13:41:08 -04:00
|
|
|
p.From.Offset = -base.Ctxt.Arch.FixedFrameSize // 0 on amd64, just to be consistent with other architectures
|
2017-10-09 15:33:29 -04:00
|
|
|
p.From.Name = obj.NAME_PARAM
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = v.Reg()
|
|
|
|
|
|
2017-10-26 12:33:04 -04:00
|
|
|
case ssa.OpAMD64LoweredWB:
|
|
|
|
|
p := s.Prog(obj.ACALL)
|
|
|
|
|
p.To.Type = obj.TYPE_MEM
|
|
|
|
|
p.To.Name = obj.NAME_EXTERN
|
2022-11-01 16:46:43 -07:00
|
|
|
// AuxInt encodes how many buffer entries we need.
|
|
|
|
|
p.To.Sym = ir.Syms.GCWriteBarrier[v.AuxInt-1]
|
2017-10-26 12:33:04 -04:00
|
|
|
|
2025-06-18 15:14:00 -07:00
|
|
|
case ssa.OpAMD64LoweredPanicBoundsRR, ssa.OpAMD64LoweredPanicBoundsRC, ssa.OpAMD64LoweredPanicBoundsCR, ssa.OpAMD64LoweredPanicBoundsCC:
|
|
|
|
|
// Compute the constant we put in the PCData entry for this call.
|
|
|
|
|
code, signed := ssa.BoundsKind(v.AuxInt).Code()
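// The encoded value records which kind of bounds check failed and where each of
// the two operands lives: either a small constant or a register number relative to AX.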
|
|
|
|
|
xIsReg := false
|
|
|
|
|
yIsReg := false
|
|
|
|
|
xVal := 0
|
|
|
|
|
yVal := 0
|
|
|
|
|
switch v.Op {
|
|
|
|
|
case ssa.OpAMD64LoweredPanicBoundsRR:
|
|
|
|
|
xIsReg = true
|
|
|
|
|
xVal = int(v.Args[0].Reg() - x86.REG_AX)
|
|
|
|
|
yIsReg = true
|
|
|
|
|
yVal = int(v.Args[1].Reg() - x86.REG_AX)
|
|
|
|
|
case ssa.OpAMD64LoweredPanicBoundsRC:
|
|
|
|
|
xIsReg = true
|
|
|
|
|
xVal = int(v.Args[0].Reg() - x86.REG_AX)
|
|
|
|
|
c := v.Aux.(ssa.PanicBoundsC).C
|
|
|
|
|
if c >= 0 && c <= abi.BoundsMaxConst {
|
|
|
|
|
yVal = int(c)
|
|
|
|
|
} else {
|
|
|
|
|
// Move constant to a register
|
|
|
|
|
yIsReg = true
|
|
|
|
|
if yVal == xVal {
|
|
|
|
|
yVal = 1
|
|
|
|
|
}
|
|
|
|
|
p := s.Prog(x86.AMOVQ)
|
|
|
|
|
p.From.Type = obj.TYPE_CONST
|
|
|
|
|
p.From.Offset = c
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = x86.REG_AX + int16(yVal)
|
|
|
|
|
}
|
|
|
|
|
case ssa.OpAMD64LoweredPanicBoundsCR:
|
|
|
|
|
yIsReg = true
|
|
|
|
|
yVal = int(v.Args[0].Reg() - x86.REG_AX)
|
|
|
|
|
c := v.Aux.(ssa.PanicBoundsC).C
|
|
|
|
|
if c >= 0 && c <= abi.BoundsMaxConst {
|
|
|
|
|
xVal = int(c)
|
|
|
|
|
} else {
|
|
|
|
|
// Move constant to a register
|
|
|
|
|
xIsReg = true
|
|
|
|
|
if xVal == yVal {
|
|
|
|
|
xVal = 1
|
|
|
|
|
}
|
|
|
|
|
p := s.Prog(x86.AMOVQ)
|
|
|
|
|
p.From.Type = obj.TYPE_CONST
|
|
|
|
|
p.From.Offset = c
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = x86.REG_AX + int16(xVal)
|
|
|
|
|
}
|
|
|
|
|
case ssa.OpAMD64LoweredPanicBoundsCC:
|
|
|
|
|
c := v.Aux.(ssa.PanicBoundsCC).Cx
|
|
|
|
|
if c >= 0 && c <= abi.BoundsMaxConst {
|
|
|
|
|
xVal = int(c)
|
|
|
|
|
} else {
|
|
|
|
|
// Move constant to a register
|
|
|
|
|
xIsReg = true
|
|
|
|
|
p := s.Prog(x86.AMOVQ)
|
|
|
|
|
p.From.Type = obj.TYPE_CONST
|
|
|
|
|
p.From.Offset = c
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = x86.REG_AX + int16(xVal)
|
|
|
|
|
}
|
|
|
|
|
c = v.Aux.(ssa.PanicBoundsCC).Cy
|
|
|
|
|
if c >= 0 && c <= abi.BoundsMaxConst {
|
|
|
|
|
yVal = int(c)
|
|
|
|
|
} else {
|
|
|
|
|
// Move constant to a register
|
|
|
|
|
yIsReg = true
|
|
|
|
|
yVal = 1
|
|
|
|
|
p := s.Prog(x86.AMOVQ)
|
|
|
|
|
p.From.Type = obj.TYPE_CONST
|
|
|
|
|
p.From.Offset = c
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = x86.REG_AX + int16(yVal)
|
|
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
c := abi.BoundsEncode(code, signed, xIsReg, yIsReg, xVal, yVal)
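// Attach the encoded operand description as PCDATA for the call site, then call the
// runtime bounds-panic routine (ir.Syms.PanicBounds), which uses the recorded PCDATA
// to reconstruct the failing index and length for the panic message.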
|
|
|
|
|
|
|
|
|
|
p := s.Prog(obj.APCDATA)
|
|
|
|
|
p.From.SetConst(abi.PCDATA_PanicBounds)
|
|
|
|
|
p.To.SetConst(int64(c))
|
|
|
|
|
p = s.Prog(obj.ACALL)
|
2019-02-06 14:12:36 -08:00
|
|
|
p.To.Type = obj.TYPE_MEM
|
|
|
|
|
p.To.Name = obj.NAME_EXTERN
|
2025-06-18 15:14:00 -07:00
|
|
|
p.To.Sym = ir.Syms.PanicBounds
|
2019-02-06 14:12:36 -08:00
|
|
|
|
2016-04-22 13:09:18 -07:00
|
|
|
case ssa.OpAMD64NEGQ, ssa.OpAMD64NEGL,
|
2016-03-11 00:10:52 -05:00
|
|
|
ssa.OpAMD64BSWAPQ, ssa.OpAMD64BSWAPL,
|
2016-04-22 13:09:18 -07:00
|
|
|
ssa.OpAMD64NOTQ, ssa.OpAMD64NOTL:
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_REG
|
2021-01-07 19:08:37 -08:00
|
|
|
p.To.Reg = v.Reg()
|
2018-10-23 14:05:38 -07:00
|
|
|
|
|
|
|
|
case ssa.OpAMD64NEGLflags:
|
|
|
|
|
p := s.Prog(v.Op.Asm())
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
2021-01-07 19:08:37 -08:00
|
|
|
p.To.Reg = v.Reg0()
|
2018-10-23 14:05:38 -07:00
|
|
|
|
2025-05-04 10:34:41 -07:00
|
|
|
case ssa.OpAMD64ADDQconstflags, ssa.OpAMD64ADDLconstflags:
|
|
|
|
|
p := s.Prog(v.Op.Asm())
|
|
|
|
|
p.From.Type = obj.TYPE_CONST
|
|
|
|
|
p.From.Offset = v.AuxInt
|
|
|
|
|
// Note: the inc/dec instructions do not modify
|
|
|
|
|
// the carry flag like add$1 / sub$1 do.
|
|
|
|
|
// We currently never use the CF/OF flags from
|
|
|
|
|
// these instructions, so that is ok.
|
|
|
|
|
switch {
|
|
|
|
|
case p.As == x86.AADDQ && p.From.Offset == 1:
|
|
|
|
|
p.As = x86.AINCQ
|
|
|
|
|
p.From.Type = obj.TYPE_NONE
|
|
|
|
|
case p.As == x86.AADDQ && p.From.Offset == -1:
|
|
|
|
|
p.As = x86.ADECQ
|
|
|
|
|
p.From.Type = obj.TYPE_NONE
|
|
|
|
|
case p.As == x86.AADDL && p.From.Offset == 1:
|
|
|
|
|
p.As = x86.AINCL
|
|
|
|
|
p.From.Type = obj.TYPE_NONE
|
|
|
|
|
case p.As == x86.AADDL && p.From.Offset == -1:
|
|
|
|
|
p.As = x86.ADECL
|
|
|
|
|
p.From.Type = obj.TYPE_NONE
|
|
|
|
|
}
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = v.Reg0()
|
|
|
|
|
|
2020-12-07 19:15:15 +08:00
|
|
|
case ssa.OpAMD64BSFQ, ssa.OpAMD64BSRQ, ssa.OpAMD64BSFL, ssa.OpAMD64BSRL, ssa.OpAMD64SQRTSD, ssa.OpAMD64SQRTSS:
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-03-12 14:07:40 -08:00
|
|
|
p.From.Type = obj.TYPE_REG
|
2016-09-16 09:36:00 -07:00
|
|
|
p.From.Reg = v.Args[0].Reg()
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_REG
|
2018-10-08 02:20:03 +00:00
|
|
|
switch v.Op {
|
|
|
|
|
case ssa.OpAMD64BSFQ, ssa.OpAMD64BSRQ:
|
|
|
|
|
p.To.Reg = v.Reg0()
|
2020-12-07 19:15:15 +08:00
|
|
|
case ssa.OpAMD64BSFL, ssa.OpAMD64BSRL, ssa.OpAMD64SQRTSD, ssa.OpAMD64SQRTSS:
|
2018-10-08 02:20:03 +00:00
|
|
|
p.To.Reg = v.Reg()
|
|
|
|
|
}
|
2025-02-02 23:42:43 +01:00
|
|
|
case ssa.OpAMD64LoweredRound32F, ssa.OpAMD64LoweredRound64F:
|
|
|
|
|
// input is already rounded
|
2017-10-05 15:45:46 -05:00
|
|
|
case ssa.OpAMD64ROUNDSD:
|
|
|
|
|
p := s.Prog(v.Op.Asm())
|
|
|
|
|
val := v.AuxInt
|
2017-10-31 16:49:27 -05:00
|
|
|
// 0 means math.RoundToEven, 1 Floor, 2 Ceil, 3 Trunc
|
2020-10-30 11:55:18 +01:00
|
|
|
if val < 0 || val > 3 {
|
2017-10-05 15:45:46 -05:00
|
|
|
v.Fatalf("Invalid rounding mode")
|
|
|
|
|
}
|
|
|
|
|
p.From.Offset = val
|
|
|
|
|
p.From.Type = obj.TYPE_CONST
|
2023-04-12 11:23:13 +08:00
|
|
|
p.AddRestSourceReg(v.Args[0].Reg())
|
2017-10-05 15:45:46 -05:00
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = v.Reg()
|
2021-12-06 18:46:25 +03:00
|
|
|
case ssa.OpAMD64POPCNTQ, ssa.OpAMD64POPCNTL,
|
2022-03-30 21:44:44 +08:00
|
|
|
ssa.OpAMD64TZCNTQ, ssa.OpAMD64TZCNTL,
|
|
|
|
|
ssa.OpAMD64LZCNTQ, ssa.OpAMD64LZCNTL:
|
2017-03-16 21:33:03 -07:00
|
|
|
if v.Args[0].Reg() != v.Reg() {
|
2021-12-06 18:46:25 +03:00
|
|
|
// POPCNT/TZCNT/LZCNT have a false dependency on the destination register on Intel cpus.
|
|
|
|
|
// TZCNT/LZCNT problem affects pre-Skylake models. See discussion at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62011#c7.
|
2017-12-07 10:56:45 -06:00
|
|
|
// Xor register with itself to break the dependency.
|
2021-12-06 11:46:57 +03:00
|
|
|
opregreg(s, x86.AXORL, v.Reg(), v.Reg())
|
2017-03-16 21:33:03 -07:00
|
|
|
}
|
|
|
|
|
p := s.Prog(v.Op.Asm())
|
|
|
|
|
p.From.Type = obj.TYPE_REG
|
|
|
|
|
p.From.Reg = v.Args[0].Reg()
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = v.Reg()
|
2017-10-03 14:12:00 -05:00
|
|
|
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.OpAMD64SETEQ, ssa.OpAMD64SETNE,
|
|
|
|
|
ssa.OpAMD64SETL, ssa.OpAMD64SETLE,
|
|
|
|
|
ssa.OpAMD64SETG, ssa.OpAMD64SETGE,
|
|
|
|
|
ssa.OpAMD64SETGF, ssa.OpAMD64SETGEF,
|
|
|
|
|
ssa.OpAMD64SETB, ssa.OpAMD64SETBE,
|
|
|
|
|
ssa.OpAMD64SETORD, ssa.OpAMD64SETNAN,
|
2018-01-27 11:55:34 +01:00
|
|
|
ssa.OpAMD64SETA, ssa.OpAMD64SETAE,
|
|
|
|
|
ssa.OpAMD64SETO:
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_REG
|
2016-09-16 09:36:00 -07:00
|
|
|
p.To.Reg = v.Reg()
|
2016-03-12 14:07:40 -08:00
|
|
|
|
2018-05-08 09:11:00 -07:00
|
|
|
case ssa.OpAMD64SETEQstore, ssa.OpAMD64SETNEstore,
|
|
|
|
|
ssa.OpAMD64SETLstore, ssa.OpAMD64SETLEstore,
|
|
|
|
|
ssa.OpAMD64SETGstore, ssa.OpAMD64SETGEstore,
|
|
|
|
|
ssa.OpAMD64SETBstore, ssa.OpAMD64SETBEstore,
|
|
|
|
|
ssa.OpAMD64SETAstore, ssa.OpAMD64SETAEstore:
|
2017-10-03 14:12:00 -05:00
|
|
|
p := s.Prog(v.Op.Asm())
|
|
|
|
|
p.To.Type = obj.TYPE_MEM
|
|
|
|
|
p.To.Reg = v.Args[0].Reg()
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.To, v)
|
2017-10-03 14:12:00 -05:00
|
|
|
|
2023-07-17 10:21:07 -07:00
|
|
|
case ssa.OpAMD64SETEQstoreidx1, ssa.OpAMD64SETNEstoreidx1,
|
|
|
|
|
ssa.OpAMD64SETLstoreidx1, ssa.OpAMD64SETLEstoreidx1,
|
|
|
|
|
ssa.OpAMD64SETGstoreidx1, ssa.OpAMD64SETGEstoreidx1,
|
|
|
|
|
ssa.OpAMD64SETBstoreidx1, ssa.OpAMD64SETBEstoreidx1,
|
|
|
|
|
ssa.OpAMD64SETAstoreidx1, ssa.OpAMD64SETAEstoreidx1:
|
|
|
|
|
p := s.Prog(v.Op.Asm())
|
|
|
|
|
memIdx(&p.To, v)
|
|
|
|
|
ssagen.AddAux(&p.To, v)
|
|
|
|
|
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.OpAMD64SETNEF:
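// Floating-point != must also be true for unordered (NaN) operands; UCOMIS sets the
// parity flag in that case, so the SETNE and SETPS results are ORed together below.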
|
2022-04-05 15:07:29 -07:00
|
|
|
t := v.RegTmp()
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_REG
|
2016-09-16 09:36:00 -07:00
|
|
|
p.To.Reg = v.Reg()
|
2017-03-20 08:01:28 -07:00
|
|
|
q := s.Prog(x86.ASETPS)
|
2016-03-12 14:07:40 -08:00
|
|
|
q.To.Type = obj.TYPE_REG
|
2022-04-05 15:07:29 -07:00
|
|
|
q.To.Reg = t
|
2016-03-12 14:07:40 -08:00
|
|
|
// ORL avoids a partial register write and is smaller than ORQ, which the old compiler used
|
2022-04-05 15:07:29 -07:00
|
|
|
opregreg(s, x86.AORL, v.Reg(), t)
|
2016-03-12 14:07:40 -08:00
|
|
|
|
|
|
|
|
case ssa.OpAMD64SETEQF:
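// Floating-point == must be false for unordered (NaN) operands; SETPC (parity clear)
// filters those out, so the SETEQ result is ANDed with it below.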
|
2022-04-05 15:07:29 -07:00
|
|
|
t := v.RegTmp()
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_REG
|
2016-09-16 09:36:00 -07:00
|
|
|
p.To.Reg = v.Reg()
|
2017-03-20 08:01:28 -07:00
|
|
|
q := s.Prog(x86.ASETPC)
|
2016-03-12 14:07:40 -08:00
|
|
|
q.To.Type = obj.TYPE_REG
|
2022-04-05 15:07:29 -07:00
|
|
|
q.To.Reg = t
|
2016-03-12 14:07:40 -08:00
|
|
|
// ANDL avoids a partial register write and is smaller than ANDQ, which the old compiler used
|
2022-04-05 15:07:29 -07:00
|
|
|
opregreg(s, x86.AANDL, v.Reg(), t)
|
2016-03-12 14:07:40 -08:00
|
|
|
|
|
|
|
|
case ssa.OpAMD64InvertFlags:
|
2016-03-11 00:10:52 -05:00
|
|
|
v.Fatalf("InvertFlags should never make it to codegen %v", v.LongString())
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.OpAMD64FlagEQ, ssa.OpAMD64FlagLT_ULT, ssa.OpAMD64FlagLT_UGT, ssa.OpAMD64FlagGT_ULT, ssa.OpAMD64FlagGT_UGT:
|
2016-03-11 00:10:52 -05:00
|
|
|
v.Fatalf("Flag* ops should never make it to codegen %v", v.LongString())
|
2016-08-28 11:17:37 -07:00
|
|
|
case ssa.OpAMD64AddTupleFirst32, ssa.OpAMD64AddTupleFirst64:
|
|
|
|
|
v.Fatalf("AddTupleFirst* should never make it to codegen %v", v.LongString())
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.OpAMD64REPSTOSQ:
|
2017-03-20 08:01:28 -07:00
|
|
|
s.Prog(x86.AREP)
|
|
|
|
|
s.Prog(x86.ASTOSQ)
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.OpAMD64REPMOVSQ:
|
2017-03-20 08:01:28 -07:00
|
|
|
s.Prog(x86.AREP)
|
|
|
|
|
s.Prog(x86.AMOVSQ)
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.OpAMD64LoweredNilCheck:
|
|
|
|
|
// Issue a load which will fault if the input is nil.
|
|
|
|
|
// TODO: We currently use the 2-byte instruction TESTB AX, (reg).
|
2017-08-19 22:33:51 +02:00
|
|
|
// Should we use the 3-byte TESTB $0, (reg) instead? It is larger
|
2016-03-12 14:07:40 -08:00
|
|
|
// but it doesn't have false dependency on AX.
|
|
|
|
|
// Or maybe allocate an output register and use MOVL (reg),reg2 ?
|
|
|
|
|
// That trades clobbering flags for clobbering a register.
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(x86.ATESTB)
|
2016-03-12 14:07:40 -08:00
|
|
|
p.From.Type = obj.TYPE_REG
|
|
|
|
|
p.From.Reg = x86.REG_AX
|
|
|
|
|
p.To.Type = obj.TYPE_MEM
|
2016-09-16 09:36:00 -07:00
|
|
|
p.To.Reg = v.Args[0].Reg()
|
cmd/compile: add framework for logging optimizer (non)actions to LSP
This is intended to allow IDEs to note where the optimizer
was not able to improve users' code. There may be other
applications for this, for example in studying effectiveness
of optimizer changes more quickly than running benchmarks,
or in verifying that code changes did not accidentally disable
optimizations in performance-critical code.
Logging of nilcheck (bad) for amd64 is implemented as
proof-of-concept. In general, the intent is that optimizations
that didn't happen are what will be logged, because that is
believed to be what IDE users want.
Added flag -json=version,dest
Check that version=0. (Future compilers will support a
few recent versions, I hope that version is always <=3.)
Dest is expected to be one of:
  /path (or \path in Windows)
    will create directory /path and fill it w/ json files
  file://path
    will create directory path, intended either for
    I:\dont\know\enough\about\windows\paths or for
    trustme_I_know_what_I_am_doing_probably_testing
Not passing an absolute path name usually leads to
json splattered all over source directories,
or failure when those directories are not writeable.
If you want a foot-gun, you have to ask for it.
The JSON output is directed to subdirectories of dest,
where each subdirectory is net/url.PathEscape of the
package name, and for each foo.go in the package,
net/url.PathEscape(foo).json is created. The first line
of foo.json contains version and context information,
and subsequent lines contain LSP-conforming JSON
describing the missing optimizations.
Change-Id: Ib83176a53a8c177ee9081aefc5ae05604ccad8a0
Reviewed-on: https://go-review.googlesource.com/c/go/+/204338
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2019-10-24 13:48:17 -04:00
|
|
|
if logopt.Enabled() {
|
|
|
|
|
logopt.LogOpt(v.Pos, "nilcheck", "genssa", v.Block.Func.Name)
|
|
|
|
|
}
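// These LSP-style log entries are only produced when the compiler is run with the
// -json flag described in the commit message above, e.g. (assumed invocation)
// go build -gcflags=-json=0,/tmp/logopt; logopt.Enabled() reports whether it was set.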
|
2020-11-19 20:49:23 -05:00
|
|
|
if base.Debug.Nil != 0 && v.Pos.Line() > 1 { // v.Pos.Line()==1 in generated wrappers
|
|
|
|
|
base.WarnfAt(v.Pos, "generated nil check")
|
2016-03-12 14:07:40 -08:00
|
|
|
}
|
2019-03-28 14:58:06 -04:00
|
|
|
case ssa.OpAMD64MOVBatomicload, ssa.OpAMD64MOVLatomicload, ssa.OpAMD64MOVQatomicload:
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-08-23 16:49:28 -07:00
|
|
|
p.From.Type = obj.TYPE_MEM
|
2016-09-16 09:36:00 -07:00
|
|
|
p.From.Reg = v.Args[0].Reg()
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.From, v)
|
2016-08-23 16:49:28 -07:00
|
|
|
p.To.Type = obj.TYPE_REG
|
2016-09-16 09:36:00 -07:00
|
|
|
p.To.Reg = v.Reg0()
|
2019-10-23 10:20:49 -04:00
|
|
|
case ssa.OpAMD64XCHGB, ssa.OpAMD64XCHGL, ssa.OpAMD64XCHGQ:
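// An XCHG with a memory operand is implicitly locked, so unlike the XADD and
// CMPXCHG cases below no explicit LOCK prefix is needed.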
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-08-23 16:49:28 -07:00
|
|
|
p.From.Type = obj.TYPE_REG
|
2021-01-07 19:08:37 -08:00
|
|
|
p.From.Reg = v.Reg0()
|
2016-08-23 16:49:28 -07:00
|
|
|
p.To.Type = obj.TYPE_MEM
|
2016-09-16 09:36:00 -07:00
|
|
|
p.To.Reg = v.Args[1].Reg()
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.To, v)
|
2016-08-25 16:02:57 -07:00
|
|
|
case ssa.OpAMD64XADDLlock, ssa.OpAMD64XADDQlock:
|
2017-03-20 08:01:28 -07:00
|
|
|
s.Prog(x86.ALOCK)
|
|
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-08-25 16:02:57 -07:00
|
|
|
p.From.Type = obj.TYPE_REG
|
2021-01-07 19:08:37 -08:00
|
|
|
p.From.Reg = v.Reg0()
|
2016-08-25 16:02:57 -07:00
|
|
|
p.To.Type = obj.TYPE_MEM
|
2016-09-16 09:36:00 -07:00
|
|
|
p.To.Reg = v.Args[1].Reg()
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.To, v)
|
2016-08-25 16:02:57 -07:00
|
|
|
case ssa.OpAMD64CMPXCHGLlock, ssa.OpAMD64CMPXCHGQlock:
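// CMPXCHG implicitly compares the memory operand against AX and writes the old value
// back into AX, so the expected-value input must be allocated to AX; the check below
// enforces that.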
|
2016-09-16 09:36:00 -07:00
|
|
|
if v.Args[1].Reg() != x86.REG_AX {
|
2016-08-25 16:02:57 -07:00
|
|
|
v.Fatalf("input[1] not in AX %s", v.LongString())
|
|
|
|
|
}
|
2017-03-20 08:01:28 -07:00
|
|
|
s.Prog(x86.ALOCK)
|
|
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-08-25 16:02:57 -07:00
|
|
|
p.From.Type = obj.TYPE_REG
|
2016-09-16 09:36:00 -07:00
|
|
|
p.From.Reg = v.Args[2].Reg()
|
2016-08-25 16:02:57 -07:00
|
|
|
p.To.Type = obj.TYPE_MEM
|
2016-09-16 09:36:00 -07:00
|
|
|
p.To.Reg = v.Args[0].Reg()
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.To, v)
|
2017-03-20 08:01:28 -07:00
|
|
|
p = s.Prog(x86.ASETEQ)
|
2016-08-25 16:02:57 -07:00
|
|
|
p.To.Type = obj.TYPE_REG
|
2016-09-16 09:36:00 -07:00
|
|
|
p.To.Reg = v.Reg0()
|
2024-06-25 14:56:11 -07:00
|
|
|
case ssa.OpAMD64ANDBlock, ssa.OpAMD64ANDLlock, ssa.OpAMD64ANDQlock, ssa.OpAMD64ORBlock, ssa.OpAMD64ORLlock, ssa.OpAMD64ORQlock:
|
|
|
|
|
// Atomic memory operations that don't need to return the old value.
|
2017-03-20 08:01:28 -07:00
|
|
|
s.Prog(x86.ALOCK)
|
|
|
|
|
p := s.Prog(v.Op.Asm())
|
2016-08-25 16:02:57 -07:00
|
|
|
p.From.Type = obj.TYPE_REG
|
2016-09-16 09:36:00 -07:00
|
|
|
p.From.Reg = v.Args[1].Reg()
|
2016-08-25 16:02:57 -07:00
|
|
|
p.To.Type = obj.TYPE_MEM
|
2016-09-16 09:36:00 -07:00
|
|
|
p.To.Reg = v.Args[0].Reg()
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.To, v)
|
2024-06-25 14:56:11 -07:00
|
|
|
case ssa.OpAMD64LoweredAtomicAnd64, ssa.OpAMD64LoweredAtomicOr64, ssa.OpAMD64LoweredAtomicAnd32, ssa.OpAMD64LoweredAtomicOr32:
|
|
|
|
|
// Atomic memory operations that need to return the old value.
|
|
|
|
|
// We need to do these with compare-and-exchange to get access to the old value.
|
|
|
|
|
// loop:
|
|
|
|
|
// MOVQ mask, tmp
|
|
|
|
|
// MOVQ (addr), AX
|
|
|
|
|
// ANDQ AX, tmp
|
|
|
|
|
// LOCK CMPXCHGQ tmp, (addr) : note that AX is implicit old value to compare against
|
|
|
|
|
// JNE loop
|
|
|
|
|
// : result in AX
|
|
|
|
|
mov := x86.AMOVQ
|
|
|
|
|
op := x86.AANDQ
|
|
|
|
|
cmpxchg := x86.ACMPXCHGQ
|
|
|
|
|
switch v.Op {
|
|
|
|
|
case ssa.OpAMD64LoweredAtomicOr64:
|
|
|
|
|
op = x86.AORQ
|
|
|
|
|
case ssa.OpAMD64LoweredAtomicAnd32:
|
|
|
|
|
mov = x86.AMOVL
|
|
|
|
|
op = x86.AANDL
|
|
|
|
|
cmpxchg = x86.ACMPXCHGL
|
|
|
|
|
case ssa.OpAMD64LoweredAtomicOr32:
|
|
|
|
|
mov = x86.AMOVL
|
|
|
|
|
op = x86.AORL
|
|
|
|
|
cmpxchg = x86.ACMPXCHGL
|
|
|
|
|
}
|
|
|
|
|
addr := v.Args[0].Reg()
|
|
|
|
|
mask := v.Args[1].Reg()
|
|
|
|
|
tmp := v.RegTmp()
|
|
|
|
|
p1 := s.Prog(mov)
|
|
|
|
|
p1.From.Type = obj.TYPE_REG
|
|
|
|
|
p1.From.Reg = mask
|
|
|
|
|
p1.To.Type = obj.TYPE_REG
|
|
|
|
|
p1.To.Reg = tmp
|
|
|
|
|
p2 := s.Prog(mov)
|
|
|
|
|
p2.From.Type = obj.TYPE_MEM
|
|
|
|
|
p2.From.Reg = addr
|
|
|
|
|
ssagen.AddAux(&p2.From, v)
|
|
|
|
|
p2.To.Type = obj.TYPE_REG
|
|
|
|
|
p2.To.Reg = x86.REG_AX
|
|
|
|
|
p3 := s.Prog(op)
|
|
|
|
|
p3.From.Type = obj.TYPE_REG
|
|
|
|
|
p3.From.Reg = x86.REG_AX
|
|
|
|
|
p3.To.Type = obj.TYPE_REG
|
|
|
|
|
p3.To.Reg = tmp
|
|
|
|
|
s.Prog(x86.ALOCK)
|
|
|
|
|
p5 := s.Prog(cmpxchg)
|
|
|
|
|
p5.From.Type = obj.TYPE_REG
|
|
|
|
|
p5.From.Reg = tmp
|
|
|
|
|
p5.To.Type = obj.TYPE_MEM
|
|
|
|
|
p5.To.Reg = addr
|
|
|
|
|
ssagen.AddAux(&p5.To, v)
|
|
|
|
|
p6 := s.Prog(x86.AJNE)
|
|
|
|
|
p6.To.Type = obj.TYPE_BRANCH
|
|
|
|
|
p6.To.SetTarget(p1)
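// On failure JNE loops back to p1 to retry with a freshly loaded value; on success
// CMPXCHG has left the old value in AX, which is where the result of these ops lives.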
|
2021-06-15 14:04:30 +00:00
|
|
|
case ssa.OpAMD64PrefetchT0, ssa.OpAMD64PrefetchNTA:
|
|
|
|
|
p := s.Prog(v.Op.Asm())
|
|
|
|
|
p.From.Type = obj.TYPE_MEM
|
|
|
|
|
p.From.Reg = v.Args[0].Reg()
|
2016-06-08 22:02:08 -07:00
|
|
|
case ssa.OpClobber:
|
|
|
|
|
p := s.Prog(x86.AMOVL)
|
|
|
|
|
p.From.Type = obj.TYPE_CONST
|
|
|
|
|
p.From.Offset = 0xdeaddead
|
|
|
|
|
p.To.Type = obj.TYPE_MEM
|
|
|
|
|
p.To.Reg = x86.REG_SP
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.To, v)
|
2016-06-08 22:02:08 -07:00
|
|
|
p = s.Prog(x86.AMOVL)
|
|
|
|
|
p.From.Type = obj.TYPE_CONST
|
|
|
|
|
p.From.Offset = 0xdeaddead
|
|
|
|
|
p.To.Type = obj.TYPE_MEM
|
|
|
|
|
p.To.Reg = x86.REG_SP
|
2020-12-23 00:57:10 -05:00
|
|
|
ssagen.AddAux(&p.To, v)
|
2016-06-08 22:02:08 -07:00
|
|
|
p.To.Offset += 4
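// Together the two MOVLs overwrite the full 8-byte stack slot with the 0xdeaddead pattern.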
|
2021-03-17 19:15:38 -04:00
|
|
|
case ssa.OpClobberReg:
|
|
|
|
|
x := uint64(0xdeaddeaddeaddead)
|
|
|
|
|
p := s.Prog(x86.AMOVQ)
|
|
|
|
|
p.From.Type = obj.TYPE_CONST
|
|
|
|
|
p.From.Offset = int64(x)
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = v.Reg()
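// ClobberReg likewise poisons a dead register with 0xdeaddeaddeaddead; both clobber ops
// exist (assumption) to support the compiler's clobber-dead debugging modes and are not
// generated in ordinary builds.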
|
2016-03-12 14:07:40 -08:00
|
|
|
default:
|
2016-09-14 10:01:05 -07:00
|
|
|
v.Fatalf("genValue not implemented: %s", v.LongString())
|
2016-03-12 14:07:40 -08:00
|
|
|
}
|
|
|
|
|
}
|
|
|
|
|
|
|
|
|
|
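// blockJump maps a flags-based block kind to its conditional branch instruction and
// its inverted form; ssaGenBlock below emits one of them, or a conditional branch plus
// an unconditional JMP, depending on which successor, if any, is laid out next.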
var blockJump = [...]struct {
|
|
|
|
|
asm, invasm obj.As
|
|
|
|
|
}{
|
|
|
|
|
ssa.BlockAMD64EQ: {x86.AJEQ, x86.AJNE},
|
|
|
|
|
ssa.BlockAMD64NE: {x86.AJNE, x86.AJEQ},
|
|
|
|
|
ssa.BlockAMD64LT: {x86.AJLT, x86.AJGE},
|
|
|
|
|
ssa.BlockAMD64GE: {x86.AJGE, x86.AJLT},
|
|
|
|
|
ssa.BlockAMD64LE: {x86.AJLE, x86.AJGT},
|
|
|
|
|
ssa.BlockAMD64GT: {x86.AJGT, x86.AJLE},
|
2018-01-27 11:55:34 +01:00
|
|
|
ssa.BlockAMD64OS: {x86.AJOS, x86.AJOC},
|
|
|
|
|
ssa.BlockAMD64OC: {x86.AJOC, x86.AJOS},
|
2016-03-12 14:07:40 -08:00
|
|
|
ssa.BlockAMD64ULT: {x86.AJCS, x86.AJCC},
|
|
|
|
|
ssa.BlockAMD64UGE: {x86.AJCC, x86.AJCS},
|
|
|
|
|
ssa.BlockAMD64UGT: {x86.AJHI, x86.AJLS},
|
|
|
|
|
ssa.BlockAMD64ULE: {x86.AJLS, x86.AJHI},
|
|
|
|
|
ssa.BlockAMD64ORD: {x86.AJPC, x86.AJPS},
|
|
|
|
|
ssa.BlockAMD64NAN: {x86.AJPS, x86.AJPC},
|
|
|
|
|
}
|
|
|
|
|
|
2020-12-23 00:57:10 -05:00
|
|
|
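// eqfJumps and nefJumps encode the two-branch sequences used for the floating-point
// EQF/NEF blocks, where equality requires ZF set and PF clear; the row is chosen by
// which successor is the fallthrough block and Index selects each jump's target successor.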
var eqfJumps = [2][2]ssagen.IndexJump{
|
2016-04-29 09:02:27 -07:00
|
|
|
{{Jump: x86.AJNE, Index: 1}, {Jump: x86.AJPS, Index: 1}}, // next == b.Succs[0]
|
|
|
|
|
{{Jump: x86.AJNE, Index: 1}, {Jump: x86.AJPC, Index: 0}}, // next == b.Succs[1]
|
2016-03-12 14:07:40 -08:00
|
|
|
}
|
2020-12-23 00:57:10 -05:00
|
|
|
var nefJumps = [2][2]ssagen.IndexJump{
|
2016-04-29 09:02:27 -07:00
|
|
|
{{Jump: x86.AJNE, Index: 0}, {Jump: x86.AJPC, Index: 1}}, // next == b.Succs[0]
|
|
|
|
|
{{Jump: x86.AJNE, Index: 0}, {Jump: x86.AJPS, Index: 0}}, // next == b.Succs[1]
|
2016-03-12 14:07:40 -08:00
|
|
|
}
|
|
|
|
|
|
2020-12-23 00:57:10 -05:00
|
|
|
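// ssaGenBlock emits the control flow that ends block b: nothing for exits and
// fallthroughs, a RET, conditional branches, or a jump-table dispatch. next is the
// block laid out immediately after b, so branches to it can be omitted.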
func ssaGenBlock(s *ssagen.State, b, next *ssa.Block) {
|
2016-03-12 14:07:40 -08:00
|
|
|
switch b.Kind {
|
2025-02-19 16:47:31 -05:00
|
|
|
case ssa.BlockPlain, ssa.BlockDefer:
|
2016-04-28 16:52:47 -07:00
|
|
|
if b.Succs[0].Block() != next {
|
2017-03-20 08:01:28 -07:00
|
|
|
p := s.Prog(obj.AJMP)
|
2016-03-12 14:07:40 -08:00
|
|
|
p.To.Type = obj.TYPE_BRANCH
|
2020-12-23 00:57:10 -05:00
|
|
|
s.Branches = append(s.Branches, ssagen.Branch{P: p, B: b.Succs[0].Block()})
|
2016-03-12 14:07:40 -08:00
|
|
|
}
|
cmd/compile: restore tail call for method wrappers
For certain type of method wrappers we used to generate a tail
call. That was disabled in CL 307234 when register ABI is used,
because with the current IR it was difficult to generate a tail
call with the arguments in the right places. The problem was that
the IR does not contain a CALL-like node with arguments; instead,
it contains an OAS node that adjusts the receiver, then an
OTAILCALL node that just contains the target, but no argument
(with the assumption that the OAS node will put the adjusted
receiver in the right place). With register ABI, putting
arguments in registers is done in SSA. The assignment (OAS)
doesn't put the receiver in a register.
This CL changes the IR of a tail call to take an actual OCALL
node. Specifically, a tail call is represented as
OTAILCALL (OCALL target args...)
This way, the call target and args are connected through the OCALL
node. So the call can be analyzed in SSA and the args can be passed
in the right places.
(Alternatively, we could have OTAILCALL node directly take the
target and the args, without the OCALL node. Using an OCALL node is
convenient as there are existing code that processes OCALL nodes
which do not need to be changed. Also, a tail call is similar to
ORETURN (OCALL target args...), except it doesn't preserve the
frame. I did the former but I'm open to change.)
The SSA representation is similar. Previously, the IR lowers to
a Store the receiver then a BlockRetJmp which jumps to the target
(without putting the arg in register). Now we use a TailCall op,
which takes the target and the args. The call expansion pass and
the register allocator handles TailCall pretty much like a
StaticCall, and it will do the right ABI analysis and put the args
in the right places. (Args other than the receiver are already in
the right places. For register args it generates no code for them.
For stack args currently it generates a self copy. I'll work on
optimizing that out.) BlockRetJmp is still used, signaling it is a
tail call. The actual call is made in the TailCall op so
BlockRetJmp generates no code (we could use BlockExit if we like).
This slightly reduces binary size:
              old        new
cmd/go    14003088   13953936
cmd/link   6275552    6271456
Change-Id: I2d16d8d419fe1f17554916d317427383e17e27f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/350145
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: David Chase <drchase@google.com>
2021-09-10 22:05:55 -04:00
|
|
|
case ssa.BlockExit, ssa.BlockRetJmp:
|
2016-03-12 14:07:40 -08:00
|
|
|
case ssa.BlockRet:
|
2017-03-20 08:01:28 -07:00
|
|
|
s.Prog(obj.ARET)
|
2016-03-12 14:07:40 -08:00
|
|
|
|
|
|
|
|
case ssa.BlockAMD64EQF:
|
cmd/compile: fix incorrect rewriting to if condition
Some ARM64 rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub calculation.
Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.
Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag, in the following categories:
   Block-Op   Meaning                 ARM condition codes
1. LTnoov     less than               MI
2. GEnoov     greater than or equal   PL
3. LEnoov     less than or equal      MI || EQ
4. GTnoov     greater than            NEQ & PL
The backend generates two consecutive branch instructions for 'LEnoov'
and 'GTnoov' to model their expected behavior. A slight change to 'gc'
and amd64/386 backends is made to unify the code generation.
Add a test 'TestCondRewrite' as justification; it covers 32 incorrect rules
identified on arm64, and more might be needed on other arches, like 32-bit arm.
Add two benchmarks profiling the aforementioned categories 1&2 and categories
3&4 separately; we expect the first benchmark to show a performance
improvement and the second to show no visible regression compared with
the non-optimized version.
This change also updates TestFormats to support using %#x.
Examples exhibiting where the issue comes from:
1: 'if x + 3 < 0' might be converted to:
before:
CMN $3, R0
BGE <else branch> // wrong branch is taken if 'x+3' overflows
after:
CMN $3, R0
BPL <else branch>
2: 'if y - 3 > 0' might be converted to:
before:
CMP $3, R0
BLE <else branch> // wrong branch is taken if 'y-3' underflows
after:
CMP $3, R0
BMI <else branch>
BEQ <else branch>
Benchmark data from different kinds of arm64 servers; 'old' is the non-optimized
version (not the parent commit). Generally the optimized version outperforms it.
S1:
name old time/op new time/op delta
CondRewrite/SoloJump 13.6ns ± 0% 12.9ns ± 0% -5.15% (p=0.000 n=10+10)
CondRewrite/CombJump 13.8ns ± 1% 12.9ns ± 0% -6.32% (p=0.000 n=10+10)
S2:
name old time/op new time/op delta
CondRewrite/SoloJump 11.6ns ± 0% 10.9ns ± 0% -6.03% (p=0.000 n=10+10)
CondRewrite/CombJump 11.4ns ± 0% 10.8ns ± 1% -5.53% (p=0.000 n=10+10)
S3:
name old time/op new time/op delta
CondRewrite/SoloJump 7.36ns ± 0% 7.50ns ± 0% +1.79% (p=0.000 n=9+10)
CondRewrite/CombJump 7.35ns ± 0% 7.75ns ± 0% +5.51% (p=0.000 n=8+9)
S4:
name old time/op new time/op delta
CondRewrite/SoloJump-224 11.5ns ± 1% 10.9ns ± 0% -4.97% (p=0.000 n=10+10)
CondRewrite/CombJump-224 11.9ns ± 0% 11.5ns ± 0% -2.95% (p=0.000 n=10+10)
S5:
name old time/op new time/op delta
CondRewrite/SoloJump 10.0ns ± 0% 10.0ns ± 0% -0.45% (p=0.000 n=9+10)
CondRewrite/CombJump 9.93ns ± 0% 9.77ns ± 0% -1.53% (p=0.000 n=10+9)
Go1 perf. data:
name old time/op new time/op delta
BinaryTree17 6.29s ± 1% 6.30s ± 1% ~ (p=1.000 n=5+5)
Fannkuch11 5.40s ± 0% 5.40s ± 0% ~ (p=0.841 n=5+5)
FmtFprintfEmpty 97.9ns ± 0% 98.9ns ± 3% ~ (p=0.937 n=4+5)
FmtFprintfString 171ns ± 3% 171ns ± 2% ~ (p=0.754 n=5+5)
FmtFprintfInt 212ns ± 0% 217ns ± 6% +2.55% (p=0.008 n=5+5)
FmtFprintfIntInt 296ns ± 1% 297ns ± 2% ~ (p=0.516 n=5+5)
FmtFprintfPrefixedInt 371ns ± 2% 374ns ± 7% ~ (p=1.000 n=5+5)
FmtFprintfFloat 435ns ± 1% 439ns ± 2% ~ (p=0.056 n=5+5)
FmtManyArgs 1.37µs ± 1% 1.36µs ± 1% ~ (p=0.730 n=5+5)
GobDecode 14.6ms ± 4% 14.4ms ± 4% ~ (p=0.690 n=5+5)
GobEncode 11.8ms ±20% 11.6ms ±15% ~ (p=1.000 n=5+5)
Gzip 507ms ± 0% 491ms ± 0% -3.22% (p=0.008 n=5+5)
Gunzip 73.8ms ± 0% 73.9ms ± 0% ~ (p=0.690 n=5+5)
HTTPClientServer 116µs ± 0% 116µs ± 0% ~ (p=0.686 n=4+4)
JSONEncode 21.8ms ± 1% 21.6ms ± 2% ~ (p=0.151 n=5+5)
JSONDecode 104ms ± 1% 103ms ± 1% -1.08% (p=0.016 n=5+5)
Mandelbrot200 9.53ms ± 0% 9.53ms ± 0% ~ (p=0.421 n=5+5)
GoParse 7.55ms ± 1% 7.51ms ± 1% ~ (p=0.151 n=5+5)
RegexpMatchEasy0_32 158ns ± 0% 158ns ± 0% ~ (all equal)
RegexpMatchEasy0_1K 606ns ± 1% 608ns ± 3% ~ (p=0.937 n=5+5)
RegexpMatchEasy1_32 143ns ± 0% 144ns ± 1% ~ (p=0.095 n=5+4)
RegexpMatchEasy1_1K 927ns ± 2% 944ns ± 2% ~ (p=0.056 n=5+5)
RegexpMatchMedium_32 16.0ns ± 0% 16.0ns ± 0% ~ (all equal)
RegexpMatchMedium_1K 69.3µs ± 2% 69.7µs ± 0% ~ (p=0.690 n=5+5)
RegexpMatchHard_32 3.73µs ± 0% 3.73µs ± 1% ~ (p=0.984 n=5+5)
RegexpMatchHard_1K 111µs ± 1% 110µs ± 0% ~ (p=0.151 n=5+5)
Revcomp 1.91s ±47% 1.77s ±68% ~ (p=1.000 n=5+5)
Template 138ms ± 1% 138ms ± 1% ~ (p=1.000 n=5+5)
TimeParse 787ns ± 2% 785ns ± 1% ~ (p=0.540 n=5+5)
TimeFormat 729ns ± 1% 726ns ± 1% ~ (p=0.151 n=5+5)
Updates #38740
Change-Id: I06c604874acdc1e63e66452dadee5df053045222
Reviewed-on: https://go-review.googlesource.com/c/go/+/233097
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2020-05-06 09:54:40 +00:00
|
|
|
s.CombJump(b, next, &eqfJumps)
|
2016-03-12 14:07:40 -08:00
|
|
|
|
|
|
|
|
case ssa.BlockAMD64NEF:
|
cmd/compile: fix incorrect rewriting to if condition
Some ARM64 rewriting rules convert 'comparing to zero' conditions of if
statements to a simplified version utilizing CMN and CMP instructions to
branch over condition flags, in order to save one Add or Sub calculation.
Such optimizations lead to wrong branching in case an overflow/underflow
occurs when executing CMN or CMP.
Fix the issue by introducing new block opcodes that don't honor the
overflow/underflow flag, in the following categories:
   Block-Op   Meaning                 ARM condition codes
1. LTnoov     less than               MI
2. GEnoov     greater than or equal   PL
3. LEnoov     less than or equal      MI || EQ
4. GTnoov     greater than            NEQ & PL
The backend generates two consecutive branch instructions for 'LEnoov'
and 'GTnoov' to model their expected behavior. A slight change to 'gc'
and amd64/386 backends is made to unify the code generation.
Add a test 'TestCondRewrite' as justification; it covers 32 incorrect rules
identified on arm64, and more might be needed on other arches, like 32-bit arm.
Add two benchmarks profiling the aforementioned categories 1&2 and categories
3&4 separately; we expect the first benchmark to show a performance
improvement and the second to show no visible regression compared with
the non-optimized version.
This change also updates TestFormats to support using %#x.
Examples exhibiting where the issue comes from:
1: 'if x + 3 < 0' might be converted to:
before:
CMN $3, R0
BGE <else branch> // wrong branch is taken if 'x+3' overflows
after:
CMN $3, R0
BPL <else branch>
2: 'if y - 3 > 0' might be converted to:
before:
CMP $3, R0
BLE <else branch> // wrong branch is taken if 'y-3' underflows
after:
CMP $3, R0
BMI <else branch>
BEQ <else branch>
Benchmark data from different kinds of arm64 servers; 'old' is the non-optimized
version (not the parent commit). Generally the optimized version outperforms it.
S1:
name old time/op new time/op delta
CondRewrite/SoloJump 13.6ns ± 0% 12.9ns ± 0% -5.15% (p=0.000 n=10+10)
CondRewrite/CombJump 13.8ns ± 1% 12.9ns ± 0% -6.32% (p=0.000 n=10+10)
S2:
name old time/op new time/op delta
CondRewrite/SoloJump 11.6ns ± 0% 10.9ns ± 0% -6.03% (p=0.000 n=10+10)
CondRewrite/CombJump 11.4ns ± 0% 10.8ns ± 1% -5.53% (p=0.000 n=10+10)
S3:
name old time/op new time/op delta
CondRewrite/SoloJump 7.36ns ± 0% 7.50ns ± 0% +1.79% (p=0.000 n=9+10)
CondRewrite/CombJump 7.35ns ± 0% 7.75ns ± 0% +5.51% (p=0.000 n=8+9)
S4:
name old time/op new time/op delta
CondRewrite/SoloJump-224 11.5ns ± 1% 10.9ns ± 0% -4.97% (p=0.000 n=10+10)
CondRewrite/CombJump-224 11.9ns ± 0% 11.5ns ± 0% -2.95% (p=0.000 n=10+10)
S5:
name old time/op new time/op delta
CondRewrite/SoloJump 10.0ns ± 0% 10.0ns ± 0% -0.45% (p=0.000 n=9+10)
CondRewrite/CombJump 9.93ns ± 0% 9.77ns ± 0% -1.53% (p=0.000 n=10+9)
Go1 perf. data:
name old time/op new time/op delta
BinaryTree17 6.29s ± 1% 6.30s ± 1% ~ (p=1.000 n=5+5)
Fannkuch11 5.40s ± 0% 5.40s ± 0% ~ (p=0.841 n=5+5)
FmtFprintfEmpty 97.9ns ± 0% 98.9ns ± 3% ~ (p=0.937 n=4+5)
FmtFprintfString 171ns ± 3% 171ns ± 2% ~ (p=0.754 n=5+5)
FmtFprintfInt 212ns ± 0% 217ns ± 6% +2.55% (p=0.008 n=5+5)
FmtFprintfIntInt 296ns ± 1% 297ns ± 2% ~ (p=0.516 n=5+5)
FmtFprintfPrefixedInt 371ns ± 2% 374ns ± 7% ~ (p=1.000 n=5+5)
FmtFprintfFloat 435ns ± 1% 439ns ± 2% ~ (p=0.056 n=5+5)
FmtManyArgs 1.37µs ± 1% 1.36µs ± 1% ~ (p=0.730 n=5+5)
GobDecode 14.6ms ± 4% 14.4ms ± 4% ~ (p=0.690 n=5+5)
GobEncode 11.8ms ±20% 11.6ms ±15% ~ (p=1.000 n=5+5)
Gzip 507ms ± 0% 491ms ± 0% -3.22% (p=0.008 n=5+5)
Gunzip 73.8ms ± 0% 73.9ms ± 0% ~ (p=0.690 n=5+5)
HTTPClientServer 116µs ± 0% 116µs ± 0% ~ (p=0.686 n=4+4)
JSONEncode 21.8ms ± 1% 21.6ms ± 2% ~ (p=0.151 n=5+5)
JSONDecode 104ms ± 1% 103ms ± 1% -1.08% (p=0.016 n=5+5)
Mandelbrot200 9.53ms ± 0% 9.53ms ± 0% ~ (p=0.421 n=5+5)
GoParse 7.55ms ± 1% 7.51ms ± 1% ~ (p=0.151 n=5+5)
RegexpMatchEasy0_32 158ns ± 0% 158ns ± 0% ~ (all equal)
RegexpMatchEasy0_1K 606ns ± 1% 608ns ± 3% ~ (p=0.937 n=5+5)
RegexpMatchEasy1_32 143ns ± 0% 144ns ± 1% ~ (p=0.095 n=5+4)
RegexpMatchEasy1_1K 927ns ± 2% 944ns ± 2% ~ (p=0.056 n=5+5)
RegexpMatchMedium_32 16.0ns ± 0% 16.0ns ± 0% ~ (all equal)
RegexpMatchMedium_1K 69.3µs ± 2% 69.7µs ± 0% ~ (p=0.690 n=5+5)
RegexpMatchHard_32 3.73µs ± 0% 3.73µs ± 1% ~ (p=0.984 n=5+5)
RegexpMatchHard_1K 111µs ± 1% 110µs ± 0% ~ (p=0.151 n=5+5)
Revcomp 1.91s ±47% 1.77s ±68% ~ (p=1.000 n=5+5)
Template 138ms ± 1% 138ms ± 1% ~ (p=1.000 n=5+5)
TimeParse 787ns ± 2% 785ns ± 1% ~ (p=0.540 n=5+5)
TimeFormat 729ns ± 1% 726ns ± 1% ~ (p=0.151 n=5+5)
Updates #38740
Change-Id: I06c604874acdc1e63e66452dadee5df053045222
Reviewed-on: https://go-review.googlesource.com/c/go/+/233097
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2020-05-06 09:54:40 +00:00
|
|
|
s.CombJump(b, next, &nefJumps)
|
2016-03-12 14:07:40 -08:00
|
|
|
|
|
|
|
|
case ssa.BlockAMD64EQ, ssa.BlockAMD64NE,
|
|
|
|
|
ssa.BlockAMD64LT, ssa.BlockAMD64GE,
|
|
|
|
|
ssa.BlockAMD64LE, ssa.BlockAMD64GT,
|
2018-01-27 11:55:34 +01:00
|
|
|
ssa.BlockAMD64OS, ssa.BlockAMD64OC,
|
2016-03-12 14:07:40 -08:00
|
|
|
ssa.BlockAMD64ULT, ssa.BlockAMD64UGT,
|
|
|
|
|
ssa.BlockAMD64ULE, ssa.BlockAMD64UGE:
|
|
|
|
|
jmp := blockJump[b.Kind]
|
|
|
|
|
switch next {
|
2016-04-28 16:52:47 -07:00
|
|
|
case b.Succs[0].Block():
|
2018-04-05 16:14:42 -04:00
|
|
|
s.Br(jmp.invasm, b.Succs[1].Block())
|
2016-04-28 16:52:47 -07:00
|
|
|
case b.Succs[1].Block():
|
2018-04-05 16:14:42 -04:00
|
|
|
s.Br(jmp.asm, b.Succs[0].Block())
|
2016-03-12 14:07:40 -08:00
|
|
|
default:
|
2018-04-05 16:14:42 -04:00
|
|
|
if b.Likely != ssa.BranchUnlikely {
|
|
|
|
|
s.Br(jmp.asm, b.Succs[0].Block())
|
|
|
|
|
s.Br(obj.AJMP, b.Succs[1].Block())
|
|
|
|
|
} else {
|
|
|
|
|
s.Br(jmp.invasm, b.Succs[1].Block())
|
|
|
|
|
s.Br(obj.AJMP, b.Succs[0].Block())
|
|
|
|
|
}
|
2016-03-12 14:07:40 -08:00
|
|
|
}
|
|
|
|
|
|
cmd/compile: implement jump tables
Performance is kind of hard to exactly quantify.
One big difference between jump tables and the old binary search
scheme is that there's only 1 branch statement instead of O(n) of
them. That can be both a blessing and a curse, and can make evaluating
jump tables very hard to do.
The single branch can become a choke point for the hardware branch
predictor. A branch table jump must fit all of its state in a single
branch predictor entry (technically, a branch target predictor entry).
With binary search that predictor state can be spread among lots of
entries. In cases where the case selection is repetitive and thus
predictable, binary search can perform better.
The big win for a jump table is that it doesn't consume so much of the
branch predictor's resources. But that benefit is essentially never
observed in microbenchmarks, because the branch predictor can easily
keep state for all the binary search branches in a microbenchmark. So
that benefit is really hard to measure.
So predictable switch microbenchmarks are ~useless - they will almost
always favor the binary search scheme. Fully unpredictable switch
microbenchmarks are better, as they aren't lying to us quite so
much. In a perfectly unpredictable situation, a jump table will expect
to incur 1-1/N branch mispredicts, where a binary search would incur
lg(N)/2 of them. That puts the crossover point at about N=4. But of
course switches in real programs are seldom fully unpredictable, so
we'll use a higher crossover point.
Beyond the branch predictor, jump tables tend to execute more
instructions per switch but have no additional instructions per case,
which also argues for a larger crossover.
As far as code size goes, with this CL cmd/go has a slightly smaller
code segment and a slightly larger overall size (from the jump tables
themselves which live in the data segment).
This is a case where some FDO (feedback-directed optimization) would
be really nice to have. #28262
Some large-program benchmarks might help make the case for this
CL. Especially if we can turn on branch mispredict counters so we can
see how much using jump tables can free up branch prediction resources
that can be gainfully used elsewhere in the program.
name old time/op new time/op delta
Switch8Predictable 1.89ns ± 2% 1.27ns ± 3% -32.58% (p=0.000 n=9+10)
Switch8Unpredictable 9.33ns ± 1% 7.50ns ± 1% -19.60% (p=0.000 n=10+9)
Switch32Predictable 2.20ns ± 2% 1.64ns ± 1% -25.39% (p=0.000 n=10+9)
Switch32Unpredictable 10.0ns ± 2% 7.6ns ± 2% -24.04% (p=0.000 n=10+10)
Fixes #5496
Update #34381
Change-Id: I3ff56011d02be53f605ca5fd3fb96b905517c34f
Reviewed-on: https://go-review.googlesource.com/c/go/+/357330
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2021-10-04 12:17:46 -07:00
|
|
|
case ssa.BlockAMD64JUMPTABLE:
|
|
|
|
|
// JMP *(TABLE)(INDEX*8)
|
|
|
|
|
p := s.Prog(obj.AJMP)
|
|
|
|
|
p.To.Type = obj.TYPE_MEM
|
|
|
|
|
p.To.Reg = b.Controls[1].Reg()
|
|
|
|
|
p.To.Index = b.Controls[0].Reg()
|
|
|
|
|
p.To.Scale = 8
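// Each jump-table entry is an 8-byte code address, hence the index register is scaled by 8.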
|
|
|
|
|
// Save jump tables for later resolution of the target blocks.
|
|
|
|
|
s.JumpTables = append(s.JumpTables, b)
|
|
|
|
|
|
2016-03-12 14:07:40 -08:00
|
|
|
default:
|
2019-08-12 20:19:58 +01:00
|
|
|
b.Fatalf("branch not implemented: %s", b.LongString())
|
2016-03-12 14:07:40 -08:00
|
|
|
}
|
|
|
|
|
}
|
2021-04-03 20:09:15 -04:00
|
|
|
|
2021-05-25 18:05:02 -04:00
|
|
|
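// loadRegResult emits a load of the stack copy of result n (type t, offset off within
// the slot) into register reg and returns the generated Prog.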
func loadRegResult(s *ssagen.State, f *ssa.Func, t *types.Type, reg int16, n *ir.Name, off int64) *obj.Prog {
|
|
|
|
|
p := s.Prog(loadByType(t))
|
|
|
|
|
p.From.Type = obj.TYPE_MEM
|
|
|
|
|
p.From.Name = obj.NAME_AUTO
|
|
|
|
|
p.From.Sym = n.Linksym()
|
|
|
|
|
p.From.Offset = n.FrameOffset() + off
|
|
|
|
|
p.To.Type = obj.TYPE_REG
|
|
|
|
|
p.To.Reg = reg
|
|
|
|
|
return p
|
2021-04-03 20:09:15 -04:00
|
|
|
}
|
2021-04-11 12:42:49 -04:00
|
|
|
|
|
|
|
|
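// spillArgReg appends to p a store of argument register reg (holding a value of type t)
// into the parameter stack slot for n at offset off, returning the new Prog. The spill
// is marked not-a-statement so it does not disturb debugger stepping in the prolog.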
func spillArgReg(pp *objw.Progs, p *obj.Prog, f *ssa.Func, t *types.Type, reg int16, n *ir.Name, off int64) *obj.Prog {
|
|
|
|
|
p = pp.Append(p, storeByType(t), obj.TYPE_REG, reg, 0, obj.TYPE_MEM, 0, n.FrameOffset()+off)
|
|
|
|
|
p.To.Name = obj.NAME_PARAM
|
|
|
|
|
p.To.Sym = n.Linksym()
|
cmd/compile: spos handling fixes to improve prolog debuggability
With the new register ABI, the compiler sometimes introduces spills of
argument registers in function prologs; depending on the positions
assigned to these spills and whether they have the IsStmt flag set,
this can degrade the debugging experience. For example, in this
function from one of the Delve regression tests:
L13: func foo(eface interface{}) {
L14: if eface != nil {
L15: n++
L16: }
L17 }
we wind up with a prolog containing two spill instructions, the first
with line 14, the second with line 13. The end result for the user
is that if you set a breakpoint in foo and run to it, then do "step",
execution will initially stop at L14, then jump "backwards" to L13.
The root of the problem in this case is that an ArgIntReg pseudo-op is
introduced during expand calls, then promoted (due to lowering) to a
first-class statement (IsStmt flag set), which in turn causes
downstream handling to propagate its position to the first of the register
spills in the prolog.
To help improve things, this patch changes the rewriter to avoid
moving an "IsStmt" flag from a deleted/replaced instruction to an
Arg{Int,Float}Reg value, and adds Arg{Int,Float}Reg to the list of
opcodes not suitable for selection as statement boundaries, and
suppresses generation of additional register spills in defframe() when
optimization is disabled (since in that case things will get spilled
in any case).
This is not a comprehensive/complete fix; there are still cases where
we get less-than-ideal source position markers (ex: issue 45680).
Updates #40724.
Change-Id: Ica8bba4940b2291bef6b5d95ff0cfd84412a2d40
Reviewed-on: https://go-review.googlesource.com/c/go/+/312989
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-04-21 16:21:30 -04:00
|
|
|
p.Pos = p.Pos.WithNotStmt()
|
2021-04-11 12:42:49 -04:00
|
|
|
return p
|
|
|
|
|
}
|