// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package main
// Generic opcodes typically specify a width. The inputs and outputs
// of that op are the given number of bits wide. There is no notion of
// "sign", so Add32 can be used both for signed and unsigned 32-bit
// addition.
// Signed/unsigned is explicit with the extension ops
// (SignExt*/ZeroExt*) and implicit as the arg to some opcodes
// (e.g. the second argument to shifts is unsigned). If not mentioned,
// all args take signed inputs, or don't care whether their inputs
// are signed or unsigned.
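// For example, Add32 computes the low 32 bits of the sum of its operands, which is
// the correct result for int32 and uint32 alike: Add32(0xFFFFFFFF, 1) is 0, i.e.
// uint32 wrap-around and int32 -1+1 at the same time.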
var genericOps = []opData{
// Pseudo-op.
{name: "Last", argLength: -1}, // return last element of tuple; for "let" bindings

// 2-input arithmetic
// Types must be consistent with Go typing. Add, for example, must take two values
// of the same type and produce that same type.
{name: "Add8", argLength: 2, commutative: true}, // arg0 + arg1
{name: "Add16", argLength: 2, commutative: true},
{name: "Add32", argLength: 2, commutative: true},
{name: "Add64", argLength: 2, commutative: true},
{name: "AddPtr", argLength: 2}, // For address calculations. arg0 is a pointer and arg1 is an int.
{name: "Add32F", argLength: 2, commutative: true},
{name: "Add64F", argLength: 2, commutative: true},

{name: "Sub8", argLength: 2}, // arg0 - arg1
{name: "Sub16", argLength: 2},
{name: "Sub32", argLength: 2},
{name: "Sub64", argLength: 2},
{name: "SubPtr", argLength: 2},
{name: "Sub32F", argLength: 2},
{name: "Sub64F", argLength: 2},

{name: "Mul8", argLength: 2, commutative: true}, // arg0 * arg1
{name: "Mul16", argLength: 2, commutative: true},
{name: "Mul32", argLength: 2, commutative: true},
{name: "Mul64", argLength: 2, commutative: true},
{name: "Mul32F", argLength: 2, commutative: true},
{name: "Mul64F", argLength: 2, commutative: true},

{name: "Div32F", argLength: 2}, // arg0 / arg1
{name: "Div64F", argLength: 2},

{name: "Hmul32", argLength: 2, commutative: true}, // (arg0 * arg1) >> 32, signed
{name: "Hmul32u", argLength: 2, commutative: true}, // (arg0 * arg1) >> 32, unsigned
{name: "Hmul64", argLength: 2, commutative: true}, // (arg0 * arg1) >> 64, signed
{name: "Hmul64u", argLength: 2, commutative: true}, // (arg0 * arg1) >> 64, unsigned

{name: "Mul32uhilo", argLength: 2, typ: "(UInt32,UInt32)", commutative: true}, // arg0 * arg1, returns (hi, lo)
{name: "Mul64uhilo", argLength: 2, typ: "(UInt64,UInt64)", commutative: true}, // arg0 * arg1, returns (hi, lo)

{name: "Mul32uover", argLength: 2, typ: "(UInt32,Bool)", commutative: true}, // Let x = arg0*arg1 (full 32x32->64 unsigned multiply), returns (uint32(x), (uint32(x) != x))
{name: "Mul64uover", argLength: 2, typ: "(UInt64,Bool)", commutative: true}, // Let x = arg0*arg1 (full 64x64->128 unsigned multiply), returns (uint64(x), (uint64(x) != x))
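
// Illustrative example (not part of the op table): Mul32uover(0x10000, 0x10000)
// returns (0, true), since the full 64-bit product 0x1_0000_0000 does not fit in
// 32 bits, while Mul32uover(3, 5) returns (15, false).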

// Weird special instructions for use in the strength reduction of divides.
// These ops compute unsigned (arg0 + arg1) / 2, correct to all
// 32/64 bits, even when the intermediate result of the add has 33/65 bits.
// These ops can assume arg0 >= arg1.
// Note: these ops aren't commutative!
{name: "Avg32u", argLength: 2, typ: "UInt32"}, // 32-bit platforms only
{name: "Avg64u", argLength: 2, typ: "UInt64"}, // 64-bit platforms only
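
// Illustrative example (not part of the op table): Avg32u(0xFFFFFFFF, 0xFFFFFFFD)
// is 0xFFFFFFFE, even though the intermediate 33-bit sum 0x1FFFFFFFC does not fit
// in a uint32.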

// For Div16, Div32 and Div64, AuxInt non-zero means that the divisor has been proved to be not -1
// or that the dividend is not the most negative value.
{name: "Div8", argLength: 2}, // arg0 / arg1, signed
{name: "Div8u", argLength: 2}, // arg0 / arg1, unsigned
{name: "Div16", argLength: 2, aux: "Bool"},
{name: "Div16u", argLength: 2},
{name: "Div32", argLength: 2, aux: "Bool"},
{name: "Div32u", argLength: 2},
{name: "Div64", argLength: 2, aux: "Bool"},
{name: "Div64u", argLength: 2},
{name: "Div128u", argLength: 3}, // arg0:arg1 / arg2 (128-bit divided by 64-bit), returns (q, r)
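
// Illustrative example (not part of the op table): Div128u with arg0=1, arg1=0,
// arg2=3 divides 1<<64 by 3, returning q=0x5555555555555555 and r=1.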

// For Mod16, Mod32 and Mod64, AuxInt non-zero means that the divisor has been proved to be not -1.
{name: "Mod8", argLength: 2}, // arg0 % arg1, signed
{name: "Mod8u", argLength: 2}, // arg0 % arg1, unsigned
{name: "Mod16", argLength: 2, aux: "Bool"},
{name: "Mod16u", argLength: 2},
{name: "Mod32", argLength: 2, aux: "Bool"},
{name: "Mod32u", argLength: 2},
{name: "Mod64", argLength: 2, aux: "Bool"},
{name: "Mod64u", argLength: 2},

{name: "And8", argLength: 2, commutative: true}, // arg0 & arg1
{name: "And16", argLength: 2, commutative: true},
{name: "And32", argLength: 2, commutative: true},
{name: "And64", argLength: 2, commutative: true},

{name: "Or8", argLength: 2, commutative: true}, // arg0 | arg1
{name: "Or16", argLength: 2, commutative: true},
{name: "Or32", argLength: 2, commutative: true},
{name: "Or64", argLength: 2, commutative: true},

{name: "Xor8", argLength: 2, commutative: true}, // arg0 ^ arg1
{name: "Xor16", argLength: 2, commutative: true},
{name: "Xor32", argLength: 2, commutative: true},
{name: "Xor64", argLength: 2, commutative: true},

// For shifts, AxB means the shifted value has A bits and the shift amount has B bits.
// Shift amounts are considered unsigned.
// If arg1 is known to be nonnegative and less than the number of bits in arg0,
// then auxInt may be set to 1.
// This enables better code generation on some platforms.
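// For example, when the flag is not set, Rsh32Ux64 must follow Go semantics and
// produce 0 whenever arg1 >= 32, so backends typically emit extra compare/mask
// instructions; with the flag set they can use the hardware shift directly.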
{name: "Lsh8x8", argLength: 2, aux: "Bool"}, // arg0 << arg1
{name: "Lsh8x16", argLength: 2, aux: "Bool"},
{name: "Lsh8x32", argLength: 2, aux: "Bool"},
{name: "Lsh8x64", argLength: 2, aux: "Bool"},
{name: "Lsh16x8", argLength: 2, aux: "Bool"},
{name: "Lsh16x16", argLength: 2, aux: "Bool"},
{name: "Lsh16x32", argLength: 2, aux: "Bool"},
{name: "Lsh16x64", argLength: 2, aux: "Bool"},
{name: "Lsh32x8", argLength: 2, aux: "Bool"},
{name: "Lsh32x16", argLength: 2, aux: "Bool"},
{name: "Lsh32x32", argLength: 2, aux: "Bool"},
{name: "Lsh32x64", argLength: 2, aux: "Bool"},
{name: "Lsh64x8", argLength: 2, aux: "Bool"},
{name: "Lsh64x16", argLength: 2, aux: "Bool"},
{name: "Lsh64x32", argLength: 2, aux: "Bool"},
{name: "Lsh64x64", argLength: 2, aux: "Bool"},

{name: "Rsh8x8", argLength: 2, aux: "Bool"}, // arg0 >> arg1, signed
{name: "Rsh8x16", argLength: 2, aux: "Bool"},
{name: "Rsh8x32", argLength: 2, aux: "Bool"},
{name: "Rsh8x64", argLength: 2, aux: "Bool"},
{name: "Rsh16x8", argLength: 2, aux: "Bool"},
{name: "Rsh16x16", argLength: 2, aux: "Bool"},
{name: "Rsh16x32", argLength: 2, aux: "Bool"},
{name: "Rsh16x64", argLength: 2, aux: "Bool"},
{name: "Rsh32x8", argLength: 2, aux: "Bool"},
{name: "Rsh32x16", argLength: 2, aux: "Bool"},
{name: "Rsh32x32", argLength: 2, aux: "Bool"},
{name: "Rsh32x64", argLength: 2, aux: "Bool"},
{name: "Rsh64x8", argLength: 2, aux: "Bool"},
{name: "Rsh64x16", argLength: 2, aux: "Bool"},
{name: "Rsh64x32", argLength: 2, aux: "Bool"},
{name: "Rsh64x64", argLength: 2, aux: "Bool"},

{name: "Rsh8Ux8", argLength: 2, aux: "Bool"}, // arg0 >> arg1, unsigned
{name: "Rsh8Ux16", argLength: 2, aux: "Bool"},
{name: "Rsh8Ux32", argLength: 2, aux: "Bool"},
{name: "Rsh8Ux64", argLength: 2, aux: "Bool"},
{name: "Rsh16Ux8", argLength: 2, aux: "Bool"},
{name: "Rsh16Ux16", argLength: 2, aux: "Bool"},
{name: "Rsh16Ux32", argLength: 2, aux: "Bool"},
{name: "Rsh16Ux64", argLength: 2, aux: "Bool"},
{name: "Rsh32Ux8", argLength: 2, aux: "Bool"},
{name: "Rsh32Ux16", argLength: 2, aux: "Bool"},
{name: "Rsh32Ux32", argLength: 2, aux: "Bool"},
{name: "Rsh32Ux64", argLength: 2, aux: "Bool"},
{name: "Rsh64Ux8", argLength: 2, aux: "Bool"},
{name: "Rsh64Ux16", argLength: 2, aux: "Bool"},
{name: "Rsh64Ux32", argLength: 2, aux: "Bool"},
{name: "Rsh64Ux64", argLength: 2, aux: "Bool"},

// 2-input comparisons
{name: "Eq8", argLength: 2, commutative: true, typ: "Bool"}, // arg0 == arg1
{name: "Eq16", argLength: 2, commutative: true, typ: "Bool"},
{name: "Eq32", argLength: 2, commutative: true, typ: "Bool"},
{name: "Eq64", argLength: 2, commutative: true, typ: "Bool"},
{name: "EqPtr", argLength: 2, commutative: true, typ: "Bool"},
{name: "EqInter", argLength: 2, typ: "Bool"}, // arg0 or arg1 is nil; other cases handled by frontend
{name: "EqSlice", argLength: 2, typ: "Bool"}, // arg0 or arg1 is nil; other cases handled by frontend
{name: "Eq32F", argLength: 2, commutative: true, typ: "Bool"},
{name: "Eq64F", argLength: 2, commutative: true, typ: "Bool"},

{name: "Neq8", argLength: 2, commutative: true, typ: "Bool"}, // arg0 != arg1
{name: "Neq16", argLength: 2, commutative: true, typ: "Bool"},
{name: "Neq32", argLength: 2, commutative: true, typ: "Bool"},
{name: "Neq64", argLength: 2, commutative: true, typ: "Bool"},
{name: "NeqPtr", argLength: 2, commutative: true, typ: "Bool"},
{name: "NeqInter", argLength: 2, typ: "Bool"}, // arg0 or arg1 is nil; other cases handled by frontend
{name: "NeqSlice", argLength: 2, typ: "Bool"}, // arg0 or arg1 is nil; other cases handled by frontend
{name: "Neq32F", argLength: 2, commutative: true, typ: "Bool"},
{name: "Neq64F", argLength: 2, commutative: true, typ: "Bool"},

{name: "Less8", argLength: 2, typ: "Bool"}, // arg0 < arg1, signed
{name: "Less8U", argLength: 2, typ: "Bool"}, // arg0 < arg1, unsigned
{name: "Less16", argLength: 2, typ: "Bool"},
{name: "Less16U", argLength: 2, typ: "Bool"},
{name: "Less32", argLength: 2, typ: "Bool"},
{name: "Less32U", argLength: 2, typ: "Bool"},
{name: "Less64", argLength: 2, typ: "Bool"},
{name: "Less64U", argLength: 2, typ: "Bool"},
{name: "Less32F", argLength: 2, typ: "Bool"},
{name: "Less64F", argLength: 2, typ: "Bool"},

{name: "Leq8", argLength: 2, typ: "Bool"}, // arg0 <= arg1, signed
{name: "Leq8U", argLength: 2, typ: "Bool"}, // arg0 <= arg1, unsigned
{name: "Leq16", argLength: 2, typ: "Bool"},
{name: "Leq16U", argLength: 2, typ: "Bool"},
{name: "Leq32", argLength: 2, typ: "Bool"},
{name: "Leq32U", argLength: 2, typ: "Bool"},
{name: "Leq64", argLength: 2, typ: "Bool"},
{name: "Leq64U", argLength: 2, typ: "Bool"},
{name: "Leq32F", argLength: 2, typ: "Bool"},
{name: "Leq64F", argLength: 2, typ: "Bool"},

// the type of a CondSelect is the same as the type of its first
// two arguments, which should be register-width scalars; the third
// argument should be a boolean
{name: "CondSelect", argLength: 3}, // arg2 ? arg0 : arg1

// boolean ops
{name: "AndB", argLength: 2, commutative: true, typ: "Bool"}, // arg0 && arg1 (not shortcircuited)
{name: "OrB", argLength: 2, commutative: true, typ: "Bool"}, // arg0 || arg1 (not shortcircuited)
{name: "EqB", argLength: 2, commutative: true, typ: "Bool"}, // arg0 == arg1
{name: "NeqB", argLength: 2, commutative: true, typ: "Bool"}, // arg0 != arg1
{name: "Not", argLength: 1, typ: "Bool"}, // !arg0, boolean

// 1-input ops
{name: "Neg8", argLength: 1}, // -arg0
{name: "Neg16", argLength: 1},
{name: "Neg32", argLength: 1},
{name: "Neg64", argLength: 1},
{name: "Neg32F", argLength: 1},
{name: "Neg64F", argLength: 1},

{name: "Com8", argLength: 1}, // ^arg0
{name: "Com16", argLength: 1},
{name: "Com32", argLength: 1},
{name: "Com64", argLength: 1},

{name: "Ctz8", argLength: 1}, // Count trailing (low order) zeroes (returns 0-8)
{name: "Ctz16", argLength: 1}, // Count trailing (low order) zeroes (returns 0-16)
{name: "Ctz32", argLength: 1}, // Count trailing (low order) zeroes (returns 0-32)
{name: "Ctz64", argLength: 1}, // Count trailing (low order) zeroes (returns 0-64)
{name: "Ctz64On32", argLength: 2}, // Count trailing (low order) zeroes (returns 0-64) in arg[1]<<32+arg[0]
{name: "Ctz8NonZero", argLength: 1}, // same as above, but arg[0] known to be non-zero, returns 0-7
{name: "Ctz16NonZero", argLength: 1}, // same as above, but arg[0] known to be non-zero, returns 0-15
{name: "Ctz32NonZero", argLength: 1}, // same as above, but arg[0] known to be non-zero, returns 0-31
{name: "Ctz64NonZero", argLength: 1}, // same as above, but arg[0] known to be non-zero, returns 0-63
{name: "BitLen8", argLength: 1}, // Number of bits in arg[0] (returns 0-8)
{name: "BitLen16", argLength: 1}, // Number of bits in arg[0] (returns 0-16)
{name: "BitLen32", argLength: 1}, // Number of bits in arg[0] (returns 0-32)
{name: "BitLen64", argLength: 1}, // Number of bits in arg[0] (returns 0-64)
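
// Illustrative examples (not part of the op table): Ctz32(8) = 3 and Ctz32(0) = 32;
// BitLen32(8) = 4 and BitLen32(0) = 0, matching math/bits.TrailingZeros32 and
// math/bits.Len32.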

{name: "Bswap16", argLength: 1}, // Swap bytes
{name: "Bswap32", argLength: 1}, // Swap bytes
{name: "Bswap64", argLength: 1}, // Swap bytes

{name: "BitRev8", argLength: 1}, // Reverse the bits in arg[0]
{name: "BitRev16", argLength: 1}, // Reverse the bits in arg[0]
{name: "BitRev32", argLength: 1}, // Reverse the bits in arg[0]
{name: "BitRev64", argLength: 1}, // Reverse the bits in arg[0]

{name: "PopCount8", argLength: 1}, // Count bits in arg[0]
{name: "PopCount16", argLength: 1}, // Count bits in arg[0]
{name: "PopCount32", argLength: 1}, // Count bits in arg[0]
{name: "PopCount64", argLength: 1}, // Count bits in arg[0]

// RotateLeftX instructions rotate the X bits of arg[0] to the left
// by the low lg_2(X) bits of arg[1], interpreted as an unsigned value.
// Note that this works out regardless of the bit width or signedness of
// arg[1]. In particular, RotateLeft by x is the same as RotateRight by -x.
{name: "RotateLeft64", argLength: 2},
{name: "RotateLeft32", argLength: 2},
{name: "RotateLeft16", argLength: 2},
{name: "RotateLeft8", argLength: 2},
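
// Illustrative example (not part of the op table): RotateLeft8(0b1000_0001, 1) is
// 0b0000_0011, and rotating left by -1 (low 3 bits = 7) gives the same result as
// rotating right by 1.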

// Square root.
// Special cases:
// +∞ → +∞
// ±0 → ±0 (sign preserved)
// x<0 → NaN
// NaN → NaN
{name: "Sqrt", argLength: 1}, // √arg0 (floating point, double precision)
{name: "Sqrt32", argLength: 1}, // √arg0 (floating point, single precision)

// Round to integer, float64 only.
// Special cases:
// ±∞ → ±∞ (sign preserved)
// ±0 → ±0 (sign preserved)
// NaN → NaN
{name: "Floor", argLength: 1}, // round arg0 toward -∞
{name: "Ceil", argLength: 1}, // round arg0 toward +∞
{name: "Trunc", argLength: 1}, // round arg0 toward 0
{name: "Round", argLength: 1}, // round arg0 to nearest, ties away from 0
{name: "RoundToEven", argLength: 1}, // round arg0 to nearest, ties to even

// Modify the sign bit
{name: "Abs", argLength: 1}, // absolute value arg0
{name: "Copysign", argLength: 2}, // arg0 with the sign of arg1 (math.Copysign)

// Integer min/max implementation, if hardware is available.
{name: "Min64", argLength: 2}, // min(arg0,arg1), signed
{name: "Max64", argLength: 2}, // max(arg0,arg1), signed
{name: "Min64u", argLength: 2}, // min(arg0,arg1), unsigned
{name: "Max64u", argLength: 2}, // max(arg0,arg1), unsigned

// Float min/max implementation, if hardware is available.
{name: "Min64F", argLength: 2}, // min(arg0,arg1)
{name: "Min32F", argLength: 2}, // min(arg0,arg1)
{name: "Max64F", argLength: 2}, // max(arg0,arg1)
{name: "Max32F", argLength: 2}, // max(arg0,arg1)

// 3-input opcode.
// Fused-multiply-add, float64 only.
// When a*b+c is exactly zero (before rounding), then the result is +0 or -0.
// The 0's sign is determined according to the standard rules for the
// addition (-0 if both a*b and c are -0, +0 otherwise).
//
// Otherwise, when a*b+c rounds to zero, then the resulting 0's sign is
// determined by the sign of the exact result a*b+c.
// See section 6.3 in ieee754.
//
// When the multiply is an infinity times a zero, the result is NaN.
// See section 7.2 in ieee754.
{name: "FMA", argLength: 3}, // compute (a*b)+c without intermediate rounding
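
// Illustrative example (not part of the op table): with a = b = 1+2^-27 and
// c = -(1+2^-26), FMA(a, b, c) is exactly 2^-54, whereas rounding a*b first and
// then adding c gives 0.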

// Data movement. Max argument length for Phi is indefinite.
{name: "Phi", argLength: -1, zeroWidth: true}, // select an argument based on which predecessor block we came from
{name: "Copy", argLength: 1}, // output = arg0

// Convert converts between pointers and integers.
// We have a special op for this so as to not confuse GC
// (particularly stack maps). It takes a memory arg so it
// gets correctly ordered with respect to GC safepoints.
// It gets compiled to nothing, so its result must be in the same
// register as its argument. regalloc knows it can use any
// allocatable integer register for OpConvert.
// arg0=ptr/int arg1=mem, output=int/ptr
{name: "Convert", argLength: 2, zeroWidth: true, resultInArg0: true},

// constants. Constant values are stored in the aux or
// auxint fields.
{name: "ConstBool", aux: "Bool"}, // auxint is 0 for false and 1 for true
{name: "ConstString", aux: "String"}, // value is aux.(string)
{name: "ConstNil", typ: "BytePtr"}, // nil pointer
{name: "Const8", aux: "Int8"}, // auxint is sign-extended 8 bits
{name: "Const16", aux: "Int16"}, // auxint is sign-extended 16 bits
{name: "Const32", aux: "Int32"}, // auxint is sign-extended 32 bits
// Note: ConstX are sign-extended even when the type of the value is unsigned.
// For instance, uint8(0xaa) is stored as auxint=0xffffffffffffffaa.
{name: "Const64", aux: "Int64"}, // value is auxint
// Note: for both Const32F and Const64F, we disallow encoding NaNs.
// Signaling NaNs are tricky because if you do anything with them, they become quiet.
// Particularly, converting a 32 bit sNaN to 64 bit and back converts it to a qNaN.
// See issue 36399 and 36400.
// Encodings of +inf, -inf, and -0 are fine.
{name: "Const32F", aux: "Float32"}, // value is math.Float64frombits(uint64(auxint)) and is exactly representable as float32
{name: "Const64F", aux: "Float64"}, // value is math.Float64frombits(uint64(auxint))
{name: "ConstInterface"}, // nil interface
{name: "ConstSlice"}, // nil slice

// Constant-like things
{name: "InitMem", zeroWidth: true}, // memory input to the function.
{name: "Arg", aux: "SymOff", symEffect: "Read", zeroWidth: true}, // argument to the function. aux=GCNode of arg, off = offset in that arg.

// Like Arg, these are generic ops that survive lowering. AuxInt is a register index, and the actual output register for each index is defined by the architecture.
// AuxInt = integer argument index (not a register number). ABI-specified spill loc obtained from function
{name: "ArgIntReg", aux: "NameOffsetInt8", zeroWidth: true}, // argument to the function in an int reg.
{name: "ArgFloatReg", aux: "NameOffsetInt8", zeroWidth: true}, // argument to the function in a float reg.

// The address of a variable. arg0 is the base pointer.
// If the variable is a global, the base pointer will be SB and
// the Aux field will be a *obj.LSym.
// If the variable is a local, the base pointer will be SP and
// the Aux field will be a *ir.Name.
{name: "Addr", argLength: 1, aux: "Sym", symEffect: "Addr"}, // Address of a variable. Arg0=SB. Aux identifies the variable.
{name: "LocalAddr", argLength: 2, aux: "Sym", symEffect: "Addr"}, // Address of a variable. Arg0=SP. Arg1=mem. Aux identifies the variable.

{name: "SP", zeroWidth: true, fixedReg: true}, // stack pointer
{name: "SB", typ: "Uintptr", zeroWidth: true, fixedReg: true}, // static base pointer (a.k.a. globals pointer)
{name: "Invalid"}, // unused value
{name: "SPanchored", typ: "Uintptr", argLength: 2, zeroWidth: true}, // arg0 = SP, arg1 = mem. Result is identical to arg0, but cannot be scheduled before memory state arg1.

// Memory operations
{name: "Load", argLength: 2}, // Load from arg0. arg1=memory
{name: "Dereference", argLength: 2}, // Load from arg0. arg1=memory. Helper op for arg/result passing, result is an otherwise not-SSA-able "value".
{name: "Store", argLength: 3, typ: "Mem", aux: "Typ"}, // Store arg1 to arg0. arg2=memory, aux=type. Returns memory.

// masked memory operations.
{name: "LoadMasked8", argLength: 3}, // Load from arg0, arg1 = mask of 8-bits, arg2 = memory
{name: "LoadMasked16", argLength: 3}, // Load from arg0, arg1 = mask of 16-bits, arg2 = memory
{name: "LoadMasked32", argLength: 3}, // Load from arg0, arg1 = mask of 32-bits, arg2 = memory
{name: "LoadMasked64", argLength: 3}, // Load from arg0, arg1 = mask of 64-bits, arg2 = memory
{name: "StoreMasked8", argLength: 4, typ: "Mem", aux: "Typ"}, // Store arg2 to arg0, arg1=mask of 8-bits, arg3 = memory
{name: "StoreMasked16", argLength: 4, typ: "Mem", aux: "Typ"}, // Store arg2 to arg0, arg1=mask of 16-bits, arg3 = memory
{name: "StoreMasked32", argLength: 4, typ: "Mem", aux: "Typ"}, // Store arg2 to arg0, arg1=mask of 32-bits, arg3 = memory
{name: "StoreMasked64", argLength: 4, typ: "Mem", aux: "Typ"}, // Store arg2 to arg0, arg1=mask of 64-bits, arg3 = memory

// Normally we require that the source and destination of Move do not overlap.
// There is an exception when we know all the loads will happen before all
// the stores. In that case, overlap is ok. See
// memmove inlining in generic.rules. When inlineablememmovesize (in ../rewrite.go)
// returns true, we must do all loads before all stores, when lowering Move.
// The type of Move is used for the write barrier pass to insert write barriers
// and for alignment on some architectures.
// For pointerless types, it is possible for the type to be inaccurate.
// For type alignment and pointer information, use the type in Aux;
// for type size, use the size in AuxInt.
// The "inline runtime.memmove" rewrite rule generates Moves with inaccurate types,
// such as type byte instead of the more accurate type [8]byte.
{name: "Move", argLength: 3, typ: "Mem", aux: "TypSize"}, // arg0=destptr, arg1=srcptr, arg2=mem, auxint=size, aux=type. Returns memory.
{name: "Zero", argLength: 2, typ: "Mem", aux: "TypSize"}, // arg0=destptr, arg1=mem, auxint=size, aux=type. Returns memory.

// Memory operations with write barriers.
// Expand to runtime calls. Write barrier will be removed if the write is on the stack.
{name: "StoreWB", argLength: 3, typ: "Mem", aux: "Typ"}, // Store arg1 to arg0. arg2=memory, aux=type. Returns memory.
{name: "MoveWB", argLength: 3, typ: "Mem", aux: "TypSize"}, // arg0=destptr, arg1=srcptr, arg2=mem, auxint=size, aux=type. Returns memory.
{name: "ZeroWB", argLength: 2, typ: "Mem", aux: "TypSize"}, // arg0=destptr, arg1=mem, auxint=size, aux=type. Returns memory.
{name: "WBend", argLength: 1, typ: "Mem"}, // Write barrier code is done, interrupting is now allowed.

// WB invokes runtime.gcWriteBarrier. This is not a normal
// call: it takes arguments in registers, doesn't clobber
// general-purpose registers (the exact clobber set is
// arch-dependent), and is not a safe-point.
{name: "WB", argLength: 1, typ: "(BytePtr,Mem)", aux: "Int64"}, // arg0=mem, auxint=# of buffer entries needed. Returns buffer pointer and memory.

{name: "HasCPUFeature", argLength: 0, typ: "bool", aux: "Sym", symEffect: "None"}, // aux=place that this feature flag can be loaded from

// PanicBounds and PanicExtend generate a runtime panic.
// Their arguments provide index values to use in panic messages.
// Both PanicBounds and PanicExtend have an AuxInt value from the BoundsKind type (in ../op.go).
// PanicBounds' index is int sized.
// PanicExtend's index is int64 sized. (PanicExtend is only used on 32-bit archs.)
{name: "PanicBounds", argLength: 3, aux: "Int64", typ: "Mem", call: true}, // arg0=idx, arg1=len, arg2=mem, returns memory.
{name: "PanicExtend", argLength: 4, aux: "Int64", typ: "Mem", call: true}, // arg0=idxHi, arg1=idxLo, arg2=len, arg3=mem, returns memory.

// Function calls. Arguments to the call have already been written to the stack.
// Return values appear on the stack. The method receiver, if any, is treated
// as a phantom first argument.
// TODO(josharian): ClosureCall and InterCall should have Int32 aux
// to match StaticCall's 32 bit arg size limit.
// TODO(drchase,josharian): could the arg size limit be bundled into the rules for CallOff?

// Before lowering, LECalls receive their fixed inputs (first), memory (last),
// and a variable number of input values in the middle.
// They produce a variable number of result values.
// These values are not necessarily "SSA-able"; they can be too large,
// but in that case inputs are loaded immediately before with OpDereference,
// and outputs are stored immediately with OpStore.
//
// After call expansion, Calls have the same fixed-middle-memory arrangement of inputs,
// with the difference that the "middle" is only the register-resident inputs,
// and the non-register inputs are instead stored at ABI-defined offsets from SP
// (and the stores thread through the memory that is ultimately an input to the call).
// Outputs follow a similar pattern; register-resident outputs are the leading elements
// of a Result-typed output, with memory last, and any memory-resident outputs have been
// stored to ABI-defined locations. Each non-memory input or output fits in a register.
//
// Subsequent architecture-specific lowering only changes the opcode.
{name: "ClosureCall", argLength: -1, aux: "CallOff", call: true}, // arg0=code pointer, arg1=context ptr, arg2..argN-1 are register inputs, argN=memory. auxint=arg size. Returns Result of register results, plus memory.
{name: "StaticCall", argLength: -1, aux: "CallOff", call: true}, // call function aux.(*obj.LSym), arg0..argN-1 are register inputs, argN=memory. auxint=arg size. Returns Result of register results, plus memory.
{name: "InterCall", argLength: -1, aux: "CallOff", call: true}, // interface call. arg0=code pointer, arg1..argN-1 are register inputs, argN=memory, auxint=arg size. Returns Result of register results, plus memory.
{name: "TailCall", argLength: -1, aux: "CallOff", call: true}, // tail call function aux.(*obj.LSym), arg0..argN-1 are register inputs, argN=memory. auxint=arg size. Returns Result of register results, plus memory.

{name: "ClosureLECall", argLength: -1, aux: "CallOff", call: true}, // late-expanded closure call. arg0=code pointer, arg1=context ptr, arg2..argN-1 are inputs, argN is mem. auxint = arg size. Result is tuple of result(s), plus mem.
{name: "StaticLECall", argLength: -1, aux: "CallOff", call: true}, // late-expanded static call function aux.(*ssa.AuxCall.Fn). arg0..argN-1 are inputs, argN is mem. auxint = arg size. Result is tuple of result(s), plus mem.
{name: "InterLECall", argLength: -1, aux: "CallOff", call: true}, // late-expanded interface call. arg0=code pointer, arg1..argN-1 are inputs, argN is mem. auxint = arg size. Result is tuple of result(s), plus mem.
{name: "TailLECall", argLength: -1, aux: "CallOff", call: true}, // late-expanded static tail call function aux.(*ssa.AuxCall.Fn). arg0..argN-1 are inputs, argN is mem. auxint = arg size. Result is tuple of result(s), plus mem.

// Conversions: signed extensions, zero (unsigned) extensions, truncations
{name: "SignExt8to16", argLength: 1, typ: "Int16"},
{name: "SignExt8to32", argLength: 1, typ: "Int32"},
{name: "SignExt8to64", argLength: 1, typ: "Int64"},
{name: "SignExt16to32", argLength: 1, typ: "Int32"},
{name: "SignExt16to64", argLength: 1, typ: "Int64"},
{name: "SignExt32to64", argLength: 1, typ: "Int64"},
{name: "ZeroExt8to16", argLength: 1, typ: "UInt16"},
{name: "ZeroExt8to32", argLength: 1, typ: "UInt32"},
{name: "ZeroExt8to64", argLength: 1, typ: "UInt64"},
{name: "ZeroExt16to32", argLength: 1, typ: "UInt32"},
{name: "ZeroExt16to64", argLength: 1, typ: "UInt64"},
{name: "ZeroExt32to64", argLength: 1, typ: "UInt64"},
{name: "Trunc16to8", argLength: 1},
{name: "Trunc32to8", argLength: 1},
{name: "Trunc32to16", argLength: 1},
{name: "Trunc64to8", argLength: 1},
{name: "Trunc64to16", argLength: 1},
{name: "Trunc64to32", argLength: 1},

{name: "Cvt32to32F", argLength: 1},
{name: "Cvt32to64F", argLength: 1},
{name: "Cvt64to32F", argLength: 1},
{name: "Cvt64to64F", argLength: 1},
{name: "Cvt32Fto32", argLength: 1},
{name: "Cvt32Fto64", argLength: 1},
{name: "Cvt64Fto32", argLength: 1},
{name: "Cvt64Fto64", argLength: 1},
{name: "Cvt32Fto64F", argLength: 1},
{name: "Cvt64Fto32F", argLength: 1},
{name: "CvtBoolToUint8", argLength: 1},

// Force rounding to precision of type.
{name: "Round32F", argLength: 1},
{name: "Round64F", argLength: 1},

// Automatically inserted safety checks
{name: "IsNonNil", argLength: 1, typ: "Bool"}, // arg0 != nil
{name: "IsInBounds", argLength: 2, typ: "Bool"}, // 0 <= arg0 < arg1. arg1 is guaranteed >= 0.
{name: "IsSliceInBounds", argLength: 2, typ: "Bool"}, // 0 <= arg0 <= arg1. arg1 is guaranteed >= 0.
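
// Illustrative note (not part of the op table): an index expression s[i] is roughly
// guarded by IsInBounds(i, SliceLen(s)), and a slice expression s[:j] by
// IsSliceInBounds(j, SliceCap(s)); a failing check branches to a PanicBounds call.
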
{name: "NilCheck", argLength: 2, nilCheck: true}, // arg0=ptr, arg1=mem. Panics if arg0 is nil. Returns the ptr unmodified.

// Pseudo-ops
{name: "GetG", argLength: 1, zeroWidth: true}, // runtime.getg() (read g pointer). arg0=mem
{name: "GetClosurePtr"}, // get closure pointer from dedicated register
{name: "GetCallerPC"}, // for GetCallerPC intrinsic
{name: "GetCallerSP", argLength: 1}, // for GetCallerSP intrinsic. arg0=mem.

// Indexing operations
{name: "PtrIndex", argLength: 2}, // arg0=ptr, arg1=index. Computes ptr+sizeof(*v.type)*index, where index is extended to ptrwidth type
{name: "OffPtr", argLength: 1, aux: "Int64"}, // arg0 + auxint (arg0 and result are pointers)

// Slices
{name: "SliceMake", argLength: 3}, // arg0=ptr, arg1=len, arg2=cap
{name: "SlicePtr", argLength: 1, typ: "BytePtr"}, // ptr(arg0)
{name: "SliceLen", argLength: 1}, // len(arg0)
{name: "SliceCap", argLength: 1}, // cap(arg0)
// SlicePtrUnchecked, like SlicePtr, extracts the pointer from a slice.
// SlicePtr values are assumed non-nil, because they are guarded by bounds checks.
// SlicePtrUnchecked values can be nil.
{name: "SlicePtrUnchecked", argLength: 1},

// Complex (part/whole)
{name: "ComplexMake", argLength: 2}, // arg0=real, arg1=imag
{name: "ComplexReal", argLength: 1}, // real(arg0)
{name: "ComplexImag", argLength: 1}, // imag(arg0)

// Strings
{name: "StringMake", argLength: 2}, // arg0=ptr, arg1=len
{name: "StringPtr", argLength: 1, typ: "BytePtr"}, // ptr(arg0)
{name: "StringLen", argLength: 1, typ: "Int"}, // len(arg0)

// Interfaces
{name: "IMake", argLength: 2}, // arg0=itab, arg1=data
{name: "ITab", argLength: 1, typ: "Uintptr"}, // arg0=interface, returns itable field
{name: "IData", argLength: 1}, // arg0=interface, returns data field

// Structs
{name: "StructMake", argLength: -1}, // args...=field0..n-1. Returns struct with n fields.
{name: "StructSelect", argLength: 1, aux: "Int64"}, // arg0=struct, auxint=field index. Returns the auxint'th field.

// Arrays
{name: "ArrayMake0"}, // Returns array with 0 elements
{name: "ArrayMake1", argLength: 1}, // Returns array with 1 element
{name: "ArraySelect", argLength: 1, aux: "Int64"}, // arg0=array, auxint=index. Returns a[i].

// Spill&restore ops for the register allocator. These are
// semantically identical to OpCopy; they do not take/return
// stores like regular memory ops do. We can get away without memory
// args because we know there is no aliasing of spill slots on the stack.
{name: "StoreReg", argLength: 1},
{name: "LoadReg", argLength: 1},

// Used during ssa construction. Like Copy, but the arg has not been specified yet.
{name: "FwdRef", aux: "Sym", symEffect: "None"},

// Unknown value. Used for Values whose values don't matter because they are dead code.
{name: "Unknown"},

{name: "VarDef", argLength: 1, aux: "Sym", typ: "Mem", symEffect: "None", zeroWidth: true}, // aux is a *ir.Name of a variable that is about to be initialized. arg0=mem, returns mem
// TODO: what's the difference between VarLive and KeepAlive?
{name: "VarLive", argLength: 1, aux: "Sym", symEffect: "Read", zeroWidth: true}, // aux is a *ir.Name of a variable that must be kept live. arg0=mem, returns mem
{name: "KeepAlive", argLength: 2, typ: "Mem", zeroWidth: true}, // arg[0] is a value that must be kept alive until this mark. arg[1]=mem, returns mem

// InlMark marks the start of an inlined function body. Its AuxInt field
// distinguishes which entry in the local inline tree it is marking.
{ name : "InlMark" , argLength : 1 , aux : "Int32" , typ : "Void" } , // arg[0]=mem, returns void.
// Ops for breaking 64-bit operations on 32-bit architectures
{ name : "Int64Make" , argLength : 2 , typ : "UInt64" } , // arg0=hi, arg1=lo
{ name : "Int64Hi" , argLength : 1 , typ : "UInt32" } , // high 32-bit of arg0
{ name : "Int64Lo" , argLength : 1 , typ : "UInt32" } , // low 32-bit of arg0
{ name : "Add32carry" , argLength : 2 , commutative : true , typ : "(UInt32,Flags)" } , // arg0 + arg1, returns (value, carry)
{ name : "Add32withcarry" , argLength : 3 , commutative : true } , // arg0 + arg1 + arg2, arg2=carry (0 or 1)
{ name : "Add32carrywithcarry" , argLength : 3 , commutative : true , typ : "(UInt32,Flags)" } , // arg0 + arg1 + arg2, arg2=carry, returns (value, carry)
{ name : "Sub32carry" , argLength : 2 , typ : "(UInt32,Flags)" } , // arg0 - arg1, returns (value, carry)
{ name : "Sub32withcarry" , argLength : 3 } , // arg0 - arg1 - arg2, arg2=carry (0 or 1)
{ name : "Add64carry" , argLength : 3 , commutative : true , typ : "(UInt64,UInt64)" } , // arg0 + arg1 + arg2, arg2 must be 0 or 1. returns (value, value>>64)
{ name : "Sub64borrow" , argLength : 3 , typ : "(UInt64,UInt64)" } , // arg0 - (arg1 + arg2), arg2 must be 0 or 1. returns (value, value>>64&1)
{ name : "Signmask" , argLength : 1 , typ : "Int32" } , // 0 if arg0 >= 0, -1 if arg0 < 0
{ name : "Zeromask" , argLength : 1 , typ : "UInt32" } , // 0 if arg0 == 0, 0xffffffff if arg0 != 0
{ name : "Slicemask" , argLength : 1 } , // 0 if arg0 == 0, -1 if arg0 > 0, undef if arg0<0. Type is native int size.
{ name : "SpectreIndex" , argLength : 2 } , // arg0 if 0 <= arg0 < arg1, 0 otherwise. Type is native int size.
{ name : "SpectreSliceIndex" , argLength : 2 } , // arg0 if 0 <= arg0 <= arg1, 0 otherwise. Type is native int size.
{ name : "Cvt32Uto32F" , argLength : 1 } , // uint32 -> float32, only used on 32-bit arch
{ name : "Cvt32Uto64F" , argLength : 1 } , // uint32 -> float64, only used on 32-bit arch
{ name : "Cvt32Fto32U" , argLength : 1 } , // float32 -> uint32, only used on 32-bit arch
{ name : "Cvt64Fto32U" , argLength : 1 } , // float64 -> uint32, only used on 32-bit arch
{ name : "Cvt64Uto32F" , argLength : 1 } , // uint64 -> float32, only used on archs that has the instruction
{ name : "Cvt64Uto64F" , argLength : 1 } , // uint64 -> float64, only used on archs that has the instruction
{ name : "Cvt32Fto64U" , argLength : 1 } , // float32 -> uint64, only used on archs that has the instruction
{ name : "Cvt64Fto64U" , argLength : 1 } , // float64 -> uint64, only used on archs that has the instruction
// pseudo-ops for breaking Tuple
{ name : "Select0" , argLength : 1 , zeroWidth : true } , // the first component of a tuple
{ name : "Select1" , argLength : 1 , zeroWidth : true } , // the second component of a tuple
{ name : "MakeTuple" , argLength : 2 } , // arg0 arg1 are components of a "Tuple" (like the result from a 128bits op).
{ name : "SelectN" , argLength : 1 , aux : "Int64" } , // arg0=result, auxint=field index. Returns the auxint'th member.
{ name : "SelectNAddr" , argLength : 1 , aux : "Int64" } , // arg0=result, auxint=field index. Returns the address of auxint'th member. Used for un-SSA-able result types.
{ name : "MakeResult" , argLength : - 1 } , // arg0 .. are components of a "Result" (like the result from a Call). The last arg should be memory (like the result from a call).
// Atomic operations used for semantically inlining sync/atomic and
// internal/runtime/atomic. Atomic loads return a new memory so that
// the loads are properly ordered with respect to other loads and
// stores.
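// As an example (a sketch of the intrinsic lowering; the exact mapping lives in
// the compiler's intrinsics table): a call like atomic.AddUint32(&x, 1) becomes
// roughly
//   v   = (Select0 (AtomicAdd32 ptr one mem))
//   mem = (Select1 (AtomicAdd32 ptr one mem))
// so the returned memory orders later loads and stores after the update.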
{ name : "AtomicLoad8" , argLength : 2 , typ : "(UInt8,Mem)" } , // Load from arg0. arg1=memory. Returns loaded value and new memory.
{ name : "AtomicLoad32" , argLength : 2 , typ : "(UInt32,Mem)" } , // Load from arg0. arg1=memory. Returns loaded value and new memory.
{ name : "AtomicLoad64" , argLength : 2 , typ : "(UInt64,Mem)" } , // Load from arg0. arg1=memory. Returns loaded value and new memory.
{ name : "AtomicLoadPtr" , argLength : 2 , typ : "(BytePtr,Mem)" } , // Load from arg0. arg1=memory. Returns loaded value and new memory.
{ name : "AtomicLoadAcq32" , argLength : 2 , typ : "(UInt32,Mem)" } , // Load from arg0. arg1=memory. Lock acquisition, returns loaded value and new memory.
{ name : "AtomicLoadAcq64" , argLength : 2 , typ : "(UInt64,Mem)" } , // Load from arg0. arg1=memory. Lock acquisition, returns loaded value and new memory.
{ name : "AtomicStore8" , argLength : 3 , typ : "Mem" , hasSideEffects : true } , // Store arg1 to *arg0. arg2=memory. Returns memory.
{ name : "AtomicStore32" , argLength : 3 , typ : "Mem" , hasSideEffects : true } , // Store arg1 to *arg0. arg2=memory. Returns memory.
{ name : "AtomicStore64" , argLength : 3 , typ : "Mem" , hasSideEffects : true } , // Store arg1 to *arg0. arg2=memory. Returns memory.
{ name : "AtomicStorePtrNoWB" , argLength : 3 , typ : "Mem" , hasSideEffects : true } , // Store arg1 to *arg0. arg2=memory. Returns memory.
{ name : "AtomicStoreRel32" , argLength : 3 , typ : "Mem" , hasSideEffects : true } , // Store arg1 to *arg0. arg2=memory. Lock release, returns memory.
{ name : "AtomicStoreRel64" , argLength : 3 , typ : "Mem" , hasSideEffects : true } , // Store arg1 to *arg0. arg2=memory. Lock release, returns memory.
{ name : "AtomicExchange8" , argLength : 3 , typ : "(UInt8,Mem)" , hasSideEffects : true } , // Store arg1 to *arg0. arg2=memory. Returns old contents of *arg0 and new memory.
{ name : "AtomicExchange32" , argLength : 3 , typ : "(UInt32,Mem)" , hasSideEffects : true } , // Store arg1 to *arg0. arg2=memory. Returns old contents of *arg0 and new memory.
{ name : "AtomicExchange64" , argLength : 3 , typ : "(UInt64,Mem)" , hasSideEffects : true } , // Store arg1 to *arg0. arg2=memory. Returns old contents of *arg0 and new memory.
{ name : "AtomicAdd32" , argLength : 3 , typ : "(UInt32,Mem)" , hasSideEffects : true } , // Do *arg0 += arg1. arg2=memory. Returns sum and new memory.
{ name : "AtomicAdd64" , argLength : 3 , typ : "(UInt64,Mem)" , hasSideEffects : true } , // Do *arg0 += arg1. arg2=memory. Returns sum and new memory.
{ name : "AtomicCompareAndSwap32" , argLength : 4 , typ : "(Bool,Mem)" , hasSideEffects : true } , // if *arg0==arg1, then set *arg0=arg2. Returns true if store happens and new memory.
{ name : "AtomicCompareAndSwap64" , argLength : 4 , typ : "(Bool,Mem)" , hasSideEffects : true } , // if *arg0==arg1, then set *arg0=arg2. Returns true if store happens and new memory.
{ name : "AtomicCompareAndSwapRel32" , argLength : 4 , typ : "(Bool,Mem)" , hasSideEffects : true } , // if *arg0==arg1, then set *arg0=arg2. Lock release, reports whether store happens and new memory.
// Older atomic logical operations which don't return the old value.
{ name : "AtomicAnd8" , argLength : 3 , typ : "Mem" , hasSideEffects : true } , // *arg0 &= arg1. arg2=memory. Returns memory.
{ name : "AtomicOr8" , argLength : 3 , typ : "Mem" , hasSideEffects : true } , // *arg0 |= arg1. arg2=memory. Returns memory.
{ name : "AtomicAnd32" , argLength : 3 , typ : "Mem" , hasSideEffects : true } , // *arg0 &= arg1. arg2=memory. Returns memory.
{ name : "AtomicOr32" , argLength : 3 , typ : "Mem" , hasSideEffects : true } , // *arg0 |= arg1. arg2=memory. Returns memory.
// Newer atomic logical operations which return the old value.
{ name : "AtomicAnd64value" , argLength : 3 , typ : "(Uint64, Mem)" , hasSideEffects : true } , // *arg0 &= arg1. arg2=memory. Returns old contents of *arg0 and new memory.
{ name : "AtomicAnd32value" , argLength : 3 , typ : "(Uint32, Mem)" , hasSideEffects : true } , // *arg0 &= arg1. arg2=memory. Returns old contents of *arg0 and new memory.
{ name : "AtomicAnd8value" , argLength : 3 , typ : "(Uint8, Mem)" , hasSideEffects : true } , // *arg0 &= arg1. arg2=memory. Returns old contents of *arg0 and new memory.
{ name : "AtomicOr64value" , argLength : 3 , typ : "(Uint64, Mem)" , hasSideEffects : true } , // *arg0 |= arg1. arg2=memory. Returns old contents of *arg0 and new memory.
{ name : "AtomicOr32value" , argLength : 3 , typ : "(Uint32, Mem)" , hasSideEffects : true } , // *arg0 |= arg1. arg2=memory. Returns old contents of *arg0 and new memory.
{ name : "AtomicOr8value" , argLength : 3 , typ : "(Uint8, Mem)" , hasSideEffects : true } , // *arg0 |= arg1. arg2=memory. Returns old contents of *arg0 and new memory.
// Atomic operation variants
// These variants have the same semantics as the atomic operations above,
// but they are used to generate more efficient code on certain modern machines, with run-time CPU feature detection.
// On ARM64, these are used when the LSE hardware feature is available (either known at compile time or detected at runtime). If LSE is not available,
// then the basic atomic operations are used instead.
{ name : "AtomicStore8Variant" , argLength : 3 , typ : "Mem" , hasSideEffects : true } , // Store arg1 to *arg0. arg2=memory. Returns memory.
{ name : "AtomicStore32Variant" , argLength : 3 , typ : "Mem" , hasSideEffects : true } , // Store arg1 to *arg0. arg2=memory. Returns memory.
{ name : "AtomicStore64Variant" , argLength : 3 , typ : "Mem" , hasSideEffects : true } , // Store arg1 to *arg0. arg2=memory. Returns memory.
{ name : "AtomicAdd32Variant" , argLength : 3 , typ : "(UInt32,Mem)" , hasSideEffects : true } , // Do *arg0 += arg1. arg2=memory. Returns sum and new memory.
{ name : "AtomicAdd64Variant" , argLength : 3 , typ : "(UInt64,Mem)" , hasSideEffects : true } , // Do *arg0 += arg1. arg2=memory. Returns sum and new memory.
{ name : "AtomicExchange8Variant" , argLength : 3 , typ : "(UInt8,Mem)" , hasSideEffects : true } , // Store arg1 to *arg0. arg2=memory. Returns old contents of *arg0 and new memory.
{ name : "AtomicExchange32Variant" , argLength : 3 , typ : "(UInt32,Mem)" , hasSideEffects : true } , // Store arg1 to *arg0. arg2=memory. Returns old contents of *arg0 and new memory.
{ name : "AtomicExchange64Variant" , argLength : 3 , typ : "(UInt64,Mem)" , hasSideEffects : true } , // Store arg1 to *arg0. arg2=memory. Returns old contents of *arg0 and new memory.
{ name : "AtomicCompareAndSwap32Variant" , argLength : 4 , typ : "(Bool,Mem)" , hasSideEffects : true } , // if *arg0==arg1, then set *arg0=arg2. Returns true if store happens and new memory.
{ name : "AtomicCompareAndSwap64Variant" , argLength : 4 , typ : "(Bool,Mem)" , hasSideEffects : true } , // if *arg0==arg1, then set *arg0=arg2. Returns true if store happens and new memory.
{ name : "AtomicAnd64valueVariant" , argLength : 3 , typ : "(Uint64, Mem)" , hasSideEffects : true } , // *arg0 &= arg1. arg2=memory. Returns old contents of *arg0 and new memory.
{ name : "AtomicOr64valueVariant" , argLength : 3 , typ : "(Uint64, Mem)" , hasSideEffects : true } , // *arg0 |= arg1. arg2=memory. Returns old contents of *arg0 and new memory.
{ name : "AtomicAnd32valueVariant" , argLength : 3 , typ : "(Uint32, Mem)" , hasSideEffects : true } , // *arg0 &= arg1. arg2=memory. Returns old contents of *arg0 and new memory.
{ name : "AtomicOr32valueVariant" , argLength : 3 , typ : "(Uint32, Mem)" , hasSideEffects : true } , // *arg0 |= arg1. arg2=memory. Returns old contents of *arg0 and new memory.
{ name : "AtomicAnd8valueVariant" , argLength : 3 , typ : "(Uint8, Mem)" , hasSideEffects : true } , // *arg0 &= arg1. arg2=memory. Returns old contents of *arg0 and new memory.
{ name : "AtomicOr8valueVariant" , argLength : 3 , typ : "(Uint8, Mem)" , hasSideEffects : true } , // *arg0 |= arg1. arg2=memory. Returns old contents of *arg0 and new memory.
// Publication barrier
{ name : "PubBarrier" , argLength : 1 , hasSideEffects : true } , // Do data barrier. arg0=memory.
// Clobber experiment op
{ name : "Clobber" , argLength : 0 , typ : "Void" , aux : "SymOff" , symEffect : "None" } , // write an invalid pointer value to the given pointer slot of a stack variable
{ name : "ClobberReg" , argLength : 0 , typ : "Void" } , // clobber a register
// Prefetch instruction
{ name : "PrefetchCache" , argLength : 2 , hasSideEffects : true } , // Do prefetch arg0 to cache. arg0=addr, arg1=memory.
{ name : "PrefetchCacheStreamed" , argLength : 2 , hasSideEffects : true } , // Do non-temporal or streamed prefetch arg0 to cache. arg0=addr, arg1=memory.
// Helper instruction which is semantically equivalent to calling runtime.memequal, but some targets may prefer to custom lower it later, e.g. for specific constant sizes.
{ name : "MemEq" , argLength : 4 , commutative : true , typ : "Bool" } , // arg0=ptr0, arg1=ptr1, arg2=size, arg3=memory.
// SIMD
{ name : "ZeroSIMD" , argLength : 0 } , // zero value of a vector
// Convert integers to masks
{ name : "Cvt16toMask8x16" , argLength : 1 } , // arg0 = integer mask value
{ name : "Cvt32toMask8x32" , argLength : 1 } , // arg0 = integer mask value
{ name : "Cvt64toMask8x64" , argLength : 1 } , // arg0 = integer mask value
{ name : "Cvt8toMask16x8" , argLength : 1 } , // arg0 = integer mask value
{ name : "Cvt16toMask16x16" , argLength : 1 } , // arg0 = integer mask value
{ name : "Cvt32toMask16x32" , argLength : 1 } , // arg0 = integer mask value
{ name : "Cvt8toMask32x4" , argLength : 1 } , // arg0 = integer mask value
{ name : "Cvt8toMask32x8" , argLength : 1 } , // arg0 = integer mask value
{ name : "Cvt16toMask32x16" , argLength : 1 } , // arg0 = integer mask value
{ name : "Cvt8toMask64x2" , argLength : 1 } , // arg0 = integer mask value
{ name : "Cvt8toMask64x4" , argLength : 1 } , // arg0 = integer mask value
{ name : "Cvt8toMask64x8" , argLength : 1 } , // arg0 = integer mask value
// Convert masks to integers
{ name : "CvtMask8x16to16" , argLength : 1 } , // arg0 = mask
{ name : "CvtMask8x32to32" , argLength : 1 } , // arg0 = mask
{ name : "CvtMask8x64to64" , argLength : 1 } , // arg0 = mask
{ name : "CvtMask16x8to8" , argLength : 1 } , // arg0 = mask
{ name : "CvtMask16x16to16" , argLength : 1 } , // arg0 = mask
{ name : "CvtMask16x32to32" , argLength : 1 } , // arg0 = mask
{ name : "CvtMask32x4to8" , argLength : 1 } , // arg0 = mask
{ name : "CvtMask32x8to8" , argLength : 1 } , // arg0 = mask
{ name : "CvtMask32x16to16" , argLength : 1 } , // arg0 = mask
{ name : "CvtMask64x2to8" , argLength : 1 } , // arg0 = mask
{ name : "CvtMask64x4to8" , argLength : 1 } , // arg0 = mask
{ name : "CvtMask64x8to8" , argLength : 1 } , // arg0 = mask
// Returns true if arg0 is all zero.
{ name : "IsZeroVec" , argLength : 1 } ,
}
//   kind       controls           successors           implicit exit
//   -------------------------------------------------------------------
//   Exit       [return mem]       []                   yes
//   Ret        [return mem]       []                   yes
//   RetJmp     [return mem]       []                   yes
//   Plain      []                 [next]
//   If         [boolean Value]    [then, else]
//   First      []                 [always, never]
//   Defer      [mem]              [nopanic, recovery]  (control opcode should be OpStaticCall to runtime.defer*)
//   JumpTable  [integer Value]    [succ1, succ2, ..]
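//
// For illustration (a sketch): a Go statement such as
//   if x < y { ... } else { ... }
// is built as a block of kind If whose single control value is the boolean
// comparison (e.g. (Less64 x y), or the comparison of the appropriate width),
// with Succs[0] = the then block and Succs[1] = the else block.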
var genericBlocks = [ ] blockData {
{ name : "Plain" } , // a single successor
{ name : "If" , controls : 1 } , // if Controls[0] goto Succs[0] else goto Succs[1]
{ name : "Defer" , controls : 1 } , // Succs[0]=defer queued, Succs[1]=defer recovery branch (jmp performed by runtime). Controls[0] is call op (of memory type).
{ name : "Ret" , controls : 1 } , // no successors, Controls[0] value is memory result
{ name : "RetJmp" , controls : 1 } , // no successors, Controls[0] value is a tail call
{ name : "Exit" , controls : 1 } , // no successors, Controls[0] value generates a panic
{ name : "JumpTable" , controls : 1 } , // multiple successors, the integer Controls[0] selects which one
// transient block state used for dead code removal
{ name : "First" } , // 2 successors, always takes the first one (second is dead)
}
func init ( ) {
genericOps = append ( genericOps , simdGenericOps ( ) ... )
archs = append ( archs , arch {
name : "generic" ,
ops : genericOps ,
blocks : genericBlocks ,
generic : true ,
} )
}