mirror of
https://github.com/golang/go.git
synced 2025-12-08 06:10:04 +00:00
This CL introduces new divisible and divmod passes that rewrite divisibility checks and div, mod, and mul. These happen after prove, so that prove can make better sense of the code for deriving bounds, and they must run before decompose, so that 64-bit ops can be lowered to 32-bit ops on 32-bit systems. And then they need another generic pass as well, to optimize the generated code before decomposing.

The three opt passes are "opt", "middle opt", and "late opt". (Perhaps instead they should be "generic", "opt", and "late opt"?) The "late opt" pass repeats the "middle opt" work on any new code that has been generated in the interim. There will not be new divs or mods, but there may be new muls.

The x%c==0 rewrite rules are much simpler now, since they can match before divs have been rewritten. This has the effect of applying them more consistently and making the rewrite rules independent of the exact div rewrites.

Prove is also now charged with marking signed div/mod as unsigned when the arguments call for it, allowing simpler code to be emitted in various cases. For example, t.Seconds()/2 and len(x)/2 are now recognized as unsigned, meaning they compile to a simple shift (unsigned division), avoiding the more complex fixup we need for signed values.

https://gist.github.com/rsc/99d9d3bd99cde87b6a1a390e3d85aa32 shows a diff of 'go build -a -gcflags=-d=ssa/prove/debug=1 std' output before and after. "Proved Rsh64x64 shifts to zero" is replaced by the higher-level "Proved Div64 is unsigned" (the shift was in the signed expansion of div by constant), but otherwise prove is only finding more things to prove.
One short example from that diff, in code that does x[i%len(x)]:

    < runtime/mfinal.go:131:34: Proved Rsh64x64 shifts to zero
    ---
    > runtime/mfinal.go:131:34: Proved Div64 is unsigned
    > runtime/mfinal.go:131:38: Proved IsInBounds

A longer example:

    < crypto/internal/fips140/sha3/shake.go:28:30: Proved Rsh64x64 shifts to zero
    < crypto/internal/fips140/sha3/shake.go:38:27: Proved Rsh64x64 shifts to zero
    < crypto/internal/fips140/sha3/shake.go:53:46: Proved Rsh64x64 shifts to zero
    < crypto/internal/fips140/sha3/shake.go:55:46: Proved Rsh64x64 shifts to zero
    ---
    > crypto/internal/fips140/sha3/shake.go:28:30: Proved Div64 is unsigned
    > crypto/internal/fips140/sha3/shake.go:28:30: Proved IsInBounds
    > crypto/internal/fips140/sha3/shake.go:28:30: Proved IsSliceInBounds
    > crypto/internal/fips140/sha3/shake.go:38:27: Proved Div64 is unsigned
    > crypto/internal/fips140/sha3/shake.go:45:7: Proved IsSliceInBounds
    > crypto/internal/fips140/sha3/shake.go:46:4: Proved IsInBounds
    > crypto/internal/fips140/sha3/shake.go:53:46: Proved Div64 is unsigned
    > crypto/internal/fips140/sha3/shake.go:53:46: Proved IsInBounds
    > crypto/internal/fips140/sha3/shake.go:53:46: Proved IsSliceInBounds
    > crypto/internal/fips140/sha3/shake.go:55:46: Proved Div64 is unsigned
    > crypto/internal/fips140/sha3/shake.go:55:46: Proved IsInBounds
    > crypto/internal/fips140/sha3/shake.go:55:46: Proved IsSliceInBounds

These diffs are due to the smaller opt being better and taking work away from prove:

    < image/jpeg/dct.go:307:5: Proved IsInBounds
    < image/jpeg/dct.go:308:5: Proved IsInBounds
    ...
    < image/jpeg/dct.go:442:5: Proved IsInBounds

In the old opt, Mul by 8 was rewritten to Lsh by 3 early. This CL delays that rule to help prove recognize mods, but it also helps opt constant-fold the slice x[8*i:8*i+8:8*i+8].
Specifically, computing the length, opt can now do:

    (Sub64 (Add (Mul 8 i) 8) (Mul 8 i)) ->
    (Add 8 (Sub (Mul 8 i) (Mul 8 i))) ->
    (Add 8 (Mul 8 (Sub i i))) ->
    (Add 8 (Mul 8 0)) ->
    (Add 8 0) ->
    8

The key step is (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)). Leaving the multiply as Mul enables using that step; the old rewrite to Lsh blocked it, leaving prove to figure out the length and then remove the bounds checks. But now opt can evaluate the length down to a constant 8 and then constant-fold away the bounds checks 0 < 8, 1 < 8, and so on. After that, the compiler has nothing left to prove.

Benchmarks are noisy in general; I checked the assembly for the many large increases below, and the vast majority are unchanged and presumably hitting the caches differently in some way.

The divisibility optimizations were not reliably triggering before. This leads to a very large improvement in some cases, like DivisiblePow2constI64, DivisibleconstI64 on 64-bit systems and DivisibleconstU64 on 32-bit systems.

Another way the divisibility optimizations were unreliable before was incorrectly triggering for x/3, x%3 even though they are written not to do that. There is a real but small slowdown in the DivisibleWDivconst benchmarks on Mac because in the cases used in the benchmark, it is still faster (on Mac) to do the divisibility check than to remultiply. This may be worth further study. Perhaps when there is no rotate (meaning the divisor is odd), the divisibility optimization should be enabled always. In any event, this CL makes it possible to study that.
benchmark \ host                  s7            linux-amd64   mac           linux-arm64   linux-ppc64le linux-386     s7:GOARCH=386 linux-arm
                                  vs base       vs base       vs base       vs base       vs base       vs base       vs base       vs base
LoadAdd                           ~             ~             ~             ~             ~             -1.59%        ~             ~
ExtShift                          ~             ~             -42.14%       +0.10%        ~             +1.44%        +5.66%        +8.50%
Modify                            ~             ~             ~             ~             ~             ~             ~             -1.53%
MullImm                           ~             ~             ~             ~             ~             +37.90%       -21.87%       +3.05%
ConstModify                       ~             ~             ~             ~             -49.14%       ~             ~             ~
BitSet                            ~             ~             ~             ~             -15.86%       -14.57%       +6.44%        +0.06%
BitClear                          ~             ~             ~             ~             ~             +1.78%        +3.50%        +0.06%
BitToggle                         ~             ~             ~             ~             ~             -16.09%       +2.91%        ~
BitSetConst                       ~             ~             ~             ~             ~             ~             ~             -0.49%
BitClearConst                     ~             ~             ~             ~             -28.29%       ~             ~             -0.40%
BitToggleConst                    ~             ~             ~             +8.89%        -31.19%       ~             ~             -0.77%
MulNeg                            ~             ~             ~             ~             ~             ~             ~             ~
Mul2Neg                           ~             ~             -4.83%        ~             ~             -13.75%       -5.92%        ~
DivconstI64                       ~             ~             ~             ~             ~             -30.12%       ~             +0.50%
ModconstI64                       ~             ~             -9.94%        -4.63%        ~             +3.15%        ~             +5.32%
DivisiblePow2constI64             -34.49%       -12.58%       ~             ~             -12.25%       ~             ~             ~
DivisibleconstI64                 -24.69%       -25.06%       -0.40%        -2.27%        -42.61%       -3.31%        ~             +1.63%
DivisibleWDivconstI64             ~             ~             ~             ~             ~             -17.55%       ~             -0.60%
DivconstU64/3                     ~             ~             ~             ~             ~             +1.51%        ~             ~
DivconstU64/5                     ~             ~             ~             ~             ~             ~             ~             ~
DivconstU64/37                    ~             ~             -0.18%        ~             ~             +2.70%        ~             ~
DivconstU64/1234567               ~             ~             ~             ~             ~             ~             ~             +0.12%
ModconstU64                       ~             ~             ~             -0.24%        ~             -5.10%        -1.07%        -1.56%
DivisibleconstU64                 ~             ~             ~             ~             ~             -29.01%       -59.13%       -50.72%
DivisibleWDivconstU64             ~             ~             -12.18%       -18.88%       ~             -5.50%        -3.91%        +5.17%
DivconstI32                       ~             ~             -0.48%        ~             -34.69%       +89.01%       -6.01%        -16.67%
ModconstI32                       ~             +2.95%        -0.33%        ~             ~             -2.98%        -5.40%        -8.30%
DivisiblePow2constI32             ~             ~             ~             ~             ~             ~             ~             -16.22%
DivisibleconstI32                 ~             ~             ~             ~             ~             -37.27%       -47.75%       -25.03%
DivisibleWDivconstI32             -11.59%       +5.22%        -12.99%       -23.83%       ~             +45.95%       -7.03%        -10.01%
DivconstU32                       ~             ~             ~             ~             ~             +74.71%       +4.81%        ~
ModconstU32                       ~             ~             +0.53%        +0.18%        ~             +51.16%       ~             ~
DivisibleconstU32                 ~             ~             ~             -0.62%        ~             -4.25%        ~             ~
DivisibleWDivconstU32             -2.77%        +5.56%        +11.12%       -5.15%        ~             +48.70%       +25.11%       -4.07%
DivconstI16                       -6.06%        ~             -0.33%        +0.22%        ~             ~             -9.68%        +5.47%
ModconstI16                       ~             ~             +4.44%        +2.82%        ~             ~             ~             +5.06%
DivisiblePow2constI16             ~             ~             ~             ~             ~             ~             ~             -0.17%
DivisibleconstI16                 ~             ~             -0.23%        ~             ~             ~             +4.60%        +6.64%
DivisibleWDivconstI16             -1.44%        -0.43%        +13.48%       -5.76%        ~             +1.62%        -23.15%       -9.06%
DivconstU16                       +1.61%        ~             -0.35%        -0.47%        ~             ~             +15.59%       ~
ModconstU16                       ~             ~             ~             ~             ~             -0.72%        ~             +14.23%
DivisibleconstU16                 ~             ~             -0.05%        +3.00%        ~             ~             ~             +5.06%
DivisibleWDivconstU16             +52.10%       +0.75%        +17.28%       +4.79%        ~             -37.39%       +5.28%        -9.06%
DivconstI8                        ~             ~             -0.34%        -0.96%        ~             ~             -9.20%        ~
ModconstI8                        +2.29%        ~             +4.38%        +2.96%        ~             ~             ~             ~
DivisiblePow2constI8              ~             ~             ~             ~             ~             ~             ~             ~
DivisibleconstI8                  ~             ~             ~             ~             ~             ~             +6.04%        ~
DivisibleWDivconstI8              -26.44%       +1.69%        +17.03%       +4.05%        ~             +32.48%       -24.90%       ~
DivconstU8                        -4.50%        +14.06%       -0.28%        ~             ~             ~             +4.16%        +0.88%
ModconstU8                        ~             ~             +25.84%       -0.64%        ~             ~             ~             ~
DivisibleconstU8                  ~             ~             -5.70%        ~             ~             ~             ~             ~
DivisibleWDivconstU8              +49.55%       +9.07%        ~             +4.03%        +53.87%       -40.03%       +39.72%       -3.01%
Mul2                              ~             ~             ~             ~             ~             ~             ~             ~
MulNeg2                           ~             ~             ~             ~             -11.73%       ~             ~             -0.02%
EfaceInteger                      ~             ~             ~             ~             ~             +18.11%       ~             +2.53%
TypeAssert                        +33.90%       +2.86%        ~             ~             ~             -1.07%        -5.29%        -1.04%
Div64UnsignedSmall                ~             ~             ~             ~             ~             ~             ~             ~
Div64Small                        ~             ~             ~             ~             ~             -0.88%        ~             +2.39%
Div64SmallNegDivisor              ~             ~             ~             ~             ~             ~             ~             +0.35%
Div64SmallNegDividend             ~             ~             ~             ~             ~             -0.84%        ~             +3.57%
Div64SmallNegBoth                 ~             ~             ~             ~             ~             -0.86%        ~             +3.55%
Div64Unsigned                     ~             ~             ~             ~             ~             ~             ~             -0.11%
Div64                             ~             ~             ~             ~             ~             ~             ~             +0.11%
Div64NegDivisor                   ~             ~             ~             ~             ~             -1.29%        ~             ~
Div64NegDividend                  ~             ~             ~             ~             ~             -1.44%        ~             ~
Div64NegBoth                      ~             ~             ~             ~             ~             ~             ~             +0.28%
Mod64UnsignedSmall                ~             ~             ~             ~             ~             +0.48%        ~             +0.93%
Mod64Small                        ~             ~             ~             ~             ~             ~             ~             ~
Mod64SmallNegDivisor              ~             ~             ~             ~             ~             ~             ~             +1.44%
Mod64SmallNegDividend             ~             ~             ~             ~             ~             +0.22%        ~             +1.37%
Mod64SmallNegBoth                 ~             ~             ~             ~             ~             ~             ~             -2.22%
Mod64Unsigned                     ~             ~             ~             ~             ~             -0.95%        ~             +0.11%
Mod64                             ~             ~             ~             ~             ~             ~             ~             ~
Mod64NegDivisor                   ~             ~             ~             ~             ~             ~             ~             -0.02%
Mod64NegDividend                  ~             ~             ~             ~             ~             ~             ~             ~
Mod64NegBoth                      ~             ~             ~             ~             ~             ~             ~             -0.02%
MulconstI32/3                     ~             ~             ~             -25.00%       ~             ~             ~             +47.37%
MulconstI32/5                     ~             ~             ~             +33.28%       ~             ~             ~             +32.21%
MulconstI32/12                    ~             ~             ~             -2.13%        ~             ~             ~             -0.02%
MulconstI32/120                   ~             ~             ~             +2.93%        ~             ~             ~             -0.03%
MulconstI32/-120                  ~             ~             ~             -2.17%        ~             ~             ~             -0.03%
MulconstI32/65537                 ~             ~             ~             ~             ~             ~             ~             +0.03%
MulconstI32/65538                 ~             ~             ~             ~             ~             -33.38%       ~             +0.04%
MulconstI64/3                     ~             ~             ~             +33.35%       ~             -0.37%        ~             -0.13%
MulconstI64/5                     ~             ~             ~             -25.00%       ~             -0.34%        ~             ~
MulconstI64/12                    ~             ~             ~             +2.13%        ~             +11.62%       ~             +2.30%
MulconstI64/120                   ~             ~             ~             -1.98%        ~             ~             ~             ~
MulconstI64/-120                  ~             ~             ~             +0.75%        ~             ~             ~             ~
MulconstI64/65537                 ~             ~             ~             ~             ~             +5.61%        ~             ~
MulconstI64/65538                 ~             ~             ~             ~             ~             +5.25%        ~             ~
MulconstU32/3                     ~             +0.81%        ~             +33.39%       ~             +77.92%       ~             -32.31%
MulconstU32/5                     ~             ~             ~             -24.97%       ~             +77.92%       ~             -24.47%
MulconstU32/12                    ~             ~             ~             +2.06%        ~             ~             ~             +0.03%
MulconstU32/120                   ~             ~             ~             -2.74%        ~             ~             ~             +0.03%
MulconstU32/65537                 ~             ~             ~             ~             ~             ~             ~             +0.03%
MulconstU32/65538                 ~             ~             ~             ~             ~             -33.42%       ~             -0.03%
MulconstU64/3                     ~             ~             ~             +33.33%       ~             -0.28%        ~             +1.22%
MulconstU64/5                     ~             ~             ~             -25.00%       ~             ~             ~             -0.64%
MulconstU64/12                    ~             ~             ~             +2.30%        ~             +11.59%       ~             +0.14%
MulconstU64/120                   ~             ~             ~             -2.82%        ~             ~             ~             +0.04%
MulconstU64/65537                 ~             +0.37%        ~             ~             ~             +5.58%        ~             ~
MulconstU64/65538                 ~             ~             ~             ~             ~             +5.16%        ~             ~
ShiftArithmeticRight              ~             ~             ~             ~             ~             -10.81%       ~             +0.31%
Switch8Predictable                +14.69%       ~             ~             ~             ~             -24.85%       ~             ~
Switch8Unpredictable              ~             -0.58%        -3.80%        ~             ~             -11.78%       ~             -0.79%
Switch32Predictable               -10.33%       +17.89%       ~             ~             ~             +5.76%        ~             ~
Switch32Unpredictable             -3.15%        +1.19%        +9.42%        ~             ~             -10.30%       -5.09%        +0.44%
SwitchStringPredictable           +70.88%       +20.48%       ~             ~             ~             +2.39%        ~             +0.31%
SwitchStringUnpredictable         ~             +3.91%        -5.06%        -0.98%        ~             +0.61%        +2.03%        ~
SwitchTypePredictable             +146.58%      -1.10%        ~             -12.45%       ~             -0.46%        -3.81%        ~
SwitchTypeUnpredictable           +0.46%        -0.83%        ~             +4.18%        ~             +0.43%        ~             +0.62%
SwitchInterfaceTypePredictable    -13.41%       -10.13%       +11.03%       ~             ~             -4.38%        ~             +0.75%
SwitchInterfaceTypeUnpredictable  -6.37%        -2.14%        ~             -3.21%        ~             -4.20%        ~             +1.08%

Fixes #63110.
Fixes #75954.

Change-Id: I55a876f08c6c14f419ce1a8cbba2eaae6c6efbf0
Reviewed-on: https://go-review.googlesource.com/c/go/+/714160
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
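Returning to the slice-length computation described in the commit message, the chain of rewrites can be replayed with a toy term rewriter. This is only a sketch under stated assumptions: the `Expr` type, the helper constructors, and the rule order in `simplify` are all hypothetical and far simpler than the compiler's generated SSA rewrite rules. It shows how keeping the multiply as a Mul lets the factoring rule fire and reduce (8*i+8) - 8*i to a constant.

```go
package main

import "fmt"

// Expr is a toy expression node for this sketch; it is not the compiler's
// SSA value type.
type Expr struct {
	Op   string // "const", "var", "add", "sub", or "mul"
	Val  int64  // used when Op == "const"
	Name string // used when Op == "var"
	L, R *Expr  // operands of add, sub, and mul
}

func konst(v int64) *Expr             { return &Expr{Op: "const", Val: v} }
func variable(n string) *Expr         { return &Expr{Op: "var", Name: n} }
func bin(op string, l, r *Expr) *Expr { return &Expr{Op: op, L: l, R: r} }

// equal reports whether a and b are structurally identical.
func equal(a, b *Expr) bool {
	if a == nil || b == nil {
		return a == b
	}
	return a.Op == b.Op && a.Val == b.Val && a.Name == b.Name &&
		equal(a.L, b.L) && equal(a.R, b.R)
}

// simplify applies a handful of rewrite rules bottom-up.
func simplify(e *Expr) *Expr {
	if e.Op == "const" || e.Op == "var" {
		return e
	}
	l, r := simplify(e.L), simplify(e.R)
	switch {
	case l.Op == "const" && r.Op == "const":
		// Constant folding.
		switch e.Op {
		case "add":
			return konst(l.Val + r.Val)
		case "sub":
			return konst(l.Val - r.Val)
		default: // "mul"
			return konst(l.Val * r.Val)
		}
	case e.Op == "sub" && l.Op == "mul" && r.Op == "mul" && equal(l.L, r.L):
		// The key step: (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)).
		return simplify(bin("mul", l.L, bin("sub", l.R, r.R)))
	case e.Op == "sub" && equal(l, r):
		// (Sub x x) -> 0.
		return konst(0)
	case e.Op == "sub" && l.Op == "add":
		// (Sub (Add a k) b) -> (Add k (Sub a b)).
		return simplify(bin("add", l.R, bin("sub", l.L, r)))
	}
	return bin(e.Op, l, r)
}

// String renders the expression in a prefix form similar to the text.
func (e *Expr) String() string {
	switch e.Op {
	case "const":
		return fmt.Sprint(e.Val)
	case "var":
		return e.Name
	}
	return fmt.Sprintf("(%s %v %v)", e.Op, e.L, e.R)
}

func main() {
	i := variable("i")
	lo := bin("mul", konst(8), i)  // 8*i
	hi := bin("add", lo, konst(8)) // 8*i + 8
	length := bin("sub", hi, lo)   // length of x[8*i : 8*i+8]
	fmt.Println(length, "->", simplify(length))
}
```

If the multiply had already been turned into a shift node, the factoring case would not match in this toy either, which mirrors why the CL delays the Mul-to-Lsh rewrite.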
753 lines
18 KiB
Go
// asmcheck

// Copyright 2018 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package codegen

// This file contains codegen tests related to arithmetic
// simplifications and optimizations on integer types.
// For codegen tests on float types, see floats.go.

// Addition

func AddLargeConst(a uint64, out []uint64) {
	// ppc64x/power10:"ADD [$]4294967296,"
	// ppc64x/power9:"MOVD [$]1", "SLD [$]32" "ADD R[0-9]*"
	// ppc64x/power8:"MOVD [$]1", "SLD [$]32" "ADD R[0-9]*"
	out[0] = a + 0x100000000
	// ppc64x/power10:"ADD [$]-8589934592,"
	// ppc64x/power9:"MOVD [$]-1", "SLD [$]33" "ADD R[0-9]*"
	// ppc64x/power8:"MOVD [$]-1", "SLD [$]33" "ADD R[0-9]*"
	out[1] = a + 0xFFFFFFFE00000000
	// ppc64x/power10:"ADD [$]1234567,"
	// ppc64x/power9:"ADDIS [$]19,", "ADD [$]-10617,"
	// ppc64x/power8:"ADDIS [$]19,", "ADD [$]-10617,"
	out[2] = a + 1234567
	// ppc64x/power10:"ADD [$]-1234567,"
	// ppc64x/power9:"ADDIS [$]-19,", "ADD [$]10617,"
	// ppc64x/power8:"ADDIS [$]-19,", "ADD [$]10617,"
	out[3] = a - 1234567
	// ppc64x/power10:"ADD [$]2147450879,"
	// ppc64x/power9:"ADDIS [$]32767,", "ADD [$]32767,"
	// ppc64x/power8:"ADDIS [$]32767,", "ADD [$]32767,"
	out[4] = a + 0x7FFF7FFF
	// ppc64x/power10:"ADD [$]-2147483647,"
	// ppc64x/power9:"ADDIS [$]-32768,", "ADD [$]1,"
	// ppc64x/power8:"ADDIS [$]-32768,", "ADD [$]1,"
	out[5] = a - 2147483647
	// ppc64x:"ADDIS [$]-32768,", ^"ADD "
	out[6] = a - 2147483648
	// ppc64x:"ADD [$]2147450880,", ^"ADDIS "
	out[7] = a + 0x7FFF8000
	// ppc64x:"ADD [$]-32768,", ^"ADDIS "
	out[8] = a - 32768
	// ppc64x/power10:"ADD [$]-32769,"
	// ppc64x/power9:"ADDIS [$]-1,", "ADD [$]32767,"
	// ppc64x/power8:"ADDIS [$]-1,", "ADD [$]32767,"
	out[9] = a - 32769
}

func AddLargeConst2(a int, out []int) {
	// loong64: -"ADDVU" "ADDV16"
	out[0] = a + 0x10000
}

// Subtraction

var ef int

func SubMem(arr []int, b, c, d int) int {
	// 386:`SUBL\s[A-Z]+,\s8\([A-Z]+\)`
	// amd64:`SUBQ\s[A-Z]+,\s16\([A-Z]+\)`
	arr[2] -= b
	// 386:`SUBL\s[A-Z]+,\s12\([A-Z]+\)`
	// amd64:`SUBQ\s[A-Z]+,\s24\([A-Z]+\)`
	arr[3] -= b
	// 386:`DECL\s16\([A-Z]+\)`
	arr[4]--
	// 386:`ADDL\s[$]-20,\s20\([A-Z]+\)`
	arr[5] -= 20
	// 386:`SUBL\s\([A-Z]+\)\([A-Z]+\*4\),\s[A-Z]+`
	ef -= arr[b]
	// 386:`SUBL\s[A-Z]+,\s\([A-Z]+\)\([A-Z]+\*4\)`
	arr[c] -= b
	// 386:`ADDL\s[$]-15,\s\([A-Z]+\)\([A-Z]+\*4\)`
	arr[d] -= 15
	// 386:`DECL\s\([A-Z]+\)\([A-Z]+\*4\)`
	arr[b]--
	// amd64:`DECQ\s64\([A-Z]+\)`
	arr[8]--
	// 386:"SUBL 4"
	// amd64:"SUBQ 8"
	return arr[0] - arr[1]
}

func SubFromConst(a int) int {
	// ppc64x: `SUBC R[0-9]+,\s[$]40,\sR`
	// riscv64: "ADDI [$]-40" "NEG"
	b := 40 - a
	return b
}

func SubFromConstNeg(a int) int {
	// arm64: "ADD [$]40"
	// loong64: "ADDV[U] [$]40"
	// mips: "ADD[U] [$]40"
	// mips64: "ADDV[U] [$]40"
	// ppc64x: `ADD [$]40,\sR[0-9]+,\sR`
	// riscv64: "ADDI [$]40" -"NEG"
	c := 40 - (-a)
	return c
}

func SubSubFromConst(a int) int {
	// arm64: "ADD [$]20"
	// loong64: "ADDV[U] [$]20"
	// mips: "ADD[U] [$]20"
	// mips64: "ADDV[U] [$]20"
	// ppc64x: `ADD [$]20,\sR[0-9]+,\sR`
	// riscv64: "ADDI [$]20" -"NEG"
	c := 40 - (20 - a)
	return c
}

func AddSubFromConst(a int) int {
	// ppc64x: `SUBC R[0-9]+,\s[$]60,\sR`
	// riscv64: "ADDI [$]-60" "NEG"
	c := 40 + (20 - a)
	return c
}

func NegSubFromConst(a int) int {
	// arm64: "SUB [$]20"
	// loong64: "ADDV[U] [$]-20"
	// mips: "ADD[U] [$]-20"
	// mips64: "ADDV[U] [$]-20"
	// ppc64x: `ADD [$]-20,\sR[0-9]+,\sR`
	// riscv64: "ADDI [$]-20"
	c := -(20 - a)
	return c
}

func NegAddFromConstNeg(a int) int {
	// arm64: "SUB [$]40" "NEG"
	// loong64: "ADDV[U] [$]-40" "SUBV"
	// mips: "ADD[U] [$]-40" "SUB"
	// mips64: "ADDV[U] [$]-40" "SUBV"
	// ppc64x: `SUBC R[0-9]+,\s[$]40,\sR`
	// riscv64: "ADDI [$]-40" "NEG"
	c := -(-40 + a)
	return c
}

func SubSubNegSimplify(a, b int) int {
	// amd64:"NEGQ"
	// arm64:"NEG"
	// loong64:"SUBV"
	// mips:"SUB"
	// mips64:"SUBV"
	// ppc64x:"NEG"
	// riscv64:"NEG" -"SUB"
	r := (a - b) - a
	return r
}

func SubAddSimplify(a, b int) int {
	// amd64:-"SUBQ" -"ADDQ"
	// arm64:-"SUB" -"ADD"
	// loong64:-"SUBV" -"ADDV"
	// mips:-"SUB" -"ADD"
	// mips64:-"SUBV" -"ADDV"
	// ppc64x:-"SUB" -"ADD"
	// riscv64:-"SUB" -"ADD"
	r := a + (b - a)
	return r
}

func SubAddSimplify2(a, b, c int) (int, int, int, int, int, int) {
	// amd64:-"ADDQ"
	// arm64:-"ADD"
	// mips:"SUB" -"ADD"
	// mips64:"SUBV" -"ADDV"
	// loong64:"SUBV" -"ADDV"
	r := (a + b) - (a + c)
	// amd64:-"ADDQ"
	r1 := (a + b) - (c + a)
	// amd64:-"ADDQ"
	r2 := (b + a) - (a + c)
	// amd64:-"ADDQ"
	r3 := (b + a) - (c + a)
	// amd64:-"SUBQ"
	// arm64:-"SUB"
	// mips:"ADD" -"SUB"
	// mips64:"ADDV" -"SUBV"
	// loong64:"ADDV" -"SUBV"
	r4 := (a - c) + (c + b)
	// amd64:-"SUBQ"
	r5 := (a - c) + (b + c)
	return r, r1, r2, r3, r4, r5
}

func SubAddNegSimplify(a, b int) int {
	// amd64:"NEGQ" -"ADDQ" -"SUBQ"
	// arm64:"NEG" -"ADD" -"SUB"
	// loong64:"SUBV" -"ADDV"
	// mips:"SUB" -"ADD"
	// mips64:"SUBV" -"ADDV"
	// ppc64x:"NEG" -"ADD" -"SUB"
	// riscv64:"NEG" -"ADD" -"SUB"
	r := a - (b + a)
	return r
}

func AddAddSubSimplify(a, b, c int) int {
	// amd64:-"SUBQ"
	// arm64:"ADD" -"SUB"
	// loong64:"ADDV" -"SUBV"
	// mips:"ADD" -"SUB"
	// mips64:"ADDV" -"SUBV"
	// ppc64x:-"SUB"
	// riscv64:"ADD" "ADD" -"SUB"
	r := a + (b + (c - a))
	return r
}

func NegToInt32(a int) int {
	// riscv64: "NEGW" -"MOVW"
	r := int(int32(-a))
	return r
}

// -------------------- //
//    Multiplication    //
// -------------------- //

func Pow2Muls(n1, n2 int) (int, int) {
	// amd64:"SHLQ [$]5" -"IMULQ"
	// 386:"SHLL [$]5" -"IMULL"
	// arm:"SLL [$]5" -"MUL"
	// arm64:"LSL [$]5" -"MUL"
	// loong64:"SLLV [$]5" -"MULV"
	// ppc64x:"SLD [$]5" -"MUL"
	a := n1 * 32

	// amd64:"SHLQ [$]6" -"IMULQ"
	// 386:"SHLL [$]6" -"IMULL"
	// arm:"SLL [$]6" -"MUL"
	// arm64:`NEG\sR[0-9]+<<6,\sR[0-9]+`,-`LSL`,-`MUL`
	// loong64:"SLLV [$]6" -"MULV"
	// ppc64x:"SLD [$]6" "NEG\\sR[0-9]+,\\sR[0-9]+" -"MUL"
	b := -64 * n2

	return a, b
}

func Mul_2(n1 int32, n2 int64) (int32, int64) {
	// amd64:"ADDL", -"SHLL"
	a := n1 * 2
	// amd64:"ADDQ", -"SHLQ"
	b := n2 * 2

	return a, b
}

func Mul_96(n int) int {
	// amd64:`SHLQ [$]5`,`LEAQ \(.*\)\(.*\*2\),`,-`IMULQ`
	// 386:`SHLL [$]5`,`LEAL \(.*\)\(.*\*2\),`,-`IMULL`
	// arm64:`LSL [$]5`,`ADD\sR[0-9]+<<1,\sR[0-9]+`,-`MUL`
	// arm:`SLL [$]5`,`ADD\sR[0-9]+<<1,\sR[0-9]+`,-`MUL`
	// loong64:"SLLV [$]5" "ALSLV [$]1,"
	// s390x:`SLD [$]5`,`SLD [$]6`,-`MULLD`
	return n * 96
}

func Mul_n120(n int) int {
	// loong64:"SLLV [$]3" "SLLV [$]7" "SUBVU" -"MULV"
	// s390x:`SLD [$]3`,`SLD [$]7`,-`MULLD`
	return n * -120
}

func MulMemSrc(a []uint32, b []float32) {
	// 386:`IMULL\s4\([A-Z]+\),\s[A-Z]+`
	a[0] *= a[1]
	// 386/sse2:`MULSS\s4\([A-Z]+\),\sX[0-9]+`
	// amd64:`MULSS\s4\([A-Z]+\),\sX[0-9]+`
	b[0] *= b[1]
}

// Multiplications merging tests

func MergeMuls1(n int) int {
	// amd64:"IMUL3Q [$]46"
	// 386:"IMUL3L [$]46"
	// ppc64x:"MULLD [$]46"
	return 15*n + 31*n // 46n
}

func MergeMuls2(n int) int {
	// amd64:"IMUL3Q [$]23" "(ADDQ [$]29)|(LEAQ 29)"
	// 386:"IMUL3L [$]23" "ADDL [$]29"
	// ppc64x/power9:"MADDLD" -"MULLD [$]23" -"ADD [$]29"
	// ppc64x/power8:"MULLD [$]23" "ADD [$]29"
	return 5*n + 7*(n+1) + 11*(n+2) // 23n + 29
}

func MergeMuls3(a, n int) int {
	// amd64:"ADDQ [$]19" -"IMULQ [$]19"
	// 386:"ADDL [$]19" -"IMULL [$]19"
	// ppc64x:"ADD [$]19" -"MULLD [$]19"
	return a*n + 19*n // (a+19)n
}

func MergeMuls4(n int) int {
	// amd64:"IMUL3Q [$]14"
	// 386:"IMUL3L [$]14"
	// ppc64x:"MULLD [$]14"
	return 23*n - 9*n // 14n
}

func MergeMuls5(a, n int) int {
	// amd64:"ADDQ [$]-19" -"IMULQ [$]19"
	// 386:"ADDL [$]-19" -"IMULL [$]19"
	// ppc64x:"ADD [$]-19" -"MULLD [$]19"
	return a*n - 19*n // (a-19)n
}

// Multiplications folded negation

func FoldNegMul(a int) int {
	// loong64:"SUBVU" "ALSLV [$]2" "ALSLV [$]1"
	return (-a) * 11
}

func Fold2NegMul(a, b int) int {
	// loong64:"MULV" -"SUBVU R[0-9], R0,"
	return (-a) * (-b)
}

// -------------- //
//    Division    //
// -------------- //

func DivMemSrc(a []float64) {
	// 386/sse2:`DIVSD\s8\([A-Z]+\),\sX[0-9]+`
	// amd64:`DIVSD\s8\([A-Z]+\),\sX[0-9]+`
	a[0] /= a[1]
}

func Pow2Divs(n1 uint, n2 int) (uint, int) {
	// 386:"SHRL [$]5" -"DIVL"
	// amd64:"SHRQ [$]5" -"DIVQ"
	// arm:"SRL [$]5" -".*udiv"
	// arm64:"LSR [$]5" -"UDIV"
	// ppc64x:"SRD"
	a := n1 / 32 // unsigned

	// amd64:"SARQ [$]6" -"IDIVQ"
	// 386:"SARL [$]6" -"IDIVL"
	// arm:"SRA [$]6" -".*udiv"
	// arm64:"ASR [$]6" -"SDIV"
	// ppc64x:"SRAD"
	b := n2 / 64 // signed

	return a, b
}

// Check that constant divisions get turned into MULs
func ConstDivs(n1 uint, n2 int) (uint, int) {
	// amd64: "MOVQ [$]-1085102592571150095" "MULQ" -"DIVQ"
	// 386: "MOVL [$]-252645135" "MULL" -"DIVL"
	// arm64: `MOVD`,`UMULH`,-`DIV`
	// arm: `MOVW`,`MUL`,-`.*udiv`
	a := n1 / 17 // unsigned

	// amd64: "MOVQ [$]-1085102592571150095" "IMULQ" -"IDIVQ"
	// 386: "IMULL" "SARL [$]4," "SARL [$]31," "SUBL" -".*DIV"
	// arm64: `SMULH` -`DIV`
	// arm: `MOVW` `MUL` -`.*udiv`
	b := n2 / 17 // signed

	return a, b
}

func FloatDivs(a []float32) float32 {
	// amd64:`DIVSS\s8\([A-Z]+\),\sX[0-9]+`
	// 386/sse2:`DIVSS\s8\([A-Z]+\),\sX[0-9]+`
	return a[1] / a[2]
}

func Pow2Mods(n1 uint, n2 int) (uint, int) {
	// 386:"ANDL [$]31" -"DIVL"
	// amd64:"ANDL [$]31" -"DIVQ"
	// arm:"AND [$]31" -".*udiv"
	// arm64:"AND [$]31" -"UDIV"
	// ppc64x:"RLDICL"
	a := n1 % 32 // unsigned

	// 386:"SHRL" -"IDIVL"
	// amd64:"SHRQ" -"IDIVQ"
	// arm:"SRA" -".*udiv"
	// arm64:"ASR" -"REM"
	// ppc64x:"SRAD"
	b := n2 % 64 // signed

	return a, b
}

// Check that signed divisibility checks get converted to AND on low bits
func Pow2DivisibleSigned(n1, n2 int) (bool, bool) {
	// 386:"TESTL [$]63" -"DIVL" -"SHRL"
	// amd64:"TESTQ [$]63" -"DIVQ" -"SHRQ"
	// arm:"AND [$]63" -".*udiv" -"SRA"
	// arm64:"TST [$]63" -"UDIV" -"ASR" -"AND"
	// ppc64x:"ANDCC" -"RLDICL" -"SRAD" -"CMP"
	a := n1%64 == 0 // signed divisible

	// 386:"TESTL [$]63" -"DIVL" -"SHRL"
	// amd64:"TESTQ [$]63" -"DIVQ" -"SHRQ"
	// arm:"AND [$]63" -".*udiv" -"SRA"
	// arm64:"TST [$]63" -"UDIV" -"ASR" -"AND"
	// ppc64x:"ANDCC" -"RLDICL" -"SRAD" -"CMP"
	b := n2%64 != 0 // signed indivisible

	return a, b
}

// Check that constant modulo divs get turned into MULs
func ConstMods(n1 uint, n2 int) (uint, int) {
	// amd64: "MOVQ [$]-1085102592571150095" "MULQ" -"DIVQ"
	// 386: "MOVL [$]-252645135" "MULL" -".*DIVL"
	// arm64: `MOVD` `UMULH` -`DIV`
	// arm: `MOVW` `MUL` -`.*udiv`
	a := n1 % 17 // unsigned

	// amd64: "MOVQ [$]-1085102592571150095" "IMULQ" -"IDIVQ"
	// 386: "IMULL" "SARL [$]4," "SARL [$]31," "SUBL" "SHLL [$]4," "SUBL" -".*DIV"
	// arm64: `SMULH` -`DIV`
	// arm: `MOVW` `MUL` -`.*udiv`
	b := n2 % 17 // signed

	return a, b
}

// Check that divisibility checks x%c==0 are converted to MULs and rotates
func DivisibleU(n uint) (bool, bool) {
	// amd64:"MOVQ [$]-6148914691236517205" "IMULQ" "ROLQ [$]63" -"DIVQ"
	// 386:"IMUL3L [$]-1431655765" "ROLL [$]31" -"DIVQ"
	// arm64:"MOVD [$]-6148914691236517205" "MOVD [$]3074457345618258602" "MUL" "ROR" -"DIV"
	// arm:"MUL" "CMP [$]715827882" -".*udiv"
	// ppc64x:"MULLD" "ROTL [$]63"
	even := n%6 == 0

	// amd64:"MOVQ [$]-8737931403336103397" "IMULQ" -"ROLQ" -"DIVQ"
	// 386:"IMUL3L [$]678152731" -"ROLL" -"DIVQ"
	// arm64:"MOVD [$]-8737931403336103397" "MUL" -"ROR" -"DIV"
	// arm:"MUL" "CMP [$]226050910" -".*udiv"
	// ppc64x:"MULLD" -"ROTL"
	odd := n%19 == 0

	return even, odd
}

func Divisible(n int) (bool, bool) {
	// amd64:"IMULQ" "ADD" "ROLQ [$]63" -"DIVQ"
	// 386:"IMUL3L [$]-1431655765" "ADDL [$]715827882" "ROLL [$]31" -"DIVQ"
	// arm64:"MOVD [$]-6148914691236517205" "MOVD [$]3074457345618258602" "MUL" "ADD R" "ROR" -"DIV"
	// arm:"MUL" "ADD [$]715827882" -".*udiv"
	// ppc64x/power8:"MULLD" "ADD" "ROTL [$]63"
	// ppc64x/power9:"MADDLD" "ROTL [$]63"
	even := n%6 == 0

	// amd64:"IMULQ" "ADD" -"ROLQ" -"DIVQ"
	// 386:"IMUL3L [$]678152731" "ADDL [$]113025455" -"ROLL" -"DIVQ"
	// arm64:"MUL" "MOVD [$]485440633518672410" "ADD" -"ROR" -"DIV"
	// arm:"MUL" "ADD [$]113025455" -".*udiv"
	// ppc64x/power8:"MULLD" "ADD" -"ROTL"
	// ppc64x/power9:"MADDLD" -"ROTL"
	odd := n%19 == 0

	return even, odd
}

// Check that fix-up code is not generated for divisions where it has been proven
// that the divisor is not -1 or that the dividend is > MinIntNN.
func NoFix64A(divr int64) (int64, int64) {
	var d int64 = 42
	var e int64 = 84
	if divr > 5 {
		d /= divr // amd64:-"JMP"
		e %= divr // amd64:-"JMP"
		// The following statement is to avoid conflict between the above check
		// and the normal JMP generated at the end of the block.
		d += e
	}
	return d, e
}

func NoFix64B(divd int64) (int64, int64) {
	var d int64
	var e int64
	var divr int64 = -1
	if divd > -9223372036854775808 {
		d = divd / divr // amd64:-"JMP"
		e = divd % divr // amd64:-"JMP"
		d += e
	}
	return d, e
}

func NoFix32A(divr int32) (int32, int32) {
	var d int32 = 42
	var e int32 = 84
	if divr > 5 {
		// amd64:-"JMP"
		// 386:-"JMP"
		d /= divr
		// amd64:-"JMP"
		// 386:-"JMP"
		e %= divr
		d += e
	}
	return d, e
}

func NoFix32B(divd int32) (int32, int32) {
	var d int32
	var e int32
	var divr int32 = -1
	if divd > -2147483648 {
		// amd64:-"JMP"
		// 386:-"JMP"
		d = divd / divr
		// amd64:-"JMP"
		// 386:-"JMP"
		e = divd % divr
		d += e
	}
	return d, e
}

func NoFix16A(divr int16) (int16, int16) {
	var d int16 = 42
	var e int16 = 84
	if divr > 5 {
		// amd64:-"JMP"
		// 386:-"JMP"
		d /= divr
		// amd64:-"JMP"
		// 386:-"JMP"
		e %= divr
		d += e
	}
	return d, e
}

func NoFix16B(divd int16) (int16, int16) {
	var d int16
	var e int16
	var divr int16 = -1
	if divd > -32768 {
		// amd64:-"JMP"
		// 386:-"JMP"
		d = divd / divr
		// amd64:-"JMP"
		// 386:-"JMP"
		e = divd % divr
		d += e
	}
	return d, e
}

// Check that len() and cap() calls divided by powers of two are
// optimized into shifts and ands

func LenDiv1(a []int) int {
	// 386:"SHRL [$]10"
	// amd64:"SHRQ [$]10"
	// arm64:"LSR [$]10" -"SDIV"
	// arm:"SRL [$]10" -".*udiv"
	// ppc64x:"SRD [$]10"
	return len(a) / 1024
}

func LenDiv2(s string) int {
	// 386:"SHRL [$]11"
	// amd64:"SHRQ [$]11"
	// arm64:"LSR [$]11" -"SDIV"
	// arm:"SRL [$]11" -".*udiv"
	// ppc64x:"SRD [$]11"
	return len(s) / (4097 >> 1)
}

func LenMod1(a []int) int {
	// 386:"ANDL [$]1023"
	// amd64:"ANDL [$]1023"
	// arm64:"AND [$]1023" -"SDIV"
	// arm/6:"AND" -".*udiv"
	// arm/7:"BFC" -".*udiv" -"AND"
	// ppc64x:"RLDICL"
	return len(a) % 1024
}

func LenMod2(s string) int {
	// 386:"ANDL [$]2047"
	// amd64:"ANDL [$]2047"
	// arm64:"AND [$]2047" -"SDIV"
	// arm/6:"AND" -".*udiv"
	// arm/7:"BFC" -".*udiv" -"AND"
	// ppc64x:"RLDICL"
	return len(s) % (4097 >> 1)
}

func CapDiv(a []int) int {
	// 386:"SHRL [$]12"
	// amd64:"SHRQ [$]12"
	// arm64:"LSR [$]12" -"SDIV"
	// arm:"SRL [$]12" -".*udiv"
	// ppc64x:"SRD [$]12"
	return cap(a) / ((1 << 11) + 2048)
}

func CapMod(a []int) int {
	// 386:"ANDL [$]4095"
	// amd64:"ANDL [$]4095"
	// arm64:"AND [$]4095" -"SDIV"
	// arm/6:"AND" -".*udiv"
	// arm/7:"BFC" -".*udiv" -"AND"
	// ppc64x:"RLDICL"
	return cap(a) % ((1 << 11) + 2048)
}

func AddMul(x int) int {
	// amd64:"LEAQ 1"
	return 2*x + 1
}

func AddShift(a, b int) int {
	// loong64: "ALSLV"
	return a + (b << 4)
}

func MULA(a, b, c uint32) (uint32, uint32, uint32) {
	// arm:`MULA`,-`MUL\s`
	// arm64:`MADDW`,-`MULW`
	r0 := a*b + c
	// arm:`MULA`,-`MUL\s`
	// arm64:`MADDW`,-`MULW`
	r1 := c*79 + a
	// arm:`ADD`,-`MULA`,-`MUL\s`
	// arm64:`ADD`,-`MADD`,-`MULW`
	// ppc64x:`ADD`,-`MULLD`
	r2 := b*64 + c
	return r0, r1, r2
}

func MULS(a, b, c uint32) (uint32, uint32, uint32) {
	// arm/7:`MULS`,-`MUL\s`
	// arm/6:`SUB`,`MUL\s`,-`MULS`
	// arm64:`MSUBW`,-`MULW`
	r0 := c - a*b
	// arm/7:`MULS`,-`MUL\s`
	// arm/6:`SUB`,`MUL\s`,-`MULS`
	// arm64:`MSUBW`,-`MULW`
	r1 := a - c*79
	// arm/7:`SUB`,-`MULS`,-`MUL\s`
	// arm64:`SUB`,-`MSUBW`,-`MULW`
	// ppc64x:`SUB`,-`MULLD`
	r2 := c - b*64
	return r0, r1, r2
}

func addSpecial(a, b, c uint32) (uint32, uint32, uint32) {
	// amd64:`INCL`
	a++
	// amd64:`DECL`
	b--
	// amd64:`SUBL.*-128`
	c += 128
	return a, b, c
}

// Divide -> shift rules usually require fixup for negative inputs.
// If the input is non-negative, make sure the unsigned form is generated.
func divInt(v int64) int64 {
	if v < 0 {
		// amd64:`SARQ.*63,`, `SHRQ.*56,`, `SARQ.*8,`
		return v / 256
	}
	// amd64:-`.*SARQ`, `SHRQ.*9,`
	return v / 512
}

// The reassociate rules "x - (z + C) -> (x - z) - C" and
// "(z + C) -x -> C + (z - x)" can optimize the following cases.
func constantFold1(i0, j0, i1, j1, i2, j2, i3, j3 int) (int, int, int, int) {
	// arm64:"SUB" "ADD [$]2"
	// ppc64x:"SUB" "ADD [$]2"
	r0 := (i0 + 3) - (j0 + 1)
	// arm64:"SUB" "SUB [$]4"
	// ppc64x:"SUB" "ADD [$]-4"
	r1 := (i1 - 3) - (j1 + 1)
	// arm64:"SUB" "ADD [$]4"
	// ppc64x:"SUB" "ADD [$]4"
	r2 := (i2 + 3) - (j2 - 1)
	// arm64:"SUB" "SUB [$]2"
	// ppc64x:"SUB" "ADD [$]-2"
	r3 := (i3 - 3) - (j3 - 1)
	return r0, r1, r2, r3
}

// The reassociate rules "x - (z + C) -> (x - z) - C" and
// "(C - z) - x -> C - (z + x)" can optimize the following cases.
func constantFold2(i0, j0, i1, j1 int) (int, int) {
	// arm64:"ADD" "MOVD [$]2" "SUB"
	// ppc64x: `SUBC R[0-9]+,\s[$]2,\sR`
	r0 := (3 - i0) - (j0 + 1)
	// arm64:"ADD" "MOVD [$]4" "SUB"
	// ppc64x: `SUBC R[0-9]+,\s[$]4,\sR`
	r1 := (3 - i1) - (j1 - 1)
	return r0, r1
}

func constantFold3(i, j int) int {
	// arm64: "LSL [$]5," "SUB R[0-9]+<<1," -"ADD"
	// ppc64x:"MULLD [$]30" "MULLD"
	r := (5 * i) * (6 * j)
	return r
}

// Integer Min/Max

func Int64Min(a, b int64) int64 {
	// amd64: "CMPQ" "CMOVQLT"
	// arm64: "CMP" "CSEL"
	// riscv64/rva20u64:"BLT "
	// riscv64/rva22u64,riscv64/rva23u64:"MIN "
	return min(a, b)
}

func Int64Max(a, b int64) int64 {
	// amd64: "CMPQ" "CMOVQGT"
	// arm64: "CMP" "CSEL"
	// riscv64/rva20u64:"BLT "
	// riscv64/rva22u64,riscv64/rva23u64:"MAX "
	return max(a, b)
}

func Uint64Min(a, b uint64) uint64 {
	// amd64: "CMPQ" "CMOVQCS"
	// arm64: "CMP" "CSEL"
	// riscv64/rva20u64:"BLTU"
	// riscv64/rva22u64,riscv64/rva23u64:"MINU"
	return min(a, b)
}

func Uint64Max(a, b uint64) uint64 {
	// amd64: "CMPQ" "CMOVQHI"
	// arm64: "CMP" "CSEL"
	// riscv64/rva20u64:"BLTU"
	// riscv64/rva22u64,riscv64/rva23u64:"MAXU"
	return max(a, b)
}