Mirror of https://github.com/golang/go.git, synced 2025-12-08 06:10:04 +00:00.
3 commits
6e165b4d17
cmd/compile: implement Avg64u, Hmul64, Hmul64u for wasm
This lets us remove useAvg and useHmul from the division rules.
The compiler is simpler and the generated code is faster.
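For context, a minimal sketch (plain Go, not the compiler's rewrite rules) of why a high-multiply op matters here: unsigned division by a constant strength-reduces to a 64x64 high multiply, which is exactly the shape Hmul64u provides. The magic constant below is the well-known reciprocal for dividing by 3; Avg64u is used for divisors that need an extra averaging fixup, which is omitted in this sketch.

package main

import (
	"fmt"
	"math/bits"
)

// div3 computes x/3 without a divide instruction: the quotient is the
// high 64 bits of x * ceil(2^65/3), shifted right by 1. This is the
// high-multiply shape that constant-division lowering relies on.
func div3(x uint64) uint64 {
	hi, _ := bits.Mul64(x, 0xAAAAAAAAAAAAAAAB)
	return hi >> 1
}

func main() {
	for _, x := range []uint64{0, 1, 2, 3, 100, 999, 1<<64 - 1} {
		fmt.Println(x, div3(x) == x/3) // always true
	}
}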
goos: wasip1
goarch: wasm
pkg: internal/strconv
│ old.txt │ new.txt │
│ sec/op │ sec/op vs base │
AppendFloat/Decimal 192.8n ± 1% 194.6n ± 0% +0.91% (p=0.000 n=10)
AppendFloat/Float 328.6n ± 0% 279.6n ± 0% -14.93% (p=0.000 n=10)
AppendFloat/Exp 335.6n ± 1% 289.2n ± 1% -13.80% (p=0.000 n=10)
AppendFloat/NegExp 336.0n ± 0% 289.1n ± 1% -13.97% (p=0.000 n=10)
AppendFloat/LongExp 332.4n ± 0% 285.2n ± 1% -14.20% (p=0.000 n=10)
AppendFloat/Big 348.2n ± 0% 300.1n ± 0% -13.83% (p=0.000 n=10)
AppendFloat/BinaryExp 137.4n ± 0% 138.2n ± 0% +0.55% (p=0.001 n=10)
AppendFloat/32Integer 193.3n ± 1% 196.5n ± 0% +1.66% (p=0.000 n=10)
AppendFloat/32ExactFraction 283.3n ± 0% 268.9n ± 1% -5.08% (p=0.000 n=10)
AppendFloat/32Point 279.9n ± 0% 266.5n ± 0% -4.80% (p=0.000 n=10)
AppendFloat/32Exp 300.1n ± 0% 288.3n ± 1% -3.90% (p=0.000 n=10)
AppendFloat/32NegExp 288.2n ± 1% 277.9n ± 1% -3.59% (p=0.000 n=10)
AppendFloat/32Shortest 261.7n ± 0% 250.2n ± 0% -4.39% (p=0.000 n=10)
AppendFloat/32Fixed8Hard 173.3n ± 1% 158.9n ± 1% -8.31% (p=0.000 n=10)
AppendFloat/32Fixed9Hard 180.0n ± 0% 167.9n ± 2% -6.70% (p=0.000 n=10)
AppendFloat/64Fixed1 167.1n ± 0% 149.6n ± 1% -10.50% (p=0.000 n=10)
AppendFloat/64Fixed2 162.4n ± 1% 146.5n ± 0% -9.73% (p=0.000 n=10)
AppendFloat/64Fixed2.5 165.5n ± 0% 149.4n ± 1% -9.70% (p=0.000 n=10)
AppendFloat/64Fixed3 166.4n ± 1% 150.2n ± 0% -9.74% (p=0.000 n=10)
AppendFloat/64Fixed4 163.7n ± 0% 149.6n ± 1% -8.62% (p=0.000 n=10)
AppendFloat/64Fixed5Hard 182.8n ± 1% 167.1n ± 1% -8.61% (p=0.000 n=10)
AppendFloat/64Fixed12 222.2n ± 0% 208.8n ± 0% -6.05% (p=0.000 n=10)
AppendFloat/64Fixed16 197.6n ± 1% 181.7n ± 0% -8.02% (p=0.000 n=10)
AppendFloat/64Fixed12Hard 194.5n ± 0% 181.0n ± 0% -6.99% (p=0.000 n=10)
AppendFloat/64Fixed17Hard 205.1n ± 1% 191.9n ± 0% -6.44% (p=0.000 n=10)
AppendFloat/64Fixed18Hard 6.269µ ± 0% 6.643µ ± 0% +5.97% (p=0.000 n=10)
AppendFloat/64FixedF1 211.7n ± 1% 197.0n ± 0% -6.95% (p=0.000 n=10)
AppendFloat/64FixedF2 189.4n ± 0% 174.2n ± 0% -8.08% (p=0.000 n=10)
AppendFloat/64FixedF3 169.0n ± 0% 154.9n ± 0% -8.32% (p=0.000 n=10)
AppendFloat/Slowpath64 321.2n ± 0% 274.2n ± 1% -14.63% (p=0.000 n=10)
AppendFloat/SlowpathDenormal64 307.4n ± 1% 261.2n ± 0% -15.03% (p=0.000 n=10)
AppendInt 3.367µ ± 1% 3.376µ ± 0% ~ (p=0.517 n=10)
AppendUint 675.5n ± 0% 676.9n ± 0% ~ (p=0.196 n=10)
AppendIntSmall 28.13n ± 1% 28.17n ± 0% +0.14% (p=0.015 n=10)
AppendUintVarlen/digits=1 20.70n ± 0% 20.51n ± 1% -0.89% (p=0.018 n=10)
AppendUintVarlen/digits=2 20.43n ± 0% 20.27n ± 0% -0.81% (p=0.001 n=10)
AppendUintVarlen/digits=3 38.48n ± 0% 37.93n ± 0% -1.43% (p=0.000 n=10)
AppendUintVarlen/digits=4 41.10n ± 0% 38.78n ± 1% -5.62% (p=0.000 n=10)
AppendUintVarlen/digits=5 42.25n ± 1% 42.11n ± 0% -0.32% (p=0.041 n=10)
AppendUintVarlen/digits=6 45.40n ± 1% 43.14n ± 0% -4.98% (p=0.000 n=10)
AppendUintVarlen/digits=7 46.81n ± 1% 46.03n ± 0% -1.66% (p=0.000 n=10)
AppendUintVarlen/digits=8 48.88n ± 1% 46.59n ± 1% -4.68% (p=0.000 n=10)
AppendUintVarlen/digits=9 49.94n ± 2% 49.41n ± 1% -1.06% (p=0.000 n=10)
AppendUintVarlen/digits=10 57.28n ± 1% 56.92n ± 1% -0.62% (p=0.045 n=10)
AppendUintVarlen/digits=11 60.09n ± 1% 58.11n ± 2% -3.30% (p=0.000 n=10)
AppendUintVarlen/digits=12 62.22n ± 0% 61.85n ± 0% -0.59% (p=0.000 n=10)
AppendUintVarlen/digits=13 64.94n ± 0% 62.92n ± 0% -3.10% (p=0.000 n=10)
AppendUintVarlen/digits=14 65.42n ± 1% 65.19n ± 1% -0.34% (p=0.005 n=10)
AppendUintVarlen/digits=15 68.17n ± 0% 66.13n ± 0% -2.99% (p=0.000 n=10)
AppendUintVarlen/digits=16 70.21n ± 1% 70.09n ± 1% ~ (p=0.517 n=10)
AppendUintVarlen/digits=17 72.93n ± 0% 70.49n ± 0% -3.34% (p=0.000 n=10)
AppendUintVarlen/digits=18 73.01n ± 0% 72.75n ± 0% -0.35% (p=0.000 n=10)
AppendUintVarlen/digits=19 79.27n ± 1% 79.49n ± 1% ~ (p=0.671 n=10)
AppendUintVarlen/digits=20 82.18n ± 0% 80.43n ± 1% -2.14% (p=0.000 n=10)
geomean 143.4n 136.0n -5.20%
Change-Id: I8245814a0259ad13cf9225f57db8e9fe3d2e4267
Reviewed-on: https://go-review.googlesource.com/c/go/+/717407
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
1e5bb416d8
cmd/compile: implement bits.Mul64 on 32-bit systems
This CL implements Mul64uhilo, Hmul64, Hmul64u, and Avg64u
on 32-bit systems, with the effect that constant division of both
int64s and uint64s can now be emitted directly in all cases,
and also that bits.Mul64 can be intrinsified on 32-bit systems.
Previously, constant division of a uint64 by values 0 ≤ c ≤ 0xFFFF was
implemented as uint32 divisions by c and some fixup. After expanding
those smaller constant divisions, the code for i/999 required:
(386) 7 mul, 10 add, 2 sub, 3 rotate, 3 shift (104 bytes)
(arm) 7 mul, 9 add, 3 sub, 2 shift (104 bytes)
(mips) 7 mul, 10 add, 5 sub, 6 shift, 3 sgtu (176 bytes)
For that much code, we might as well use a full 64x64->128 multiply
that can be used for all divisors, not just small ones.
Having done that, the same i/999 now generates:
(386) 4 mul, 9 add, 2 sub, 2 or, 6 shift (112 bytes)
(arm) 4 mul, 8 add, 2 sub, 2 or, 3 shift (92 bytes)
(mips) 4 mul, 11 add, 3 sub, 6 shift, 8 sgtu, 4 or (196 bytes)
The size increase on 386 is due to a few extra register spills.
The size increase on mips is due to add-with-carry being hard.
The new approach is more general, letting us delete the old special case
and guarantee that all int64 and uint64 divisions by constants are
generated directly on 32-bit systems.
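For reference, the 64x64->128 product can be assembled from 32x32->64 pieces in the usual schoolbook way. The sketch below (ordinary Go, not the compiler's actual lowering) shows the shape of work a 32-bit target has to emit, checked against math/bits.Mul64:

package main

import (
	"fmt"
	"math/bits"
)

// mul64uhilo returns the 128-bit product of x and y as (hi, lo), using
// only 32x32->64 multiplies and 64-bit adds, the schoolbook decomposition
// a 32-bit target must use. Illustrative sketch, not the generated code.
func mul64uhilo(x, y uint64) (hi, lo uint64) {
	x0, x1 := x&0xFFFFFFFF, x>>32
	y0, y1 := y&0xFFFFFFFF, y>>32

	w0 := x0 * y0               // low x low
	t := x1*y0 + w0>>32         // no overflow: (2^32-1)^2 + (2^32-1) < 2^64
	w1 := t&0xFFFFFFFF + x0*y1  // middle terms
	hi = x1*y1 + t>>32 + w1>>32 // high x high plus carries
	lo = x * y                  // low 64 bits wrap around naturally
	return
}

func main() {
	x, y := uint64(1234567890123456789), uint64(999)
	hi, lo := mul64uhilo(x, y)
	hi2, lo2 := bits.Mul64(x, y)
	fmt.Println(hi == hi2, lo == lo2) // true true
}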
This especially speeds up code making heavy use of bits.Mul64 with
a constant argument, which happens in strconv and various crypto
packages. A few examples are benchmarked below.
pkg: cmd/compile/internal/test
benchmark \ host local linux-amd64 s7 linux-386 s7:GOARCH=386
vs base vs base vs base vs base vs base
DivconstI64 ~ ~ ~ -49.66% -21.02%
ModconstI64 ~ ~ ~ -13.45% +14.52%
DivisiblePow2constI64 ~ ~ ~ +0.97% -1.32%
DivisibleconstI64 ~ ~ ~ -20.01% -48.28%
DivisibleWDivconstI64 ~ ~ -1.76% -38.59% -42.74%
DivconstU64/3 ~ ~ ~ -13.82% -4.09%
DivconstU64/5 ~ ~ ~ -14.10% -3.54%
DivconstU64/37 -2.07% -4.45% ~ -19.60% -9.55%
DivconstU64/1234567 ~ ~ ~ -61.55% -56.93%
ModconstU64 ~ ~ ~ -6.25% ~
DivisibleconstU64 ~ ~ ~ -2.78% -7.82%
DivisibleWDivconstU64 ~ ~ ~ +4.23% +2.56%
pkg: math/bits
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
Add ~ ~ ~ ~
Add32 +1.59% ~ ~ ~
Add64 ~ ~ ~ ~
Add64multiple ~ ~ ~ ~
Sub ~ ~ ~ ~
Sub32 ~ ~ ~ ~
Sub64 ~ ~ -9.20% ~
Sub64multiple ~ ~ ~ ~
Mul ~ ~ ~ ~
Mul32 ~ ~ ~ ~
Mul64 ~ ~ -41.58% -53.21%
Div ~ ~ ~ ~
Div32 ~ ~ ~ ~
Div64 ~ ~ ~ ~
pkg: strconv
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
ParseInt/Pos/7bit ~ ~ -11.08% -6.75%
ParseInt/Pos/26bit ~ ~ -13.65% -11.02%
ParseInt/Pos/31bit ~ ~ -14.65% -9.71%
ParseInt/Pos/56bit -1.80% ~ -17.97% -10.78%
ParseInt/Pos/63bit ~ ~ -13.85% -9.63%
ParseInt/Neg/7bit ~ ~ -12.14% -7.26%
ParseInt/Neg/26bit ~ ~ -14.18% -9.81%
ParseInt/Neg/31bit ~ ~ -14.51% -9.02%
ParseInt/Neg/56bit ~ ~ -15.79% -9.79%
ParseInt/Neg/63bit ~ ~ -15.68% -11.07%
AppendFloat/Decimal ~ ~ -7.25% -12.26%
AppendFloat/Float ~ ~ -15.96% -19.45%
AppendFloat/Exp ~ ~ -13.96% -17.76%
AppendFloat/NegExp ~ ~ -14.89% -20.27%
AppendFloat/LongExp ~ ~ -12.68% -17.97%
AppendFloat/Big ~ ~ -11.10% -16.64%
AppendFloat/BinaryExp ~ ~ ~ ~
AppendFloat/32Integer ~ ~ -10.05% -10.91%
AppendFloat/32ExactFraction ~ ~ -8.93% -13.00%
AppendFloat/32Point ~ ~ -10.36% -14.89%
AppendFloat/32Exp ~ ~ -9.88% -13.54%
AppendFloat/32NegExp ~ ~ -10.16% -14.26%
AppendFloat/32Shortest ~ ~ -11.39% -14.96%
AppendFloat/32Fixed8Hard ~ ~ ~ -2.31%
AppendFloat/32Fixed9Hard ~ ~ ~ -7.01%
AppendFloat/64Fixed1 ~ ~ -2.83% -8.23%
AppendFloat/64Fixed2 ~ ~ ~ -7.94%
AppendFloat/64Fixed3 ~ ~ -4.07% -7.22%
AppendFloat/64Fixed4 ~ ~ -7.24% -7.62%
AppendFloat/64Fixed12 ~ ~ -6.57% -4.82%
AppendFloat/64Fixed16 ~ ~ -4.00% -5.81%
AppendFloat/64Fixed12Hard -2.22% ~ -4.07% -6.35%
AppendFloat/64Fixed17Hard -2.12% ~ ~ -3.79%
AppendFloat/64Fixed18Hard -1.89% ~ +2.48% ~
AppendFloat/Slowpath64 -1.85% ~ -14.49% -18.21%
AppendFloat/SlowpathDenormal64 ~ ~ -13.08% -19.41%
pkg: crypto/internal/fips140/nistec/fiat
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
Mul/P224 ~ ~ -29.95% -39.60%
Mul/P384 ~ ~ -37.11% -63.33%
Mul/P521 ~ ~ -26.62% -12.42%
Square/P224 +1.46% ~ -40.62% -49.18%
Square/P384 ~ ~ -45.51% -69.68%
Square/P521 +90.37% ~ -25.26% -11.23%
(The +90% is a separate problem and not real; that much variation
can be seen on that system by running the same binary from two
different files.)
pkg: crypto/internal/fips140/edwards25519
benchmark \ host s7 linux-amd64 linux-386 s7:GOARCH=386
vs base vs base vs base vs base
EncodingDecoding ~ ~ -34.67% -35.75%
ScalarBaseMult ~ ~ -31.25% -30.29%
ScalarMult ~ ~ -33.45% -32.54%
VarTimeDoubleScalarBaseMult ~ ~ -33.78% -33.68%
Change-Id: Id3c91d42cd01def6731b755e99f8f40c6ad1bb65
Reviewed-on: https://go-review.googlesource.com/c/go/+/716061
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
9bbda7c99d
cmd/compile: make prove understand div, mod better
This CL introduces new divisible and divmod passes that rewrite divisibility checks and div, mod, and mul. These happen after prove, so that prove can make better sense of the code for deriving bounds, and they must run before decompose, so that 64-bit ops can be lowered to 32-bit ops on 32-bit systems. And then they need another generic pass as well, to optimize the generated code before decomposing. The three opt passes are "opt", "middle opt", and "late opt". (Perhaps instead they should be "generic", "opt", and "late opt"?) The "late opt" pass repeats the "middle opt" work on any new code that has been generated in the interim. There will not be new divs or mods, but there may be new muls.

The x%c==0 rewrite rules are much simpler now, since they can match before divs have been rewritten. This has the effect of applying them more consistently and making the rewrite rules independent of the exact div rewrites.

Prove is also now charged with marking signed div/mod as unsigned when the arguments call for it, allowing simpler code to be emitted in various cases. For example, t.Seconds()/2 and len(x)/2 are now recognized as unsigned, meaning they compile to a simple shift (unsigned division), avoiding the more complex fixup we need for signed values.

https://gist.github.com/rsc/99d9d3bd99cde87b6a1a390e3d85aa32 shows a diff of 'go build -a -gcflags=-d=ssa/prove/debug=1 std' output before and after. "Proved Rsh64x64 shifts to zero" is replaced by the higher-level "Proved Div64 is unsigned" (the shift was in the signed expansion of div by constant), but otherwise prove is only finding more things to prove.

One short example, in code that does x[i%len(x)]:

< runtime/mfinal.go:131:34: Proved Rsh64x64 shifts to zero
---
> runtime/mfinal.go:131:34: Proved Div64 is unsigned
> runtime/mfinal.go:131:38: Proved IsInBounds

A longer example:

< crypto/internal/fips140/sha3/shake.go:28:30: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:38:27: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:53:46: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:55:46: Proved Rsh64x64 shifts to zero
---
> crypto/internal/fips140/sha3/shake.go:28:30: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:38:27: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:45:7: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:46:4: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsSliceInBounds

These diffs are due to the smaller opt being better and taking work away from prove:

< image/jpeg/dct.go:307:5: Proved IsInBounds
< image/jpeg/dct.go:308:5: Proved IsInBounds
...
< image/jpeg/dct.go:442:5: Proved IsInBounds

In the old opt, Mul by 8 was rewritten to Lsh by 3 early. This CL delays that rule to help prove recognize mods, but it also helps opt constant-fold the slice x[8*i:8*i+8:8*i+8].
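For reference, two hypothetical functions (sketches only, not code from the CL) showing the source-level shapes mentioned above:

// half returns len(x)/2. Since len(x) is known to be non-negative,
// prove can mark the signed division as unsigned, so it compiles to
// a simple shift rather than the signed-division fixup sequence.
func half(x []byte) int {
	return len(x) / 2
}

// chunk returns the i'th 8-byte chunk of x using the three-index slice
// discussed above. Its length is (8*i+8) - 8*i, which opt can now fold
// to the constant 8, so bounds checks on later indexing into the result
// (0 < 8, 1 < 8, ...) constant-fold away.
func chunk(x []byte, i int) []byte {
	return x[8*i : 8*i+8 : 8*i+8]
}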
Specifically, computing the length, opt can now do:

    (Sub64 (Add (Mul 8 i) 8) (Mul 8 i))
    -> (Add 8 (Sub (Mul 8 i) (Mul 8 i)))
    -> (Add 8 (Mul 8 (Sub i i)))
    -> (Add 8 (Mul 8 0))
    -> (Add 8 0)
    -> 8

The key step is (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)). Leaving the multiply as Mul enables using that step; the old rewrite to Lsh blocked it, leaving prove to figure out the length and then remove the bounds checks. But now opt can evaluate the length down to a constant 8 and then constant-fold away the bounds checks 0 < 8, 1 < 8, and so on. After that, the compiler has nothing left to prove.

Benchmarks are noisy in general; I checked the assembly for the many large increases below, and the vast majority are unchanged and presumably hitting the caches differently in some way.

The divisibility optimizations were not reliably triggering before. This leads to a very large improvement in some cases, like DivisiblePow2constI64 and DivisibleconstI64 on 64-bit systems and DivisibleconstU64 on 32-bit systems. Another way the divisibility optimizations were unreliable before was incorrectly triggering for x/3, x%3 even though they are written not to do that.

There is a real but small slowdown in the DivisibleWDivconst benchmarks on Mac because in the cases used in the benchmark, it is still faster (on Mac) to do the divisibility check than to remultiply. This may be worth further study. Perhaps when there is no rotate (meaning the divisor is odd), the divisibility optimization should be enabled always. In any event, this CL makes it possible to study that.

benchmark \ host s7 linux-amd64 mac linux-arm64 linux-ppc64le linux-386 s7:GOARCH=386 linux-arm
vs base vs base vs base vs base vs base vs base vs base vs base
LoadAdd ~ ~ ~ ~ ~ -1.59% ~ ~
ExtShift ~ ~ -42.14% +0.10% ~ +1.44% +5.66% +8.50%
Modify ~ ~ ~ ~ ~ ~ ~ -1.53%
MullImm ~ ~ ~ ~ ~ +37.90% -21.87% +3.05%
ConstModify ~ ~ ~ ~ -49.14% ~ ~ ~
BitSet ~ ~ ~ ~ -15.86% -14.57% +6.44% +0.06%
BitClear ~ ~ ~ ~ ~ +1.78% +3.50% +0.06%
BitToggle ~ ~ ~ ~ ~ -16.09% +2.91% ~
BitSetConst ~ ~ ~ ~ ~ ~ ~ -0.49%
BitClearConst ~ ~ ~ ~ -28.29% ~ ~ -0.40%
BitToggleConst ~ ~ ~ +8.89% -31.19% ~ ~ -0.77%
MulNeg ~ ~ ~ ~ ~ ~ ~ ~
Mul2Neg ~ ~ -4.83% ~ ~ -13.75% -5.92% ~
DivconstI64 ~ ~ ~ ~ ~ -30.12% ~ +0.50%
ModconstI64 ~ ~ -9.94% -4.63% ~ +3.15% ~ +5.32%
DivisiblePow2constI64 -34.49% -12.58% ~ ~ -12.25% ~ ~ ~
DivisibleconstI64 -24.69% -25.06% -0.40% -2.27% -42.61% -3.31% ~ +1.63%
DivisibleWDivconstI64 ~ ~ ~ ~ ~ -17.55% ~ -0.60%
DivconstU64/3 ~ ~ ~ ~ ~ +1.51% ~ ~
DivconstU64/5 ~ ~ ~ ~ ~ ~ ~ ~
DivconstU64/37 ~ ~ -0.18% ~ ~ +2.70% ~ ~
DivconstU64/1234567 ~ ~ ~ ~ ~ ~ ~ +0.12%
ModconstU64 ~ ~ ~ -0.24% ~ -5.10% -1.07% -1.56%
DivisibleconstU64 ~ ~ ~ ~ ~ -29.01% -59.13% -50.72%
DivisibleWDivconstU64 ~ ~ -12.18% -18.88% ~ -5.50% -3.91% +5.17%
DivconstI32 ~ ~ -0.48% ~ -34.69% +89.01% -6.01% -16.67%
ModconstI32 ~ +2.95% -0.33% ~ ~ -2.98% -5.40% -8.30%
DivisiblePow2constI32 ~ ~ ~ ~ ~ ~ ~ -16.22%
DivisibleconstI32 ~ ~ ~ ~ ~ -37.27% -47.75% -25.03%
DivisibleWDivconstI32 -11.59% +5.22% -12.99% -23.83% ~ +45.95% -7.03% -10.01%
DivconstU32 ~ ~ ~ ~ ~ +74.71% +4.81% ~
ModconstU32 ~ ~ +0.53% +0.18% ~ +51.16% ~ ~
DivisibleconstU32 ~ ~ ~ -0.62% ~ -4.25% ~ ~
DivisibleWDivconstU32 -2.77% +5.56% +11.12% -5.15% ~ +48.70% +25.11% -4.07%
DivconstI16 -6.06% ~ -0.33% +0.22% ~ ~ -9.68% +5.47%
ModconstI16 ~ ~ +4.44% +2.82% ~ ~ ~ +5.06%
DivisiblePow2constI16 ~ ~ ~ ~ ~ ~ ~ -0.17%
DivisibleconstI16 ~ ~ -0.23% ~ ~ ~ +4.60% +6.64%
DivisibleWDivconstI16 -1.44% -0.43% +13.48% -5.76% ~ +1.62% -23.15% -9.06%
DivconstU16 +1.61% ~ -0.35% -0.47% ~ ~ +15.59% ~
ModconstU16 ~ ~ ~ ~ ~ -0.72% ~ +14.23%
DivisibleconstU16 ~ ~ -0.05% +3.00% ~ ~ ~ +5.06%
DivisibleWDivconstU16 +52.10% +0.75% +17.28% +4.79% ~ -37.39% +5.28% -9.06%
DivconstI8 ~ ~ -0.34% -0.96% ~ ~ -9.20% ~
ModconstI8 +2.29% ~ +4.38% +2.96% ~ ~ ~ ~
DivisiblePow2constI8 ~ ~ ~ ~ ~ ~ ~ ~
DivisibleconstI8 ~ ~ ~ ~ ~ ~ +6.04% ~
DivisibleWDivconstI8 -26.44% +1.69% +17.03% +4.05% ~ +32.48% -24.90% ~
DivconstU8 -4.50% +14.06% -0.28% ~ ~ ~ +4.16% +0.88%
ModconstU8 ~ ~ +25.84% -0.64% ~ ~ ~ ~
DivisibleconstU8 ~ ~ -5.70% ~ ~ ~ ~ ~
DivisibleWDivconstU8 +49.55% +9.07% ~ +4.03% +53.87% -40.03% +39.72% -3.01%
Mul2 ~ ~ ~ ~ ~ ~ ~ ~
MulNeg2 ~ ~ ~ ~ -11.73% ~ ~ -0.02%
EfaceInteger ~ ~ ~ ~ ~ +18.11% ~ +2.53%
TypeAssert +33.90% +2.86% ~ ~ ~ -1.07% -5.29% -1.04%
Div64UnsignedSmall ~ ~ ~ ~ ~ ~ ~ ~
Div64Small ~ ~ ~ ~ ~ -0.88% ~ +2.39%
Div64SmallNegDivisor ~ ~ ~ ~ ~ ~ ~ +0.35%
Div64SmallNegDividend ~ ~ ~ ~ ~ -0.84% ~ +3.57%
Div64SmallNegBoth ~ ~ ~ ~ ~ -0.86% ~ +3.55%
Div64Unsigned ~ ~ ~ ~ ~ ~ ~ -0.11%
Div64 ~ ~ ~ ~ ~ ~ ~ +0.11%
Div64NegDivisor ~ ~ ~ ~ ~ -1.29% ~ ~
Div64NegDividend ~ ~ ~ ~ ~ -1.44% ~ ~
Div64NegBoth ~ ~ ~ ~ ~ ~ ~ +0.28%
Mod64UnsignedSmall ~ ~ ~ ~ ~ +0.48% ~ +0.93%
Mod64Small ~ ~ ~ ~ ~ ~ ~ ~
Mod64SmallNegDivisor ~ ~ ~ ~ ~ ~ ~ +1.44%
Mod64SmallNegDividend ~ ~ ~ ~ ~ +0.22% ~ +1.37%
Mod64SmallNegBoth ~ ~ ~ ~ ~ ~ ~ -2.22%
Mod64Unsigned ~ ~ ~ ~ ~ -0.95% ~ +0.11%
Mod64 ~ ~ ~ ~ ~ ~ ~ ~
Mod64NegDivisor ~ ~ ~ ~ ~ ~ ~ -0.02%
Mod64NegDividend ~ ~ ~ ~ ~ ~ ~ ~
Mod64NegBoth ~ ~ ~ ~ ~ ~ ~ -0.02%
MulconstI32/3 ~ ~ ~ -25.00% ~ ~ ~ +47.37%
MulconstI32/5 ~ ~ ~ +33.28% ~ ~ ~ +32.21%
MulconstI32/12 ~ ~ ~ -2.13% ~ ~ ~ -0.02%
MulconstI32/120 ~ ~ ~ +2.93% ~ ~ ~ -0.03%
MulconstI32/-120 ~ ~ ~ -2.17% ~ ~ ~ -0.03%
MulconstI32/65537 ~ ~ ~ ~ ~ ~ ~ +0.03%
MulconstI32/65538 ~ ~ ~ ~ ~ -33.38% ~ +0.04%
MulconstI64/3 ~ ~ ~ +33.35% ~ -0.37% ~ -0.13%
MulconstI64/5 ~ ~ ~ -25.00% ~ -0.34% ~ ~
MulconstI64/12 ~ ~ ~ +2.13% ~ +11.62% ~ +2.30%
MulconstI64/120 ~ ~ ~ -1.98% ~ ~ ~ ~
MulconstI64/-120 ~ ~ ~ +0.75% ~ ~ ~ ~
MulconstI64/65537 ~ ~ ~ ~ ~ +5.61% ~ ~
MulconstI64/65538 ~ ~ ~ ~ ~ +5.25% ~ ~
MulconstU32/3 ~ +0.81% ~ +33.39% ~ +77.92% ~ -32.31%
MulconstU32/5 ~ ~ ~ -24.97% ~ +77.92% ~ -24.47%
MulconstU32/12 ~ ~ ~ +2.06% ~ ~ ~ +0.03%
MulconstU32/120 ~ ~ ~ -2.74% ~ ~ ~ +0.03%
MulconstU32/65537 ~ ~ ~ ~ ~ ~ ~ +0.03%
MulconstU32/65538 ~ ~ ~ ~ ~ -33.42% ~ -0.03%
MulconstU64/3 ~ ~ ~ +33.33% ~ -0.28% ~ +1.22%
MulconstU64/5 ~ ~ ~ -25.00% ~ ~ ~ -0.64%
MulconstU64/12 ~ ~ ~ +2.30% ~ +11.59% ~ +0.14%
MulconstU64/120 ~ ~ ~ -2.82% ~ ~ ~ +0.04%
MulconstU64/65537 ~ +0.37% ~ ~ ~ +5.58% ~ ~
MulconstU64/65538 ~ ~ ~ ~ ~ +5.16% ~ ~
ShiftArithmeticRight ~ ~ ~ ~ ~ -10.81% ~ +0.31%
Switch8Predictable +14.69% ~ ~ ~ ~ -24.85% ~ ~
Switch8Unpredictable ~ -0.58% -3.80% ~ ~ -11.78% ~ -0.79%
Switch32Predictable -10.33% +17.89% ~ ~ ~ +5.76% ~ ~
Switch32Unpredictable -3.15% +1.19% +9.42% ~ ~ -10.30% -5.09% +0.44%
SwitchStringPredictable +70.88% +20.48% ~ ~ ~ +2.39% ~ +0.31%
SwitchStringUnpredictable ~ +3.91% -5.06% -0.98% ~ +0.61% +2.03% ~
SwitchTypePredictable +146.58% -1.10% ~ -12.45% ~ -0.46% -3.81% ~
SwitchTypeUnpredictable +0.46% -0.83% ~ +4.18% ~ +0.43% ~ +0.62%
SwitchInterfaceTypePredictable -13.41% -10.13% +11.03% ~ ~ -4.38% ~ +0.75%
SwitchInterfaceTypeUnpredictable -6.37% -2.14% ~ -3.21% ~ -4.20% ~ +1.08%

Fixes #63110.
Fixes #75954.
Change-Id: I55a876f08c6c14f419ce1a8cbba2eaae6c6efbf0
Reviewed-on: https://go-review.googlesource.com/c/go/+/714160
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>