Commit graph

3 commits

Author SHA1 Message Date
Russ Cox
6e165b4d17 cmd/compile: implement Avg64u, Hmul64, Hmul64u for wasm
This lets us remove useAvg and useHmul from the division rules.
The compiler is simpler and the generated code is faster.

goos: wasip1
goarch: wasm
pkg: internal/strconv
                               │   old.txt   │               new.txt               │
                               │   sec/op    │   sec/op     vs base                │
AppendFloat/Decimal              192.8n ± 1%   194.6n ± 0%   +0.91% (p=0.000 n=10)
AppendFloat/Float                328.6n ± 0%   279.6n ± 0%  -14.93% (p=0.000 n=10)
AppendFloat/Exp                  335.6n ± 1%   289.2n ± 1%  -13.80% (p=0.000 n=10)
AppendFloat/NegExp               336.0n ± 0%   289.1n ± 1%  -13.97% (p=0.000 n=10)
AppendFloat/LongExp              332.4n ± 0%   285.2n ± 1%  -14.20% (p=0.000 n=10)
AppendFloat/Big                  348.2n ± 0%   300.1n ± 0%  -13.83% (p=0.000 n=10)
AppendFloat/BinaryExp            137.4n ± 0%   138.2n ± 0%   +0.55% (p=0.001 n=10)
AppendFloat/32Integer            193.3n ± 1%   196.5n ± 0%   +1.66% (p=0.000 n=10)
AppendFloat/32ExactFraction      283.3n ± 0%   268.9n ± 1%   -5.08% (p=0.000 n=10)
AppendFloat/32Point              279.9n ± 0%   266.5n ± 0%   -4.80% (p=0.000 n=10)
AppendFloat/32Exp                300.1n ± 0%   288.3n ± 1%   -3.90% (p=0.000 n=10)
AppendFloat/32NegExp             288.2n ± 1%   277.9n ± 1%   -3.59% (p=0.000 n=10)
AppendFloat/32Shortest           261.7n ± 0%   250.2n ± 0%   -4.39% (p=0.000 n=10)
AppendFloat/32Fixed8Hard         173.3n ± 1%   158.9n ± 1%   -8.31% (p=0.000 n=10)
AppendFloat/32Fixed9Hard         180.0n ± 0%   167.9n ± 2%   -6.70% (p=0.000 n=10)
AppendFloat/64Fixed1             167.1n ± 0%   149.6n ± 1%  -10.50% (p=0.000 n=10)
AppendFloat/64Fixed2             162.4n ± 1%   146.5n ± 0%   -9.73% (p=0.000 n=10)
AppendFloat/64Fixed2.5           165.5n ± 0%   149.4n ± 1%   -9.70% (p=0.000 n=10)
AppendFloat/64Fixed3             166.4n ± 1%   150.2n ± 0%   -9.74% (p=0.000 n=10)
AppendFloat/64Fixed4             163.7n ± 0%   149.6n ± 1%   -8.62% (p=0.000 n=10)
AppendFloat/64Fixed5Hard         182.8n ± 1%   167.1n ± 1%   -8.61% (p=0.000 n=10)
AppendFloat/64Fixed12            222.2n ± 0%   208.8n ± 0%   -6.05% (p=0.000 n=10)
AppendFloat/64Fixed16            197.6n ± 1%   181.7n ± 0%   -8.02% (p=0.000 n=10)
AppendFloat/64Fixed12Hard        194.5n ± 0%   181.0n ± 0%   -6.99% (p=0.000 n=10)
AppendFloat/64Fixed17Hard        205.1n ± 1%   191.9n ± 0%   -6.44% (p=0.000 n=10)
AppendFloat/64Fixed18Hard        6.269µ ± 0%   6.643µ ± 0%   +5.97% (p=0.000 n=10)
AppendFloat/64FixedF1            211.7n ± 1%   197.0n ± 0%   -6.95% (p=0.000 n=10)
AppendFloat/64FixedF2            189.4n ± 0%   174.2n ± 0%   -8.08% (p=0.000 n=10)
AppendFloat/64FixedF3            169.0n ± 0%   154.9n ± 0%   -8.32% (p=0.000 n=10)
AppendFloat/Slowpath64           321.2n ± 0%   274.2n ± 1%  -14.63% (p=0.000 n=10)
AppendFloat/SlowpathDenormal64   307.4n ± 1%   261.2n ± 0%  -15.03% (p=0.000 n=10)
AppendInt                        3.367µ ± 1%   3.376µ ± 0%        ~ (p=0.517 n=10)
AppendUint                       675.5n ± 0%   676.9n ± 0%        ~ (p=0.196 n=10)
AppendIntSmall                   28.13n ± 1%   28.17n ± 0%   +0.14% (p=0.015 n=10)
AppendUintVarlen/digits=1        20.70n ± 0%   20.51n ± 1%   -0.89% (p=0.018 n=10)
AppendUintVarlen/digits=2        20.43n ± 0%   20.27n ± 0%   -0.81% (p=0.001 n=10)
AppendUintVarlen/digits=3        38.48n ± 0%   37.93n ± 0%   -1.43% (p=0.000 n=10)
AppendUintVarlen/digits=4        41.10n ± 0%   38.78n ± 1%   -5.62% (p=0.000 n=10)
AppendUintVarlen/digits=5        42.25n ± 1%   42.11n ± 0%   -0.32% (p=0.041 n=10)
AppendUintVarlen/digits=6        45.40n ± 1%   43.14n ± 0%   -4.98% (p=0.000 n=10)
AppendUintVarlen/digits=7        46.81n ± 1%   46.03n ± 0%   -1.66% (p=0.000 n=10)
AppendUintVarlen/digits=8        48.88n ± 1%   46.59n ± 1%   -4.68% (p=0.000 n=10)
AppendUintVarlen/digits=9        49.94n ± 2%   49.41n ± 1%   -1.06% (p=0.000 n=10)
AppendUintVarlen/digits=10       57.28n ± 1%   56.92n ± 1%   -0.62% (p=0.045 n=10)
AppendUintVarlen/digits=11       60.09n ± 1%   58.11n ± 2%   -3.30% (p=0.000 n=10)
AppendUintVarlen/digits=12       62.22n ± 0%   61.85n ± 0%   -0.59% (p=0.000 n=10)
AppendUintVarlen/digits=13       64.94n ± 0%   62.92n ± 0%   -3.10% (p=0.000 n=10)
AppendUintVarlen/digits=14       65.42n ± 1%   65.19n ± 1%   -0.34% (p=0.005 n=10)
AppendUintVarlen/digits=15       68.17n ± 0%   66.13n ± 0%   -2.99% (p=0.000 n=10)
AppendUintVarlen/digits=16       70.21n ± 1%   70.09n ± 1%        ~ (p=0.517 n=10)
AppendUintVarlen/digits=17       72.93n ± 0%   70.49n ± 0%   -3.34% (p=0.000 n=10)
AppendUintVarlen/digits=18       73.01n ± 0%   72.75n ± 0%   -0.35% (p=0.000 n=10)
AppendUintVarlen/digits=19       79.27n ± 1%   79.49n ± 1%        ~ (p=0.671 n=10)
AppendUintVarlen/digits=20       82.18n ± 0%   80.43n ± 1%   -2.14% (p=0.000 n=10)
geomean                          143.4n        136.0n        -5.20%


Change-Id: I8245814a0259ad13cf9225f57db8e9fe3d2e4267
Reviewed-on: https://go-review.googlesource.com/c/go/+/717407
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2025-11-04 11:38:18 -08:00
Russ Cox
1e5bb416d8 cmd/compile: implement bits.Mul64 on 32-bit systems
This CL implements Mul64uhilo, Hmul64, Hmul64u, and Avg64u
on 32-bit systems, with the effect that constant division of both
int64s and uint64s can now be emitted directly in all cases,
and also that bits.Mul64 can be intrinsified on 32-bit systems.

Previously, constant division of uint64s by values 0 ≤ c ≤ 0xFFFF were
implemented as uint32 divisions by c and some fixup. After expanding
those smaller constant divisions, the code for i/999 required:

	(386) 7 mul, 10 add, 2 sub, 3 rotate, 3 shift (104 bytes)
	(arm) 7 mul, 9 add, 3 sub, 2 shift (104 bytes)
	(mips) 7 mul, 10 add, 5 sub, 6 shift, 3 sgtu (176 bytes)

For that much code, we might as well use a full 64x64->128 multiply
that can be used for all divisors, not just small ones.
Having done that, the same i/999 now generates:

	(386) 4 mul, 9 add, 2 sub, 2 or, 6 shift (112 bytes)
	(arm) 4 mul, 8 add, 2 sub, 2 or, 3 shift (92 bytes)
	(mips) 4 mul, 11 add, 3 sub, 6 shift, 8 sgtu, 4 or (196 bytes)

The size increase on 386 is due to a few extra register spills.
The size increase on mips is due to add-with-carry being hard.

The new approach is more general, letting us delete the old special case
and guarantee that all int64 and uint64 divisions by constants are
generated directly on 32-bit systems.

This especially speeds up code making heavy use of bits.Mul64 with
a constant argument, which happens in strconv and various crypto
packages. A few examples are benchmarked below.

pkg: cmd/compile/internal/test

benchmark \ host                      local  linux-amd64       s7  linux-386  s7:GOARCH=386
                                    vs base      vs base  vs base    vs base        vs base
DivconstI64                               ~            ~        ~    -49.66%        -21.02%
ModconstI64                               ~            ~        ~    -13.45%        +14.52%
DivisiblePow2constI64                     ~            ~        ~     +0.97%         -1.32%
DivisibleconstI64                         ~            ~        ~    -20.01%        -48.28%
DivisibleWDivconstI64                     ~            ~   -1.76%    -38.59%        -42.74%
DivconstU64/3                             ~            ~        ~    -13.82%         -4.09%
DivconstU64/5                             ~            ~        ~    -14.10%         -3.54%
DivconstU64/37                       -2.07%       -4.45%        ~    -19.60%         -9.55%
DivconstU64/1234567                       ~            ~        ~    -61.55%        -56.93%
ModconstU64                               ~            ~        ~     -6.25%              ~
DivisibleconstU64                         ~            ~        ~     -2.78%         -7.82%
DivisibleWDivconstU64                     ~            ~        ~     +4.23%         +2.56%

pkg: math/bits

benchmark \ host         s7  linux-amd64  linux-386  s7:GOARCH=386
                    vs base      vs base    vs base        vs base
Add                       ~            ~          ~              ~
Add32                +1.59%            ~          ~              ~
Add64                     ~            ~          ~              ~
Add64multiple             ~            ~          ~              ~
Sub                       ~            ~          ~              ~
Sub32                     ~            ~          ~              ~
Sub64                     ~            ~     -9.20%              ~
Sub64multiple             ~            ~          ~              ~
Mul                       ~            ~          ~              ~
Mul32                     ~            ~          ~              ~
Mul64                     ~            ~    -41.58%        -53.21%
Div                       ~            ~          ~              ~
Div32                     ~            ~          ~              ~
Div64                     ~            ~          ~              ~

pkg: strconv

benchmark \ host                       s7  linux-amd64  linux-386  s7:GOARCH=386
                                  vs base      vs base    vs base        vs base
ParseInt/Pos/7bit                       ~            ~    -11.08%         -6.75%
ParseInt/Pos/26bit                      ~            ~    -13.65%        -11.02%
ParseInt/Pos/31bit                      ~            ~    -14.65%         -9.71%
ParseInt/Pos/56bit                 -1.80%            ~    -17.97%        -10.78%
ParseInt/Pos/63bit                      ~            ~    -13.85%         -9.63%
ParseInt/Neg/7bit                       ~            ~    -12.14%         -7.26%
ParseInt/Neg/26bit                      ~            ~    -14.18%         -9.81%
ParseInt/Neg/31bit                      ~            ~    -14.51%         -9.02%
ParseInt/Neg/56bit                      ~            ~    -15.79%         -9.79%
ParseInt/Neg/63bit                      ~            ~    -15.68%        -11.07%
AppendFloat/Decimal                     ~            ~     -7.25%        -12.26%
AppendFloat/Float                       ~            ~    -15.96%        -19.45%
AppendFloat/Exp                         ~            ~    -13.96%        -17.76%
AppendFloat/NegExp                      ~            ~    -14.89%        -20.27%
AppendFloat/LongExp                     ~            ~    -12.68%        -17.97%
AppendFloat/Big                         ~            ~    -11.10%        -16.64%
AppendFloat/BinaryExp                   ~            ~          ~              ~
AppendFloat/32Integer                   ~            ~    -10.05%        -10.91%
AppendFloat/32ExactFraction             ~            ~     -8.93%        -13.00%
AppendFloat/32Point                     ~            ~    -10.36%        -14.89%
AppendFloat/32Exp                       ~            ~     -9.88%        -13.54%
AppendFloat/32NegExp                    ~            ~    -10.16%        -14.26%
AppendFloat/32Shortest                  ~            ~    -11.39%        -14.96%
AppendFloat/32Fixed8Hard                ~            ~          ~         -2.31%
AppendFloat/32Fixed9Hard                ~            ~          ~         -7.01%
AppendFloat/64Fixed1                    ~            ~     -2.83%         -8.23%
AppendFloat/64Fixed2                    ~            ~          ~         -7.94%
AppendFloat/64Fixed3                    ~            ~     -4.07%         -7.22%
AppendFloat/64Fixed4                    ~            ~     -7.24%         -7.62%
AppendFloat/64Fixed12                   ~            ~     -6.57%         -4.82%
AppendFloat/64Fixed16                   ~            ~     -4.00%         -5.81%
AppendFloat/64Fixed12Hard          -2.22%            ~     -4.07%         -6.35%
AppendFloat/64Fixed17Hard          -2.12%            ~          ~         -3.79%
AppendFloat/64Fixed18Hard          -1.89%            ~     +2.48%              ~
AppendFloat/Slowpath64             -1.85%            ~    -14.49%        -18.21%
AppendFloat/SlowpathDenormal64          ~            ~    -13.08%        -19.41%

pkg: crypto/internal/fips140/nistec/fiat

benchmark \ host         s7  linux-amd64  linux-386  s7:GOARCH=386
                    vs base      vs base    vs base        vs base
Mul/P224                  ~            ~    -29.95%        -39.60%
Mul/P384                  ~            ~    -37.11%        -63.33%
Mul/P521                  ~            ~    -26.62%        -12.42%
Square/P224          +1.46%            ~    -40.62%        -49.18%
Square/P384               ~            ~    -45.51%        -69.68%
Square/P521         +90.37%            ~    -25.26%        -11.23%

(The +90% is a separate problem and not real; that much variation
can be seen on that system by running the same binary from two
different files.)

pkg: crypto/internal/fips140/edwards25519

benchmark \ host                    s7  linux-amd64  linux-386  s7:GOARCH=386
                               vs base      vs base    vs base        vs base
EncodingDecoding                     ~            ~    -34.67%        -35.75%
ScalarBaseMult                       ~            ~    -31.25%        -30.29%
ScalarMult                           ~            ~    -33.45%        -32.54%
VarTimeDoubleScalarBaseMult          ~            ~    -33.78%        -33.68%

Change-Id: Id3c91d42cd01def6731b755e99f8f40c6ad1bb65
Reviewed-on: https://go-review.googlesource.com/c/go/+/716061
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-10-30 08:04:20 -07:00
Russ Cox
9bbda7c99d cmd/compile: make prove understand div, mod better
This CL introduces new divisible and divmod passes that rewrite
divisibility checks and div, mod, and mul. These happen after
prove, so that prove can make better sense of the code for
deriving bounds, and they must run before decompose, so that
64-bit ops can be lowered to 32-bit ops on 32-bit systems.
And then they need another generic pass as well, to optimize
the generated code before decomposing.

The three opt passes are "opt", "middle opt", and "late opt".
(Perhaps instead they should be "generic", "opt", and "late opt"?)

The "late opt" pass repeats the "middle opt" work on any new code
that has been generated in the interim.
There will not be new divs or mods, but there may be new muls.

The x%c==0 rewrite rules are much simpler now, since they can
match before divs have been rewritten. This has the effect of
applying them more consistently and making the rewrite rules
independent of the exact div rewrites.

Prove is also now charged with marking signed div/mod as
unsigned when the arguments call for it, allowing simpler
code to be emitted in various cases. For example,
t.Seconds()/2 and len(x)/2 are now recognized as unsigned,
meaning they compile to a simple shift (unsigned division),
avoiding the more complex fixup we need for signed values.

https://gist.github.com/rsc/99d9d3bd99cde87b6a1a390e3d85aa32
shows a diff of 'go build -a -gcflags=-d=ssa/prove/debug=1 std'
output before and after. "Proved Rsh64x64 shifts to zero" is replaced
by the higher-level "Proved Div64 is unsigned" (the shift was in the
signed expansion of div by constant), but otherwise prove is only
finding more things to prove.

One short example, in code that does x[i%len(x)]:

< runtime/mfinal.go:131:34: Proved Rsh64x64 shifts to zero
---
> runtime/mfinal.go:131:34: Proved Div64 is unsigned
> runtime/mfinal.go:131:38: Proved IsInBounds

A longer example:

< crypto/internal/fips140/sha3/shake.go:28:30: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:38:27: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:53:46: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:55:46: Proved Rsh64x64 shifts to zero
---
> crypto/internal/fips140/sha3/shake.go:28:30: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:38:27: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:45:7: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:46:4: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsSliceInBounds

These diffs are due to the smaller opt being better
and taking work away from prove:

< image/jpeg/dct.go:307:5: Proved IsInBounds
< image/jpeg/dct.go:308:5: Proved IsInBounds
...
< image/jpeg/dct.go:442:5: Proved IsInBounds

In the old opt, Mul by 8 was rewritten to Lsh by 3 early.
This CL delays that rule to help prove recognize mods,
but it also helps opt constant-fold the slice x[8*i:8*i+8:8*i+8].
Specifically, computing the length, opt can now do:

	(Sub64 (Add (Mul 8 i) 8) (Add (Mul 8 i) 8)) ->
	(Add 8 (Sub (Mul 8 i) (Mul 8 i))) ->
	(Add 8 (Mul 8 (Sub i i))) ->
	(Add 8 (Mul 8 0)) ->
	(Add 8 0) ->
	8

The key step is (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)),
Leaving the multiply as Mul enables using that step; the old
rewrite to Lsh blocked it, leaving prove to figure out the length
and then remove the bounds checks. But now opt can evaluate
the length down to a constant 8 and then constant-fold away
the bounds checks 0 < 8, 1 < 8, and so on. After that,
the compiler has nothing left to prove.

Benchmarks are noisy in general; I checked the assembly for the many
large increases below, and the vast majority are unchanged and
presumably hitting the caches differently in some way.

The divisibility optimizations were not reliably triggering before.
This leads to a very large improvement in some cases, like
DivisiblePow2constI64, DivisibleconstI64 on 64-bit systems
and DivisbleconstU64 on 32-bit systems.

Another way the divisibility optimizations were unreliable before
was incorrectly triggering for x/3, x%3 even though they are
written not to do that. There is a real but small slowdown
in the DivisibleWDivconst benchmarks on Mac because in the cases
used in the benchmark, it is still faster (on Mac) to do the
divisibility check than to remultiply.
This may be worth further study. Perhaps when there is no rotate
(meaning the divisor is odd), the divisibility optimization
should be enabled always. In any event, this CL makes it possible
to study that.

benchmark \ host                          s7  linux-amd64      mac  linux-arm64  linux-ppc64le  linux-386  s7:GOARCH=386  linux-arm
                                     vs base      vs base  vs base      vs base        vs base    vs base        vs base    vs base
LoadAdd                                    ~            ~        ~            ~              ~     -1.59%              ~          ~
ExtShift                                   ~            ~  -42.14%       +0.10%              ~     +1.44%         +5.66%     +8.50%
Modify                                     ~            ~        ~            ~              ~          ~              ~     -1.53%
MullImm                                    ~            ~        ~            ~              ~    +37.90%        -21.87%     +3.05%
ConstModify                                ~            ~        ~            ~        -49.14%          ~              ~          ~
BitSet                                     ~            ~        ~            ~        -15.86%    -14.57%         +6.44%     +0.06%
BitClear                                   ~            ~        ~            ~              ~     +1.78%         +3.50%     +0.06%
BitToggle                                  ~            ~        ~            ~              ~    -16.09%         +2.91%          ~
BitSetConst                                ~            ~        ~            ~              ~          ~              ~     -0.49%
BitClearConst                              ~            ~        ~            ~        -28.29%          ~              ~     -0.40%
BitToggleConst                             ~            ~        ~       +8.89%        -31.19%          ~              ~     -0.77%
MulNeg                                     ~            ~        ~            ~              ~          ~              ~          ~
Mul2Neg                                    ~            ~   -4.83%            ~              ~    -13.75%         -5.92%          ~
DivconstI64                                ~            ~        ~            ~              ~    -30.12%              ~     +0.50%
ModconstI64                                ~            ~   -9.94%       -4.63%              ~     +3.15%              ~     +5.32%
DivisiblePow2constI64                -34.49%      -12.58%        ~            ~        -12.25%          ~              ~          ~
DivisibleconstI64                    -24.69%      -25.06%   -0.40%       -2.27%        -42.61%     -3.31%              ~     +1.63%
DivisibleWDivconstI64                      ~            ~        ~            ~              ~    -17.55%              ~     -0.60%
DivconstU64/3                              ~            ~        ~            ~              ~     +1.51%              ~          ~
DivconstU64/5                              ~            ~        ~            ~              ~          ~              ~          ~
DivconstU64/37                             ~            ~   -0.18%            ~              ~     +2.70%              ~          ~
DivconstU64/1234567                        ~            ~        ~            ~              ~          ~              ~     +0.12%
ModconstU64                                ~            ~        ~       -0.24%              ~     -5.10%         -1.07%     -1.56%
DivisibleconstU64                          ~            ~        ~            ~              ~    -29.01%        -59.13%    -50.72%
DivisibleWDivconstU64                      ~            ~  -12.18%      -18.88%              ~     -5.50%         -3.91%     +5.17%
DivconstI32                                ~            ~   -0.48%            ~        -34.69%    +89.01%         -6.01%    -16.67%
ModconstI32                                ~       +2.95%   -0.33%            ~              ~     -2.98%         -5.40%     -8.30%
DivisiblePow2constI32                      ~            ~        ~            ~              ~          ~              ~    -16.22%
DivisibleconstI32                          ~            ~        ~            ~              ~    -37.27%        -47.75%    -25.03%
DivisibleWDivconstI32                -11.59%       +5.22%  -12.99%      -23.83%              ~    +45.95%         -7.03%    -10.01%
DivconstU32                                ~            ~        ~            ~              ~    +74.71%         +4.81%          ~
ModconstU32                                ~            ~   +0.53%       +0.18%              ~    +51.16%              ~          ~
DivisibleconstU32                          ~            ~        ~       -0.62%              ~     -4.25%              ~          ~
DivisibleWDivconstU32                 -2.77%       +5.56%  +11.12%       -5.15%              ~    +48.70%        +25.11%     -4.07%
DivconstI16                           -6.06%            ~   -0.33%       +0.22%              ~          ~         -9.68%     +5.47%
ModconstI16                                ~            ~   +4.44%       +2.82%              ~          ~              ~     +5.06%
DivisiblePow2constI16                      ~            ~        ~            ~              ~          ~              ~     -0.17%
DivisibleconstI16                          ~            ~   -0.23%            ~              ~          ~         +4.60%     +6.64%
DivisibleWDivconstI16                 -1.44%       -0.43%  +13.48%       -5.76%              ~     +1.62%        -23.15%     -9.06%
DivconstU16                           +1.61%            ~   -0.35%       -0.47%              ~          ~        +15.59%          ~
ModconstU16                                ~            ~        ~            ~              ~     -0.72%              ~    +14.23%
DivisibleconstU16                          ~            ~   -0.05%       +3.00%              ~          ~              ~     +5.06%
DivisibleWDivconstU16                +52.10%       +0.75%  +17.28%       +4.79%              ~    -37.39%         +5.28%     -9.06%
DivconstI8                                 ~            ~   -0.34%       -0.96%              ~          ~         -9.20%          ~
ModconstI8                            +2.29%            ~   +4.38%       +2.96%              ~          ~              ~          ~
DivisiblePow2constI8                       ~            ~        ~            ~              ~          ~              ~          ~
DivisibleconstI8                           ~            ~        ~            ~              ~          ~         +6.04%          ~
DivisibleWDivconstI8                 -26.44%       +1.69%  +17.03%       +4.05%              ~    +32.48%        -24.90%          ~
DivconstU8                            -4.50%      +14.06%   -0.28%            ~              ~          ~         +4.16%     +0.88%
ModconstU8                                 ~            ~  +25.84%       -0.64%              ~          ~              ~          ~
DivisibleconstU8                           ~            ~   -5.70%            ~              ~          ~              ~          ~
DivisibleWDivconstU8                 +49.55%       +9.07%        ~       +4.03%        +53.87%    -40.03%        +39.72%     -3.01%
Mul2                                       ~            ~        ~            ~              ~          ~              ~          ~
MulNeg2                                    ~            ~        ~            ~        -11.73%          ~              ~     -0.02%
EfaceInteger                               ~            ~        ~            ~              ~    +18.11%              ~     +2.53%
TypeAssert                           +33.90%       +2.86%        ~            ~              ~     -1.07%         -5.29%     -1.04%
Div64UnsignedSmall                         ~            ~        ~            ~              ~          ~              ~          ~
Div64Small                                 ~            ~        ~            ~              ~     -0.88%              ~     +2.39%
Div64SmallNegDivisor                       ~            ~        ~            ~              ~          ~              ~     +0.35%
Div64SmallNegDividend                      ~            ~        ~            ~              ~     -0.84%              ~     +3.57%
Div64SmallNegBoth                          ~            ~        ~            ~              ~     -0.86%              ~     +3.55%
Div64Unsigned                              ~            ~        ~            ~              ~          ~              ~     -0.11%
Div64                                      ~            ~        ~            ~              ~          ~              ~     +0.11%
Div64NegDivisor                            ~            ~        ~            ~              ~     -1.29%              ~          ~
Div64NegDividend                           ~            ~        ~            ~              ~     -1.44%              ~          ~
Div64NegBoth                               ~            ~        ~            ~              ~          ~              ~     +0.28%
Mod64UnsignedSmall                         ~            ~        ~            ~              ~     +0.48%              ~     +0.93%
Mod64Small                                 ~            ~        ~            ~              ~          ~              ~          ~
Mod64SmallNegDivisor                       ~            ~        ~            ~              ~          ~              ~     +1.44%
Mod64SmallNegDividend                      ~            ~        ~            ~              ~     +0.22%              ~     +1.37%
Mod64SmallNegBoth                          ~            ~        ~            ~              ~          ~              ~     -2.22%
Mod64Unsigned                              ~            ~        ~            ~              ~     -0.95%              ~     +0.11%
Mod64                                      ~            ~        ~            ~              ~          ~              ~          ~
Mod64NegDivisor                            ~            ~        ~            ~              ~          ~              ~     -0.02%
Mod64NegDividend                           ~            ~        ~            ~              ~          ~              ~          ~
Mod64NegBoth                               ~            ~        ~            ~              ~          ~              ~     -0.02%
MulconstI32/3                              ~            ~        ~      -25.00%              ~          ~              ~    +47.37%
MulconstI32/5                              ~            ~        ~      +33.28%              ~          ~              ~    +32.21%
MulconstI32/12                             ~            ~        ~       -2.13%              ~          ~              ~     -0.02%
MulconstI32/120                            ~            ~        ~       +2.93%              ~          ~              ~     -0.03%
MulconstI32/-120                           ~            ~        ~       -2.17%              ~          ~              ~     -0.03%
MulconstI32/65537                          ~            ~        ~            ~              ~          ~              ~     +0.03%
MulconstI32/65538                          ~            ~        ~            ~              ~    -33.38%              ~     +0.04%
MulconstI64/3                              ~            ~        ~      +33.35%              ~     -0.37%              ~     -0.13%
MulconstI64/5                              ~            ~        ~      -25.00%              ~     -0.34%              ~          ~
MulconstI64/12                             ~            ~        ~       +2.13%              ~    +11.62%              ~     +2.30%
MulconstI64/120                            ~            ~        ~       -1.98%              ~          ~              ~          ~
MulconstI64/-120                           ~            ~        ~       +0.75%              ~          ~              ~          ~
MulconstI64/65537                          ~            ~        ~            ~              ~     +5.61%              ~          ~
MulconstI64/65538                          ~            ~        ~            ~              ~     +5.25%              ~          ~
MulconstU32/3                              ~       +0.81%        ~      +33.39%              ~    +77.92%              ~    -32.31%
MulconstU32/5                              ~            ~        ~      -24.97%              ~    +77.92%              ~    -24.47%
MulconstU32/12                             ~            ~        ~       +2.06%              ~          ~              ~     +0.03%
MulconstU32/120                            ~            ~        ~       -2.74%              ~          ~              ~     +0.03%
MulconstU32/65537                          ~            ~        ~            ~              ~          ~              ~     +0.03%
MulconstU32/65538                          ~            ~        ~            ~              ~    -33.42%              ~     -0.03%
MulconstU64/3                              ~            ~        ~      +33.33%              ~     -0.28%              ~     +1.22%
MulconstU64/5                              ~            ~        ~      -25.00%              ~          ~              ~     -0.64%
MulconstU64/12                             ~            ~        ~       +2.30%              ~    +11.59%              ~     +0.14%
MulconstU64/120                            ~            ~        ~       -2.82%              ~          ~              ~     +0.04%
MulconstU64/65537                          ~       +0.37%        ~            ~              ~     +5.58%              ~          ~
MulconstU64/65538                          ~            ~        ~            ~              ~     +5.16%              ~          ~
ShiftArithmeticRight                       ~            ~        ~            ~              ~    -10.81%              ~     +0.31%
Switch8Predictable                   +14.69%            ~        ~            ~              ~    -24.85%              ~          ~
Switch8Unpredictable                       ~       -0.58%   -3.80%            ~              ~    -11.78%              ~     -0.79%
Switch32Predictable                  -10.33%      +17.89%        ~            ~              ~     +5.76%              ~          ~
Switch32Unpredictable                 -3.15%       +1.19%   +9.42%            ~              ~    -10.30%         -5.09%     +0.44%
SwitchStringPredictable              +70.88%      +20.48%        ~            ~              ~     +2.39%              ~     +0.31%
SwitchStringUnpredictable                  ~       +3.91%   -5.06%       -0.98%              ~     +0.61%         +2.03%          ~
SwitchTypePredictable               +146.58%       -1.10%        ~      -12.45%              ~     -0.46%         -3.81%          ~
SwitchTypeUnpredictable               +0.46%       -0.83%        ~       +4.18%              ~     +0.43%              ~     +0.62%
SwitchInterfaceTypePredictable       -13.41%      -10.13%  +11.03%            ~              ~     -4.38%              ~     +0.75%
SwitchInterfaceTypeUnpredictable      -6.37%       -2.14%        ~       -3.21%              ~     -4.20%              ~     +1.08%

Fixes #63110.
Fixes #75954.

Change-Id: I55a876f08c6c14f419ce1a8cbba2eaae6c6efbf0
Reviewed-on: https://go-review.googlesource.com/c/go/+/714160
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-10-29 18:49:40 -07:00