Commit graph

360 commits

Author SHA1 Message Date
Michael Munday
0a569528ea cmd/compile: optimize comparisons with single bit difference
Optimize comparisons with constants that only differ by 1 bit (i.e.
a power of 2). For example:

    x == 4 || x == 6 -> x|2 == 6
    x != 1 && x != 5 -> x|4 != 5

Change-Id: Ic61719e5118446d21cf15652d9da22f7d95b2a15
Reviewed-on: https://go-review.googlesource.com/c/go/+/719420
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-11-14 10:59:56 -08:00
Russ Cox
1e5bb416d8 cmd/compile: implement bits.Mul64 on 32-bit systems
This CL implements Mul64uhilo, Hmul64, Hmul64u, and Avg64u
on 32-bit systems, with the effect that constant division of both
int64s and uint64s can now be emitted directly in all cases,
and also that bits.Mul64 can be intrinsified on 32-bit systems.

Previously, constant division of uint64s by values 0 ≤ c ≤ 0xFFFF were
implemented as uint32 divisions by c and some fixup. After expanding
those smaller constant divisions, the code for i/999 required:

	(386) 7 mul, 10 add, 2 sub, 3 rotate, 3 shift (104 bytes)
	(arm) 7 mul, 9 add, 3 sub, 2 shift (104 bytes)
	(mips) 7 mul, 10 add, 5 sub, 6 shift, 3 sgtu (176 bytes)

For that much code, we might as well use a full 64x64->128 multiply
that can be used for all divisors, not just small ones.
Having done that, the same i/999 now generates:

	(386) 4 mul, 9 add, 2 sub, 2 or, 6 shift (112 bytes)
	(arm) 4 mul, 8 add, 2 sub, 2 or, 3 shift (92 bytes)
	(mips) 4 mul, 11 add, 3 sub, 6 shift, 8 sgtu, 4 or (196 bytes)

The size increase on 386 is due to a few extra register spills.
The size increase on mips is due to add-with-carry being hard.

The new approach is more general, letting us delete the old special case
and guarantee that all int64 and uint64 divisions by constants are
generated directly on 32-bit systems.

This especially speeds up code making heavy use of bits.Mul64 with
a constant argument, which happens in strconv and various crypto
packages. A few examples are benchmarked below.

pkg: cmd/compile/internal/test

benchmark \ host                      local  linux-amd64       s7  linux-386  s7:GOARCH=386
                                    vs base      vs base  vs base    vs base        vs base
DivconstI64                               ~            ~        ~    -49.66%        -21.02%
ModconstI64                               ~            ~        ~    -13.45%        +14.52%
DivisiblePow2constI64                     ~            ~        ~     +0.97%         -1.32%
DivisibleconstI64                         ~            ~        ~    -20.01%        -48.28%
DivisibleWDivconstI64                     ~            ~   -1.76%    -38.59%        -42.74%
DivconstU64/3                             ~            ~        ~    -13.82%         -4.09%
DivconstU64/5                             ~            ~        ~    -14.10%         -3.54%
DivconstU64/37                       -2.07%       -4.45%        ~    -19.60%         -9.55%
DivconstU64/1234567                       ~            ~        ~    -61.55%        -56.93%
ModconstU64                               ~            ~        ~     -6.25%              ~
DivisibleconstU64                         ~            ~        ~     -2.78%         -7.82%
DivisibleWDivconstU64                     ~            ~        ~     +4.23%         +2.56%

pkg: math/bits

benchmark \ host         s7  linux-amd64  linux-386  s7:GOARCH=386
                    vs base      vs base    vs base        vs base
Add                       ~            ~          ~              ~
Add32                +1.59%            ~          ~              ~
Add64                     ~            ~          ~              ~
Add64multiple             ~            ~          ~              ~
Sub                       ~            ~          ~              ~
Sub32                     ~            ~          ~              ~
Sub64                     ~            ~     -9.20%              ~
Sub64multiple             ~            ~          ~              ~
Mul                       ~            ~          ~              ~
Mul32                     ~            ~          ~              ~
Mul64                     ~            ~    -41.58%        -53.21%
Div                       ~            ~          ~              ~
Div32                     ~            ~          ~              ~
Div64                     ~            ~          ~              ~

pkg: strconv

benchmark \ host                       s7  linux-amd64  linux-386  s7:GOARCH=386
                                  vs base      vs base    vs base        vs base
ParseInt/Pos/7bit                       ~            ~    -11.08%         -6.75%
ParseInt/Pos/26bit                      ~            ~    -13.65%        -11.02%
ParseInt/Pos/31bit                      ~            ~    -14.65%         -9.71%
ParseInt/Pos/56bit                 -1.80%            ~    -17.97%        -10.78%
ParseInt/Pos/63bit                      ~            ~    -13.85%         -9.63%
ParseInt/Neg/7bit                       ~            ~    -12.14%         -7.26%
ParseInt/Neg/26bit                      ~            ~    -14.18%         -9.81%
ParseInt/Neg/31bit                      ~            ~    -14.51%         -9.02%
ParseInt/Neg/56bit                      ~            ~    -15.79%         -9.79%
ParseInt/Neg/63bit                      ~            ~    -15.68%        -11.07%
AppendFloat/Decimal                     ~            ~     -7.25%        -12.26%
AppendFloat/Float                       ~            ~    -15.96%        -19.45%
AppendFloat/Exp                         ~            ~    -13.96%        -17.76%
AppendFloat/NegExp                      ~            ~    -14.89%        -20.27%
AppendFloat/LongExp                     ~            ~    -12.68%        -17.97%
AppendFloat/Big                         ~            ~    -11.10%        -16.64%
AppendFloat/BinaryExp                   ~            ~          ~              ~
AppendFloat/32Integer                   ~            ~    -10.05%        -10.91%
AppendFloat/32ExactFraction             ~            ~     -8.93%        -13.00%
AppendFloat/32Point                     ~            ~    -10.36%        -14.89%
AppendFloat/32Exp                       ~            ~     -9.88%        -13.54%
AppendFloat/32NegExp                    ~            ~    -10.16%        -14.26%
AppendFloat/32Shortest                  ~            ~    -11.39%        -14.96%
AppendFloat/32Fixed8Hard                ~            ~          ~         -2.31%
AppendFloat/32Fixed9Hard                ~            ~          ~         -7.01%
AppendFloat/64Fixed1                    ~            ~     -2.83%         -8.23%
AppendFloat/64Fixed2                    ~            ~          ~         -7.94%
AppendFloat/64Fixed3                    ~            ~     -4.07%         -7.22%
AppendFloat/64Fixed4                    ~            ~     -7.24%         -7.62%
AppendFloat/64Fixed12                   ~            ~     -6.57%         -4.82%
AppendFloat/64Fixed16                   ~            ~     -4.00%         -5.81%
AppendFloat/64Fixed12Hard          -2.22%            ~     -4.07%         -6.35%
AppendFloat/64Fixed17Hard          -2.12%            ~          ~         -3.79%
AppendFloat/64Fixed18Hard          -1.89%            ~     +2.48%              ~
AppendFloat/Slowpath64             -1.85%            ~    -14.49%        -18.21%
AppendFloat/SlowpathDenormal64          ~            ~    -13.08%        -19.41%

pkg: crypto/internal/fips140/nistec/fiat

benchmark \ host         s7  linux-amd64  linux-386  s7:GOARCH=386
                    vs base      vs base    vs base        vs base
Mul/P224                  ~            ~    -29.95%        -39.60%
Mul/P384                  ~            ~    -37.11%        -63.33%
Mul/P521                  ~            ~    -26.62%        -12.42%
Square/P224          +1.46%            ~    -40.62%        -49.18%
Square/P384               ~            ~    -45.51%        -69.68%
Square/P521         +90.37%            ~    -25.26%        -11.23%

(The +90% is a separate problem and not real; that much variation
can be seen on that system by running the same binary from two
different files.)

pkg: crypto/internal/fips140/edwards25519

benchmark \ host                    s7  linux-amd64  linux-386  s7:GOARCH=386
                               vs base      vs base    vs base        vs base
EncodingDecoding                     ~            ~    -34.67%        -35.75%
ScalarBaseMult                       ~            ~    -31.25%        -30.29%
ScalarMult                           ~            ~    -33.45%        -32.54%
VarTimeDoubleScalarBaseMult          ~            ~    -33.78%        -33.68%

Change-Id: Id3c91d42cd01def6731b755e99f8f40c6ad1bb65
Reviewed-on: https://go-review.googlesource.com/c/go/+/716061
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Russ Cox <rsc@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-10-30 08:04:20 -07:00
Russ Cox
9bbda7c99d cmd/compile: make prove understand div, mod better
This CL introduces new divisible and divmod passes that rewrite
divisibility checks and div, mod, and mul. These happen after
prove, so that prove can make better sense of the code for
deriving bounds, and they must run before decompose, so that
64-bit ops can be lowered to 32-bit ops on 32-bit systems.
And then they need another generic pass as well, to optimize
the generated code before decomposing.

The three opt passes are "opt", "middle opt", and "late opt".
(Perhaps instead they should be "generic", "opt", and "late opt"?)

The "late opt" pass repeats the "middle opt" work on any new code
that has been generated in the interim.
There will not be new divs or mods, but there may be new muls.

The x%c==0 rewrite rules are much simpler now, since they can
match before divs have been rewritten. This has the effect of
applying them more consistently and making the rewrite rules
independent of the exact div rewrites.

Prove is also now charged with marking signed div/mod as
unsigned when the arguments call for it, allowing simpler
code to be emitted in various cases. For example,
t.Seconds()/2 and len(x)/2 are now recognized as unsigned,
meaning they compile to a simple shift (unsigned division),
avoiding the more complex fixup we need for signed values.

https://gist.github.com/rsc/99d9d3bd99cde87b6a1a390e3d85aa32
shows a diff of 'go build -a -gcflags=-d=ssa/prove/debug=1 std'
output before and after. "Proved Rsh64x64 shifts to zero" is replaced
by the higher-level "Proved Div64 is unsigned" (the shift was in the
signed expansion of div by constant), but otherwise prove is only
finding more things to prove.

One short example, in code that does x[i%len(x)]:

< runtime/mfinal.go:131:34: Proved Rsh64x64 shifts to zero
---
> runtime/mfinal.go:131:34: Proved Div64 is unsigned
> runtime/mfinal.go:131:38: Proved IsInBounds

A longer example:

< crypto/internal/fips140/sha3/shake.go:28:30: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:38:27: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:53:46: Proved Rsh64x64 shifts to zero
< crypto/internal/fips140/sha3/shake.go:55:46: Proved Rsh64x64 shifts to zero
---
> crypto/internal/fips140/sha3/shake.go:28:30: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:28:30: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:38:27: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:45:7: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:46:4: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:53:46: Proved IsSliceInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved Div64 is unsigned
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsInBounds
> crypto/internal/fips140/sha3/shake.go:55:46: Proved IsSliceInBounds

These diffs are due to the smaller opt being better
and taking work away from prove:

< image/jpeg/dct.go:307:5: Proved IsInBounds
< image/jpeg/dct.go:308:5: Proved IsInBounds
...
< image/jpeg/dct.go:442:5: Proved IsInBounds

In the old opt, Mul by 8 was rewritten to Lsh by 3 early.
This CL delays that rule to help prove recognize mods,
but it also helps opt constant-fold the slice x[8*i:8*i+8:8*i+8].
Specifically, computing the length, opt can now do:

	(Sub64 (Add (Mul 8 i) 8) (Add (Mul 8 i) 8)) ->
	(Add 8 (Sub (Mul 8 i) (Mul 8 i))) ->
	(Add 8 (Mul 8 (Sub i i))) ->
	(Add 8 (Mul 8 0)) ->
	(Add 8 0) ->
	8

The key step is (Sub (Mul x y) (Mul x z)) -> (Mul x (Sub y z)),
Leaving the multiply as Mul enables using that step; the old
rewrite to Lsh blocked it, leaving prove to figure out the length
and then remove the bounds checks. But now opt can evaluate
the length down to a constant 8 and then constant-fold away
the bounds checks 0 < 8, 1 < 8, and so on. After that,
the compiler has nothing left to prove.

Benchmarks are noisy in general; I checked the assembly for the many
large increases below, and the vast majority are unchanged and
presumably hitting the caches differently in some way.

The divisibility optimizations were not reliably triggering before.
This leads to a very large improvement in some cases, like
DivisiblePow2constI64, DivisibleconstI64 on 64-bit systems
and DivisbleconstU64 on 32-bit systems.

Another way the divisibility optimizations were unreliable before
was incorrectly triggering for x/3, x%3 even though they are
written not to do that. There is a real but small slowdown
in the DivisibleWDivconst benchmarks on Mac because in the cases
used in the benchmark, it is still faster (on Mac) to do the
divisibility check than to remultiply.
This may be worth further study. Perhaps when there is no rotate
(meaning the divisor is odd), the divisibility optimization
should be enabled always. In any event, this CL makes it possible
to study that.

benchmark \ host                          s7  linux-amd64      mac  linux-arm64  linux-ppc64le  linux-386  s7:GOARCH=386  linux-arm
                                     vs base      vs base  vs base      vs base        vs base    vs base        vs base    vs base
LoadAdd                                    ~            ~        ~            ~              ~     -1.59%              ~          ~
ExtShift                                   ~            ~  -42.14%       +0.10%              ~     +1.44%         +5.66%     +8.50%
Modify                                     ~            ~        ~            ~              ~          ~              ~     -1.53%
MullImm                                    ~            ~        ~            ~              ~    +37.90%        -21.87%     +3.05%
ConstModify                                ~            ~        ~            ~        -49.14%          ~              ~          ~
BitSet                                     ~            ~        ~            ~        -15.86%    -14.57%         +6.44%     +0.06%
BitClear                                   ~            ~        ~            ~              ~     +1.78%         +3.50%     +0.06%
BitToggle                                  ~            ~        ~            ~              ~    -16.09%         +2.91%          ~
BitSetConst                                ~            ~        ~            ~              ~          ~              ~     -0.49%
BitClearConst                              ~            ~        ~            ~        -28.29%          ~              ~     -0.40%
BitToggleConst                             ~            ~        ~       +8.89%        -31.19%          ~              ~     -0.77%
MulNeg                                     ~            ~        ~            ~              ~          ~              ~          ~
Mul2Neg                                    ~            ~   -4.83%            ~              ~    -13.75%         -5.92%          ~
DivconstI64                                ~            ~        ~            ~              ~    -30.12%              ~     +0.50%
ModconstI64                                ~            ~   -9.94%       -4.63%              ~     +3.15%              ~     +5.32%
DivisiblePow2constI64                -34.49%      -12.58%        ~            ~        -12.25%          ~              ~          ~
DivisibleconstI64                    -24.69%      -25.06%   -0.40%       -2.27%        -42.61%     -3.31%              ~     +1.63%
DivisibleWDivconstI64                      ~            ~        ~            ~              ~    -17.55%              ~     -0.60%
DivconstU64/3                              ~            ~        ~            ~              ~     +1.51%              ~          ~
DivconstU64/5                              ~            ~        ~            ~              ~          ~              ~          ~
DivconstU64/37                             ~            ~   -0.18%            ~              ~     +2.70%              ~          ~
DivconstU64/1234567                        ~            ~        ~            ~              ~          ~              ~     +0.12%
ModconstU64                                ~            ~        ~       -0.24%              ~     -5.10%         -1.07%     -1.56%
DivisibleconstU64                          ~            ~        ~            ~              ~    -29.01%        -59.13%    -50.72%
DivisibleWDivconstU64                      ~            ~  -12.18%      -18.88%              ~     -5.50%         -3.91%     +5.17%
DivconstI32                                ~            ~   -0.48%            ~        -34.69%    +89.01%         -6.01%    -16.67%
ModconstI32                                ~       +2.95%   -0.33%            ~              ~     -2.98%         -5.40%     -8.30%
DivisiblePow2constI32                      ~            ~        ~            ~              ~          ~              ~    -16.22%
DivisibleconstI32                          ~            ~        ~            ~              ~    -37.27%        -47.75%    -25.03%
DivisibleWDivconstI32                -11.59%       +5.22%  -12.99%      -23.83%              ~    +45.95%         -7.03%    -10.01%
DivconstU32                                ~            ~        ~            ~              ~    +74.71%         +4.81%          ~
ModconstU32                                ~            ~   +0.53%       +0.18%              ~    +51.16%              ~          ~
DivisibleconstU32                          ~            ~        ~       -0.62%              ~     -4.25%              ~          ~
DivisibleWDivconstU32                 -2.77%       +5.56%  +11.12%       -5.15%              ~    +48.70%        +25.11%     -4.07%
DivconstI16                           -6.06%            ~   -0.33%       +0.22%              ~          ~         -9.68%     +5.47%
ModconstI16                                ~            ~   +4.44%       +2.82%              ~          ~              ~     +5.06%
DivisiblePow2constI16                      ~            ~        ~            ~              ~          ~              ~     -0.17%
DivisibleconstI16                          ~            ~   -0.23%            ~              ~          ~         +4.60%     +6.64%
DivisibleWDivconstI16                 -1.44%       -0.43%  +13.48%       -5.76%              ~     +1.62%        -23.15%     -9.06%
DivconstU16                           +1.61%            ~   -0.35%       -0.47%              ~          ~        +15.59%          ~
ModconstU16                                ~            ~        ~            ~              ~     -0.72%              ~    +14.23%
DivisibleconstU16                          ~            ~   -0.05%       +3.00%              ~          ~              ~     +5.06%
DivisibleWDivconstU16                +52.10%       +0.75%  +17.28%       +4.79%              ~    -37.39%         +5.28%     -9.06%
DivconstI8                                 ~            ~   -0.34%       -0.96%              ~          ~         -9.20%          ~
ModconstI8                            +2.29%            ~   +4.38%       +2.96%              ~          ~              ~          ~
DivisiblePow2constI8                       ~            ~        ~            ~              ~          ~              ~          ~
DivisibleconstI8                           ~            ~        ~            ~              ~          ~         +6.04%          ~
DivisibleWDivconstI8                 -26.44%       +1.69%  +17.03%       +4.05%              ~    +32.48%        -24.90%          ~
DivconstU8                            -4.50%      +14.06%   -0.28%            ~              ~          ~         +4.16%     +0.88%
ModconstU8                                 ~            ~  +25.84%       -0.64%              ~          ~              ~          ~
DivisibleconstU8                           ~            ~   -5.70%            ~              ~          ~              ~          ~
DivisibleWDivconstU8                 +49.55%       +9.07%        ~       +4.03%        +53.87%    -40.03%        +39.72%     -3.01%
Mul2                                       ~            ~        ~            ~              ~          ~              ~          ~
MulNeg2                                    ~            ~        ~            ~        -11.73%          ~              ~     -0.02%
EfaceInteger                               ~            ~        ~            ~              ~    +18.11%              ~     +2.53%
TypeAssert                           +33.90%       +2.86%        ~            ~              ~     -1.07%         -5.29%     -1.04%
Div64UnsignedSmall                         ~            ~        ~            ~              ~          ~              ~          ~
Div64Small                                 ~            ~        ~            ~              ~     -0.88%              ~     +2.39%
Div64SmallNegDivisor                       ~            ~        ~            ~              ~          ~              ~     +0.35%
Div64SmallNegDividend                      ~            ~        ~            ~              ~     -0.84%              ~     +3.57%
Div64SmallNegBoth                          ~            ~        ~            ~              ~     -0.86%              ~     +3.55%
Div64Unsigned                              ~            ~        ~            ~              ~          ~              ~     -0.11%
Div64                                      ~            ~        ~            ~              ~          ~              ~     +0.11%
Div64NegDivisor                            ~            ~        ~            ~              ~     -1.29%              ~          ~
Div64NegDividend                           ~            ~        ~            ~              ~     -1.44%              ~          ~
Div64NegBoth                               ~            ~        ~            ~              ~          ~              ~     +0.28%
Mod64UnsignedSmall                         ~            ~        ~            ~              ~     +0.48%              ~     +0.93%
Mod64Small                                 ~            ~        ~            ~              ~          ~              ~          ~
Mod64SmallNegDivisor                       ~            ~        ~            ~              ~          ~              ~     +1.44%
Mod64SmallNegDividend                      ~            ~        ~            ~              ~     +0.22%              ~     +1.37%
Mod64SmallNegBoth                          ~            ~        ~            ~              ~          ~              ~     -2.22%
Mod64Unsigned                              ~            ~        ~            ~              ~     -0.95%              ~     +0.11%
Mod64                                      ~            ~        ~            ~              ~          ~              ~          ~
Mod64NegDivisor                            ~            ~        ~            ~              ~          ~              ~     -0.02%
Mod64NegDividend                           ~            ~        ~            ~              ~          ~              ~          ~
Mod64NegBoth                               ~            ~        ~            ~              ~          ~              ~     -0.02%
MulconstI32/3                              ~            ~        ~      -25.00%              ~          ~              ~    +47.37%
MulconstI32/5                              ~            ~        ~      +33.28%              ~          ~              ~    +32.21%
MulconstI32/12                             ~            ~        ~       -2.13%              ~          ~              ~     -0.02%
MulconstI32/120                            ~            ~        ~       +2.93%              ~          ~              ~     -0.03%
MulconstI32/-120                           ~            ~        ~       -2.17%              ~          ~              ~     -0.03%
MulconstI32/65537                          ~            ~        ~            ~              ~          ~              ~     +0.03%
MulconstI32/65538                          ~            ~        ~            ~              ~    -33.38%              ~     +0.04%
MulconstI64/3                              ~            ~        ~      +33.35%              ~     -0.37%              ~     -0.13%
MulconstI64/5                              ~            ~        ~      -25.00%              ~     -0.34%              ~          ~
MulconstI64/12                             ~            ~        ~       +2.13%              ~    +11.62%              ~     +2.30%
MulconstI64/120                            ~            ~        ~       -1.98%              ~          ~              ~          ~
MulconstI64/-120                           ~            ~        ~       +0.75%              ~          ~              ~          ~
MulconstI64/65537                          ~            ~        ~            ~              ~     +5.61%              ~          ~
MulconstI64/65538                          ~            ~        ~            ~              ~     +5.25%              ~          ~
MulconstU32/3                              ~       +0.81%        ~      +33.39%              ~    +77.92%              ~    -32.31%
MulconstU32/5                              ~            ~        ~      -24.97%              ~    +77.92%              ~    -24.47%
MulconstU32/12                             ~            ~        ~       +2.06%              ~          ~              ~     +0.03%
MulconstU32/120                            ~            ~        ~       -2.74%              ~          ~              ~     +0.03%
MulconstU32/65537                          ~            ~        ~            ~              ~          ~              ~     +0.03%
MulconstU32/65538                          ~            ~        ~            ~              ~    -33.42%              ~     -0.03%
MulconstU64/3                              ~            ~        ~      +33.33%              ~     -0.28%              ~     +1.22%
MulconstU64/5                              ~            ~        ~      -25.00%              ~          ~              ~     -0.64%
MulconstU64/12                             ~            ~        ~       +2.30%              ~    +11.59%              ~     +0.14%
MulconstU64/120                            ~            ~        ~       -2.82%              ~          ~              ~     +0.04%
MulconstU64/65537                          ~       +0.37%        ~            ~              ~     +5.58%              ~          ~
MulconstU64/65538                          ~            ~        ~            ~              ~     +5.16%              ~          ~
ShiftArithmeticRight                       ~            ~        ~            ~              ~    -10.81%              ~     +0.31%
Switch8Predictable                   +14.69%            ~        ~            ~              ~    -24.85%              ~          ~
Switch8Unpredictable                       ~       -0.58%   -3.80%            ~              ~    -11.78%              ~     -0.79%
Switch32Predictable                  -10.33%      +17.89%        ~            ~              ~     +5.76%              ~          ~
Switch32Unpredictable                 -3.15%       +1.19%   +9.42%            ~              ~    -10.30%         -5.09%     +0.44%
SwitchStringPredictable              +70.88%      +20.48%        ~            ~              ~     +2.39%              ~     +0.31%
SwitchStringUnpredictable                  ~       +3.91%   -5.06%       -0.98%              ~     +0.61%         +2.03%          ~
SwitchTypePredictable               +146.58%       -1.10%        ~      -12.45%              ~     -0.46%         -3.81%          ~
SwitchTypeUnpredictable               +0.46%       -0.83%        ~       +4.18%              ~     +0.43%              ~     +0.62%
SwitchInterfaceTypePredictable       -13.41%      -10.13%  +11.03%            ~              ~     -4.38%              ~     +0.75%
SwitchInterfaceTypeUnpredictable      -6.37%       -2.14%        ~       -3.21%              ~     -4.20%              ~     +1.08%

Fixes #63110.
Fixes #75954.

Change-Id: I55a876f08c6c14f419ce1a8cbba2eaae6c6efbf0
Reviewed-on: https://go-review.googlesource.com/c/go/+/714160
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-10-29 18:49:40 -07:00
Keith Randall
ea50d61b66 cmd/compile: name change isDirect -> isDirectAndComparable
Now that it also restricts to comparable types. Followon to CL 713840.

Change-Id: Idd975c3fd16fb51f55360f2fa0b89ab0bf1d00ed
Reviewed-on: https://go-review.googlesource.com/c/go/+/715700
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mateusz Poliwczak <mpoliwczak34@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-10-28 11:12:30 -07:00
Jorropo
73d7635fae cmd/compile: add generic rules to remove bool → int → bool roundtrips
Change-Id: I8b0a3b64c89fe167d304f901a5d38470f35400ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/715200
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-10-27 23:24:54 -07:00
Jorropo
5453b788fd cmd/compile: optimize Add64carry with unused carries into plain Add64
Change-Id: I8a63f567cfc574bb066ad6269eec6929760cb9c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/656338
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-10-27 19:07:38 -07:00
Keith Randall
9b8742f2e7 cmd/compile: don't depend on arch-dependent conversions in the compiler
Leave those constant foldings for runtime, similar to how we do it
for NaN generation.

These are the only instances I could find in cmd/compile/..., using

objdump -d ../pkg/tool/darwin_arm64/compile| egrep "(fcvtz|>:)" | grep -B1 fcvt

(There are instances in other places, like runtime and reflect, but I don't
think those places would affect compiler output.)

Change-Id: I4113fe4570115e4765825cf442cb1fde97cf2f27
Reviewed-on: https://go-review.googlesource.com/c/go/+/711281
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-10-13 12:19:32 -07:00
Michael Matloob
19a30ea3f2 cmd/compile: call generated size-specialized malloc functions directly
This change creates calls to size-specialized malloc functions instead
of calls to newObject when we know the size of the allocation at
compilation time. Most of it is a matter of calling the newObject
function (which will create calls to the size-specialized functions)
rather then the newObjectNonSpecialized function (which won't). In the
newHeapaddr, small, non-pointer case, we'll create a non specialized
newObject and transform that into the appropriate size-specialized
function when we produce the mallocgc in flushPendingHeapAllocations.

We have to update some of the rewrites in generic.rules to also apply to
the size-specialized functions when they apply to newObject.

The messiest thing is we have to adjust the offset we use to save the
memory profiler stack, because the depth of the call to profilealloc is
two frames fewer in the size-specialized malloc functions compared to
when newObject calls mallocgc. A bunch of tests have been adjusted to
account for that.

Change-Id: I6a6a6964c9037fb6719e392c4a498ed700b617d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/707856
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Matloob <matloob@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-10-09 14:59:40 -07:00
Michael Munday
97fd6bdecc cmd/compile: fuse NaN checks with other comparisons
NaN checks can often be merged into other comparisons by inverting them.
For example, `math.IsNaN(x) || x > 0` is equivalent to `!(x <= 0)`.

goos: linux
goarch: amd64
pkg: math
cpu: 12th Gen Intel(R) Core(TM) i7-12700T
            │         sec/op         │    sec/op     vs base                │
Acos                     4.315n ± 0%    4.314n ± 0%        ~ (p=0.642 n=10)
Acosh                    8.398n ± 0%    7.779n ± 0%   -7.37% (p=0.000 n=10)
Asin                     4.203n ± 0%    4.211n ± 0%   +0.20% (p=0.001 n=10)
Asinh                   10.150n ± 0%    9.562n ± 0%   -5.79% (p=0.000 n=10)
Atan                     2.363n ± 0%    2.363n ± 0%        ~ (p=0.801 n=10)
Atanh                    8.192n ± 2%    7.685n ± 0%   -6.20% (p=0.000 n=10)
Atan2                    4.013n ± 0%    4.010n ± 0%        ~ (p=0.073 n=10)
Cbrt                     4.858n ± 0%    4.755n ± 0%   -2.12% (p=0.000 n=10)
Cos                      4.596n ± 0%    4.357n ± 0%   -5.20% (p=0.000 n=10)
Cosh                     5.071n ± 0%    5.071n ± 0%        ~ (p=0.585 n=10)
Erf                      2.802n ± 1%    2.788n ± 0%   -0.54% (p=0.002 n=10)
Erfc                     3.087n ± 1%    3.071n ± 0%        ~ (p=0.320 n=10)
Erfinv                   3.981n ± 0%    3.965n ± 0%   -0.41% (p=0.000 n=10)
Erfcinv                  3.985n ± 0%    3.977n ± 0%   -0.20% (p=0.000 n=10)
ExpGo                    8.721n ± 2%    8.252n ± 0%   -5.38% (p=0.000 n=10)
Expm1                    4.378n ± 0%    4.228n ± 0%   -3.43% (p=0.000 n=10)
Exp2                     8.313n ± 0%    7.855n ± 0%   -5.52% (p=0.000 n=10)
Exp2Go                   8.498n ± 2%    7.921n ± 0%   -6.79% (p=0.000 n=10)
Mod                      15.16n ± 4%    12.20n ± 1%  -19.58% (p=0.000 n=10)
Frexp                    1.780n ± 2%    1.496n ± 0%  -15.96% (p=0.000 n=10)
Gamma                    4.378n ± 1%    4.013n ± 0%   -8.35% (p=0.000 n=10)
HypotGo                  2.655n ± 5%    2.427n ± 1%   -8.57% (p=0.000 n=10)
Ilogb                    1.912n ± 5%    1.749n ± 0%   -8.53% (p=0.000 n=10)
J0                       22.43n ± 9%    20.46n ± 0%   -8.76% (p=0.000 n=10)
J1                       21.03n ± 4%    19.96n ± 0%   -5.09% (p=0.000 n=10)
Jn                       45.40n ± 1%    42.59n ± 0%   -6.20% (p=0.000 n=10)
Ldexp                    2.312n ± 1%    1.944n ± 0%  -15.94% (p=0.000 n=10)
Lgamma                   4.617n ± 1%    4.584n ± 0%   -0.73% (p=0.000 n=10)
Log                      4.226n ± 0%    4.213n ± 0%   -0.31% (p=0.001 n=10)
Logb                     1.771n ± 0%    1.775n ± 0%        ~ (p=0.097 n=10)
Log1p                    5.102n ± 2%    5.001n ± 0%   -1.97% (p=0.000 n=10)
Log10                    4.407n ± 0%    4.408n ± 0%        ~ (p=1.000 n=10)
Log2                     2.416n ± 1%    2.138n ± 0%  -11.51% (p=0.000 n=10)
Modf                     1.669n ± 2%    1.611n ± 0%   -3.50% (p=0.000 n=10)
Nextafter32              2.186n ± 0%    2.185n ± 0%        ~ (p=0.051 n=10)
Nextafter64              2.182n ± 0%    2.184n ± 0%   +0.09% (p=0.016 n=10)
PowInt                   11.39n ± 6%    10.68n ± 2%   -6.24% (p=0.000 n=10)
PowFrac                  26.60n ± 2%    26.12n ± 0%   -1.80% (p=0.000 n=10)
Pow10Pos                0.5067n ± 4%   0.5003n ± 1%   -1.27% (p=0.001 n=10)
Pow10Neg                0.8552n ± 0%   0.8552n ± 0%        ~ (p=0.928 n=10)
Round                    1.181n ± 0%    1.182n ± 0%   +0.08% (p=0.001 n=10)
RoundToEven              1.709n ± 0%    1.710n ± 0%        ~ (p=0.053 n=10)
Remainder                12.54n ± 5%    11.99n ± 2%   -4.46% (p=0.000 n=10)
Sin                      3.933n ± 5%    3.926n ± 0%   -0.17% (p=0.000 n=10)
Sincos                   5.672n ± 0%    5.522n ± 0%   -2.65% (p=0.000 n=10)
Sinh                     5.447n ± 1%    5.444n ± 0%   -0.06% (p=0.029 n=10)
Tan                      4.061n ± 0%    4.058n ± 0%   -0.07% (p=0.005 n=10)
Tanh                     5.599n ± 0%    5.595n ± 0%   -0.06% (p=0.042 n=10)
Y0                       20.75n ± 5%    19.73n ± 1%   -4.92% (p=0.000 n=10)
Y1                       20.87n ± 2%    19.78n ± 1%   -5.20% (p=0.000 n=10)
Yn                       44.50n ± 2%    42.04n ± 2%   -5.53% (p=0.000 n=10)
geomean                  4.989n         4.791n        -3.96%

goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
                    │     sec/op     │    sec/op     vs base                  │
Acos                    159.9n ±  0%   159.9n ±  0%        ~ (p=0.269 n=10)
Acosh                   244.7n ±  0%   235.0n ±  0%   -3.98% (p=0.000 n=10)
Asin                    159.9n ±  0%   159.9n ±  0%        ~ (p=0.154 n=10)
Asinh                   270.8n ±  0%   261.1n ±  0%   -3.60% (p=0.000 n=10)
Atan                    119.1n ±  0%   119.1n ±  0%        ~ (p=0.347 n=10)
Atanh                   260.2n ±  0%   261.8n ±  4%        ~ (p=0.459 n=10)
Atan2                   186.8n ±  0%   186.8n ±  0%        ~ (p=0.487 n=10)
Cbrt                    203.5n ±  0%   198.2n ±  0%   -2.60% (p=0.000 n=10)
Ceil                    31.82n ±  0%   31.81n ±  0%        ~ (p=0.714 n=10)
Copysign                4.894n ±  0%   4.893n ±  0%        ~ (p=0.161 n=10)
Cos                     107.6n ±  0%   103.6n ±  0%   -3.76% (p=0.000 n=10)
Cosh                    259.0n ±  0%   252.8n ±  0%   -2.39% (p=0.000 n=10)
Erf                     133.7n ±  0%   133.7n ±  0%        ~ (p=0.720 n=10)
Erfc                    137.9n ±  0%   137.8n ±  0%   -0.04% (p=0.033 n=10)
Erfinv                  173.7n ±  0%   168.8n ±  0%   -2.82% (p=0.000 n=10)
Erfcinv                 173.7n ±  0%   168.8n ±  0%   -2.82% (p=0.000 n=10)
Exp                     215.3n ±  0%   208.1n ±  0%   -3.34% (p=0.000 n=10)
ExpGo                   226.7n ±  0%   220.6n ±  0%   -2.69% (p=0.000 n=10)
Expm1                   164.8n ±  0%   159.0n ±  0%   -3.52% (p=0.000 n=10)
Exp2                    185.0n ±  0%   182.7n ±  0%   -1.22% (p=0.000 n=10)
Exp2Go                  198.9n ±  0%   196.5n ±  0%   -1.21% (p=0.000 n=10)
Abs                     4.894n ±  0%   4.893n ±  0%        ~ (p=0.262 n=10)
Dim                     16.31n ±  0%   16.31n ±  0%        ~ (p=1.000 n=10)
Floor                   31.81n ±  0%   31.81n ±  0%        ~ (p=0.067 n=10)
Max                     26.11n ±  0%   26.10n ±  0%        ~ (p=0.080 n=10)
Min                     26.10n ±  0%   26.10n ±  0%        ~ (p=0.095 n=10)
Mod                     337.7n ±  0%   291.9n ±  0%  -13.56% (p=0.000 n=10)
Frexp                   50.57n ±  0%   42.41n ±  0%  -16.13% (p=0.000 n=10)
Gamma                   206.3n ±  0%   198.1n ±  0%   -4.00% (p=0.000 n=10)
Hypot                   94.62n ±  0%   94.61n ±  0%        ~ (p=0.437 n=10)
HypotGo                 109.3n ±  0%   109.3n ±  0%        ~ (p=1.000 n=10)
Ilogb                   44.05n ±  0%   44.04n ±  0%   -0.02% (p=0.025 n=10)
J0                      663.1n ±  0%   663.9n ±  0%   +0.13% (p=0.002 n=10)
J1                      663.9n ±  0%   666.4n ±  0%   +0.38% (p=0.000 n=10)
Jn                      1.404µ ±  0%   1.407µ ±  0%   +0.21% (p=0.000 n=10)
Ldexp                   57.10n ±  0%   48.93n ±  0%  -14.30% (p=0.000 n=10)
Lgamma                  185.1n ±  0%   187.6n ±  0%   +1.32% (p=0.000 n=10)
Log                     182.7n ±  0%   170.1n ±  0%   -6.87% (p=0.000 n=10)
Logb                    46.49n ±  0%   46.49n ±  0%        ~ (p=0.675 n=10)
Log1p                   184.3n ±  0%   179.4n ±  0%   -2.63% (p=0.000 n=10)
Log10                   184.3n ±  0%   171.2n ±  0%   -7.08% (p=0.000 n=10)
Log2                    66.05n ±  0%   57.90n ±  0%  -12.34% (p=0.000 n=10)
Modf                    34.25n ±  0%   34.24n ±  0%        ~ (p=0.163 n=10)
Nextafter32             49.33n ±  1%   48.93n ±  0%   -0.81% (p=0.002 n=10)
Nextafter64             43.64n ±  0%   43.23n ±  0%   -0.93% (p=0.000 n=10)
PowInt                  267.6n ±  0%   251.2n ±  0%   -6.11% (p=0.000 n=10)
PowFrac                 672.9n ±  0%   637.9n ±  0%   -5.19% (p=0.000 n=10)
Pow10Pos                13.87n ±  0%   13.87n ±  0%        ~ (p=1.000 n=10)
Pow10Neg                19.58n ± 62%   19.59n ± 62%        ~ (p=0.355 n=10)
Round                   23.65n ±  0%   23.65n ±  0%        ~ (p=1.000 n=10)
RoundToEven             27.73n ±  0%   27.73n ±  0%        ~ (p=0.635 n=10)
Remainder               309.9n ±  0%   280.5n ±  0%   -9.49% (p=0.000 n=10)
Signbit                 13.05n ±  0%   13.05n ±  0%        ~ (p=1.000 n=10) ¹
Sin                     120.7n ±  0%   120.7n ±  0%        ~ (p=1.000 n=10) ¹
Sincos                  148.4n ±  0%   143.5n ±  0%   -3.30% (p=0.000 n=10)
Sinh                    275.6n ±  0%   267.5n ±  0%   -2.94% (p=0.000 n=10)
SqrtIndirect            3.262n ±  0%   3.262n ±  0%        ~ (p=0.263 n=10)
SqrtLatency             19.57n ±  0%   19.57n ±  0%        ~ (p=0.582 n=10)
SqrtIndirectLatency     19.57n ±  0%   19.57n ±  0%        ~ (p=1.000 n=10)
SqrtGoLatency           203.2n ±  0%   197.6n ±  0%   -2.78% (p=0.000 n=10)
SqrtPrime               4.952µ ±  0%   4.952µ ±  0%   -0.01% (p=0.025 n=10)
Tan                     153.3n ±  0%   153.3n ±  0%        ~ (p=1.000 n=10)
Tanh                    280.5n ±  0%   272.4n ±  0%   -2.91% (p=0.000 n=10)
Trunc                   31.81n ±  0%   31.81n ±  0%        ~ (p=1.000 n=10)
Y0                      680.1n ±  0%   664.8n ±  0%   -2.25% (p=0.000 n=10)
Y1                      684.2n ±  0%   669.6n ±  0%   -2.14% (p=0.000 n=10)
Yn                      1.444µ ±  0%   1.410µ ±  0%   -2.35% (p=0.000 n=10)
Float64bits             5.709n ±  0%   5.708n ±  0%        ~ (p=0.573 n=10)
Float64frombits         4.893n ±  0%   4.893n ±  0%        ~ (p=0.734 n=10)
Float32bits             12.23n ±  0%   12.23n ±  0%        ~ (p=0.628 n=10)
Float32frombits         4.893n ±  0%   4.893n ±  0%        ~ (p=0.971 n=10)
FMA                     4.893n ±  0%   4.893n ±  0%        ~ (p=0.736 n=10)
geomean                 88.96n         87.05n         -2.15%
¹ all samples are equal

Change-Id: I8db8ac7b7b3430b946b89e88dd6c1546804125c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/697360
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Munday <mikemndy@gmail.com>
2025-10-08 08:11:17 -07:00
David Chase
ec70d19023 cmd/compile: rewrite to elide Slicemask from len==c>0 slicing
This might have been something that prove could be educated
into figuring out, but this also works, and it also helps
prove downstream.

Adjusted the prove test, because this change moved a message.

Cherry-picked from the dev.simd branch. This CL is not
necessarily SIMD specific. Apply early to reduce risk.

Change-Id: I5eabe639eff5db9cd9766a6a8666fdb4973829cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/697715
Commit-Queue: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Bypass: David Chase <drchase@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/708858
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
2025-10-03 12:30:21 -07:00
Jake Bailey
97da068774 cmd/compile: eliminate nil checks on .dict arg
The first arg of a generic function is the dictionary. This dictionary
is never nil, but it gets a nil check becuase the dict arg is treated as
a slice during construction.

cmp.Compare[go.shape.int] was:

00006 (+41) TESTB AX, (AX)
00007 (+52) CMPQ CX, BX
00008 (52) JGT 14
00009 (+55) JGE 12
00010 (+56) MOVL $1, AX
00011 (56) RET
00012 (+58) XORL AX, AX
00013 (58) RET
00014 (+53) MOVQ $-1, AX
00015 (53) RET

Note how the function begins with a TESTB that loads the dict to perform
the nil check.

This CL eliminates that nil check.

For most generic functions, this doesn't matter too much, but not
infrequently are generic functions written which never actually use the
dictionary (like cmp.Compare), so I suspect this might help in hot code
to avoid repeatedly touching the dictionary in memory, and in cases
where the generic function is not inlined (and thus the dict dropped).

compilecmp shows these changes (deduped):

cmp.Compare[go.shape.float64] 73 -> 72  (-1.37%)
cmp.Compare[go.shape.int] 26 -> 24  (-7.69%)
cmp.Compare[go.shape.int32] 25 -> 23  (-8.00%)
cmp.Compare[go.shape.int64] 26 -> 24  (-7.69%)
cmp.Compare[go.shape.string] 142 -> 141  (-0.70%)
cmp.Compare[go.shape.uint16] 26 -> 24  (-7.69%)
cmp.Compare[go.shape.uint] 26 -> 24  (-7.69%)
cmp.Compare[go.shape.uint32] 25 -> 23  (-8.00%)
cmp.Compare[go.shape.uint64] 26 -> 24  (-7.69%)
cmp.Compare[go.shape.uint8] 25 -> 23  (-8.00%)
cmp.Compare[go.shape.uintptr] 26 -> 24  (-7.69%)
cmp.Less[go.shape.float64] 35 -> 34  (-2.86%)
cmp.Less[go.shape.int32] 8 -> 6  (-25.00%)
cmp.Less[go.shape.int64] 9 -> 7  (-22.22%)
cmp.Less[go.shape.int] 9 -> 7  (-22.22%)
cmp.Less[go.shape.string] 112 -> 110  (-1.79%)
cmp.Less[go.shape.uint16] 9 -> 7  (-22.22%)
cmp.Less[go.shape.uint32] 8 -> 6  (-25.00%)
cmp.Less[go.shape.uint64] 9 -> 7  (-22.22%)
internal/synctest.Associate[go.shape.struct 114 -> 113  (-0.88%)
internal/trace.(*dataTable[go.shape.uint64,go.shape.string]).insert 805 -> 791  (-1.74%)
internal/trace.(*dataTable[go.shape.uint64,go.shape.struct 858 -> 852  (-0.70%)
main.(*gState[go.shape.int64]).stop 2111 -> 2085  (-1.23%)
main.(*gState[go.shape.int64]).unblock 941 -> 923  (-1.91%)
runtime.fmax[go.shape.float32] 85 -> 83  (-2.35%)
runtime.fmax[go.shape.float64] 89 -> 87  (-2.25%)
runtime.fmin[go.shape.float32] 85 -> 83  (-2.35%)
runtime.fmin[go.shape.float64] 89 -> 87  (-2.25%)
slices.BinarySearch[go.shape.[]string,go.shape.string] 346 -> 337  (-2.60%)
slices.Concat[go.shape.[]uint8,go.shape.uint8] 462 -> 453  (-1.95%)
slices.ContainsFunc[go.shape.[]*cmd/vendor/github.com/google/pprof/profile.Sample,go.shape.*uint8] 170 -> 169  (-0.59%)
slices.ContainsFunc[go.shape.[]*debug/dwarf.StructField,go.shape.*uint8] 170 -> 169  (-0.59%)
slices.ContainsFunc[go.shape.[]*go/ast.Field,go.shape.*uint8] 170 -> 169  (-0.59%)
slices.ContainsFunc[go.shape.[]string,go.shape.string] 186 -> 181  (-2.69%)
slices.Contains[go.shape.[]*cmd/compile/internal/syntax.BranchStmt,go.shape.*cmd/compile/internal/syntax.BranchStmt] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]cmd/compile/internal/syntax.Type,go.shape.interface 223 -> 219  (-1.79%)
slices.Contains[go.shape.[]crypto/tls.CurveID,go.shape.uint16] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]crypto/tls.SignatureScheme,go.shape.uint16] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]*go/ast.BranchStmt,go.shape.*go/ast.BranchStmt] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]go/types.Type,go.shape.interface 223 -> 219  (-1.79%)
slices.Contains[go.shape.[]int,go.shape.int] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]string,go.shape.string] 223 -> 219  (-1.79%)
slices.Contains[go.shape.[]uint16,go.shape.uint16] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]uint8,go.shape.uint8] 44 -> 42  (-4.55%)
slices.Insert[go.shape.[]string,go.shape.string] 1189 -> 1170  (-1.60%)
slices.medianCmpFunc[go.shape.struct 1118 -> 1113  (-0.45%)
slices.medianCmpFunc[go.shape.struct 1214 -> 1209  (-0.41%)
slices.medianCmpFunc[go.shape.struct 889 -> 887  (-0.22%)
slices.medianCmpFunc[go.shape.struct 901 -> 874  (-3.00%)
slices.order2Ordered[go.shape.float64] 89 -> 87  (-2.25%)
slices.order2Ordered[go.shape.uint16] 75 -> 70  (-6.67%)
slices.partialInsertionSortOrdered[go.shape.string] 1115 -> 1110  (-0.45%)
slices.partialInsertionSortOrdered[go.shape.uint16] 358 -> 352  (-1.68%)
slices.partitionEqualOrdered[go.shape.int] 208 -> 203  (-2.40%)
slices.partitionEqualOrdered[go.shape.int32] 208 -> 198  (-4.81%)
slices.partitionEqualOrdered[go.shape.int64] 208 -> 203  (-2.40%)
slices.partitionEqualOrdered[go.shape.uint32] 208 -> 198  (-4.81%)
slices.partitionEqualOrdered[go.shape.uint64] 208 -> 203  (-2.40%)
slices.partitionOrdered[go.shape.float64] 538 -> 533  (-0.93%)
slices.partitionOrdered[go.shape.int] 437 -> 427  (-2.29%)
slices.partitionOrdered[go.shape.int64] 437 -> 427  (-2.29%)
slices.partitionOrdered[go.shape.uint16] 447 -> 442  (-1.12%)
slices.partitionOrdered[go.shape.uint64] 437 -> 427  (-2.29%)
slices.rotateCmpFunc[go.shape.struct 1045 -> 1029  (-1.53%)
slices.rotateCmpFunc[go.shape.struct 1205 -> 1163  (-3.49%)
slices.rotateCmpFunc[go.shape.struct 1226 -> 1176  (-4.08%)
slices.rotateCmpFunc[go.shape.struct 1322 -> 1272  (-3.78%)
slices.rotateCmpFunc[go.shape.struct 1419 -> 1400  (-1.34%)
slices.rotateCmpFunc[go.shape.*uint8] 549 -> 538  (-2.00%)
slices.rotateLeft[go.shape.string] 603 -> 588  (-2.49%)
slices.rotateLeft[go.shape.uint8] 255 -> 250  (-1.96%)
slices.siftDownOrdered[go.shape.int] 181 -> 171  (-5.52%)
slices.siftDownOrdered[go.shape.int32] 181 -> 171  (-5.52%)
slices.siftDownOrdered[go.shape.int64] 181 -> 171  (-5.52%)
slices.siftDownOrdered[go.shape.string] 614 -> 592  (-3.58%)
slices.siftDownOrdered[go.shape.uint32] 181 -> 171  (-5.52%)
slices.siftDownOrdered[go.shape.uint64] 181 -> 171  (-5.52%)
time.parseRFC3339[go.shape.string] 1774 -> 1758  (-0.90%)
unique.(*canonMap[go.shape.struct 280 -> 276  (-1.43%)
unique.clone[go.shape.struct 311 -> 293  (-5.79%)
weak.Make[go.shape.6880e4598856efac32416085c0172278cf0fb9e5050ce6518bd9b7f7d1662440] 136 -> 134  (-1.47%)
weak.Make[go.shape.struct 136 -> 134  (-1.47%)
weak.Make[go.shape.uint8] 136 -> 134  (-1.47%)

Change-Id: I43dcea5f2aa37372f773e5edc6a2ef1dee0a8db7
Reviewed-on: https://go-review.googlesource.com/c/go/+/706655
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-09-30 11:22:35 -07:00
Youlin Feng
a5fa5ea51c cmd/compile/internal/ssa: expand runtime.memequal for length {3,5,6,7}
This CL slightly speeds up strings.HasPrefix when testing constant
prefixes of length {3,5,6,7}.

goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
                │      old     │                 new                 │
                │    sec/op    │   sec/op     vs base                │
StringPrefix3-8   11.125n ± 2%   8.539n ± 1%  -23.25% (p=0.000 n=20)
StringPrefix5-8   11.170n ± 2%   8.700n ± 1%  -22.11% (p=0.000 n=20)
StringPrefix6-8   11.190n ± 2%   8.655n ± 1%  -22.65% (p=0.000 n=20)
StringPrefix7-8   11.095n ± 1%   8.878n ± 1%  -19.98% (p=0.000 n=20)

Change-Id: I510a80d59cf78680b57d68780d35d212d24030e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/700816
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-09-09 12:10:07 -07:00
Jake Bailey
b915e14490 cmd/compile: consolidate logic for rewriting fixed loads
Many CLs have worked with this bit of code, extending the cases more and
more for various fixed addresses and constants. But, I find that it's
getting duplicitive, and I don't find the current setup very clear that
something like isFixed32 _only_ works for a specific element within the
type data.

This CL rewrites these rules (pun unintended) into a single set of
rewrite rules with shared logic, which stops hardcoding offsets and type
compatibility checks.

This should open the door to optimizing further type:... field loads, of
which most can be done entirely statically but are not yet today outside
Hash and Elem.

Passes toolstash -cmp.

Change-Id: I754138ce1785c6036eada9ed53f0ce2ad2a58b63
Reviewed-on: https://go-review.googlesource.com/c/go/+/701297
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Florian Lehner <lehner.florian86@gmail.com>
2025-09-08 13:34:03 -07:00
Jake Bailey
d767064170 cmd/compile: mark abi.PtrType.Elem sym as used
CL 700336 let the compiler see into the abi.PtrType.Elem field,
but forgot the MarkTypeSymUsedInInterface to ensure that the symbol
is marked as referenced.

I am not sure how to write a test for this, but I noticed this when
working on further optimizations where I "fixed" this issue and
confusingly failed toolstash -cmp, with diffs like:

@@ -70582,6 +70582,7 @@ reflect.groupAndSlotOf<1> STEXT size=696 args=0x20 locals=0x1e0 funcid=0x0 align
 	rel 3+0 t=R_USEIFACE type:*reflect.rtype<0>+0
 	rel 3+0 t=R_USEIFACE type:*reflect.rtype<0>+0
 	rel 3+0 t=R_USEIFACE type:*uint64<0>+0
+	rel 3+0 t=R_USEIFACE type:uint64<0>+0
 	rel 71+0 t=R_CALLIND +0
 	rel 92+4 t=R_PCREL go:itab.*reflect.rtype,reflect.Type<0>+0
 	rel 114+4 t=R_CALL reflect.(*rtype).ptrTo<1>+0

Updates #75203

Change-Id: Ib8de8a32aeb8a7ea6fcf5d728a2e4944ef227ab2
Reviewed-on: https://go-review.googlesource.com/c/go/+/701296
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-09-05 13:52:20 -07:00
Youlin Feng
df29038486 cmd/compile/internal/ssa: load constant values from abi.PtrType.Elem
This CL makes the generated code for reflect.TypeFor as simple as an
intrinsic function.

Fixes #75203

Change-Id: I7bb48787101f07e77ab5c583292e834c28a028d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/700336
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-09-04 07:25:26 -07:00
Michael Munday
84b070bfb1 cmd/compile/internal/ssa: make oneBit function generic
Allows rewrite rules using oneBit to be made more compact.

Change-Id: I986715f77db5b548759d809fe668e1893048f25c
Reviewed-on: https://go-review.googlesource.com/c/go/+/699295
Reviewed-by: Youlin Feng <fengyoulin@live.com>
Commit-Queue: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-29 10:02:33 -07:00
Keith Randall
74421a305b Revert "cmd/compile: allow multi-field structs to be stored directly in interfaces"
This reverts commit cd55f86b8d (CL 681937)

Reason for revert: still causing compiler failures on Google test code

Change-Id: I5cd482fd607fd060a523257082d48821b5f965d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/695016
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-08-11 22:59:52 -07:00
Keith Randall
7248995b60 Revert "cmd/compile: allow more args in StructMake folding rule"
This reverts commit 72e8237cc1 (CL 693615)

Reason for revert: still causing compiler failures on Google test code

Change-Id: I4a7850c321d95ed7803d56866bb0c524c7a377d3
Reviewed-on: https://go-review.googlesource.com/c/go/+/695015
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-08-11 22:36:20 -07:00
Keith Randall
72e8237cc1 cmd/compile: allow more args in StructMake folding rule
imakeOfStructMake does the right thing, but we never call it
when the StructMake has more than one argument.

Fixes #74908

Change-Id: Ib4b1a025bfb1fa69a325207e47b74bd6217092bf
Reviewed-on: https://go-review.googlesource.com/c/go/+/693615
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-06 11:04:07 -07:00
Keith Randall
cd55f86b8d cmd/compile: allow multi-field structs to be stored directly in interfaces
If the struct is a bunch of 0-sized fields and one pointer field.

Fixes #74092

Change-Id: I87c5d162c8c9fdba812420d7f9d21de97295b62c
Reviewed-on: https://go-review.googlesource.com/c/go/+/681937
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-08-05 09:18:31 -07:00
Cuong Manh Le
bd94ae8903 cmd/compile: use unsigned power-of-two detector for unsigned mod
Same as CL 689815, but for modulus instead of division.

Updates #74485

Change-Id: I73000231c886a987a1093669ff207fd9117a8160
Reviewed-on: https://go-review.googlesource.com/c/go/+/689895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-07-29 16:22:40 -07:00
Cuong Manh Le
f3582fc80e cmd/compile: add unsigned power-of-two detector
Fixes #74485

Change-Id: Ia22a58ac43bdc36c8414d555672a3a3eafc749ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/689815
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-07-29 16:22:37 -07:00
Jorropo
4569255f8c cmd/compile: cleanup SelectN rules by indexing into args
Change-Id: I7b8e8cd88c4d6d562aa25df91593d35d331ef63c
Reviewed-on: https://go-review.googlesource.com/c/go/+/690595
Reviewed-by: Mark Freeman <mark@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-07-28 11:06:11 -07:00
Jorropo
94645d2413 cmd/compile: rewrite cmov(x, x, cond) into x
I don't think branchelim will intentionally generate theses.
But at the time where branchelim is generating them they might different,
and through opt process they become the same value.

Change-Id: I4a19f1db14c08057b7e782a098f4c18ca36ab7fe
Reviewed-on: https://go-review.googlesource.com/c/go/+/690519
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <mark@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-07-28 11:06:07 -07:00
Jorropo
ce05ad448f cmd/compile: rewrite condselects into doublings and halvings
For performance see CL 685676.

This allows something like:
  if y { x *= 2 }

To be compiled to:
  SHLXQ BX, AX, AX

Instead of:
  MOVQ    AX, CX
  SHLQ    $1, CX
  MOVBLZX BL, DX
  TESTQ   DX, DX
  CMOVQNE CX, AX

While ./make.bash uniqued per LOC, there is 2 doublings and 4 halvings.

Change-Id: Ic0727cbf429528a2dbf17cbfc3b0121db8387444
Reviewed-on: https://go-review.googlesource.com/c/go/+/685695
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-07-24 14:42:15 -07:00
Jorropo
fcd28070fe cmd/compile: add opt branchelim to rewrite some CondSelect into math
This allows something like:
  if y { x++ }

To be compiled to:
  MOVBLZX BX, CX
  ADDQ CX, AX

Instead of:
  LEAQ    1(AX), CX
  MOVBLZX BL, DX
  TESTQ   DX, DX
  CMOVQNE CX, AX

While ./make.bash uniqued per LOC, there is 100 additions and 75 substractions.

See benchmark here: https://go.dev/play/p/DJf5COjwhd_s

Either it's a performance no-op or it is faster:

  goos: linux
  goarch: amd64
  cpu: AMD Ryzen 5 3600 6-Core Processor
                                          │ /tmp/old.logs │            /tmp/new.logs             │
                                          │    sec/op     │    sec/op     vs base                │
  CmovInlineConditionAddLatency-12           0.5443n ± 5%   0.5339n ± 3%   -1.90% (p=0.004 n=10)
  CmovInlineConditionAddThroughputBy6-12      1.492n ± 1%    1.494n ± 1%        ~ (p=0.955 n=10)
  CmovInlineConditionSubLatency-12           0.5419n ± 3%   0.5282n ± 3%   -2.52% (p=0.019 n=10)
  CmovInlineConditionSubThroughputBy6-12      1.587n ± 1%    1.584n ± 2%        ~ (p=0.492 n=10)
  CmovOutlineConditionAddLatency-12          0.5223n ± 1%   0.2639n ± 4%  -49.47% (p=0.000 n=10)
  CmovOutlineConditionAddThroughputBy6-12     1.159n ± 1%    1.097n ± 2%   -5.35% (p=0.000 n=10)
  CmovOutlineConditionSubLatency-12          0.5271n ± 3%   0.2654n ± 2%  -49.66% (p=0.000 n=10)
  CmovOutlineConditionSubThroughputBy6-12     1.053n ± 1%    1.050n ± 1%        ~ (p=1.000 n=10)
  geomean

There are other benefits not tested by this benchmark:
- the math form is usually a couple bytes shorter (ICACHE)
- the math form is usually 0~2 uops shorter (UCACHE)
- the math form has usually less register pressure*
- the math form can sometimes be optimized further

*regalloc rarely find how it can use less registers

As far as pass ordering goes there are many possible options,
I've decided to reorder branchelim before late opt since:
- unlike running exclusively the CondSelect rules after branchelim,
  some extra optimizations might trigger on the adds or subs.
- I don't want to maintain a second generic.rules file of only the stuff,
  that can trigger after branchelim.
- rerunning all of opt a third time increase compilation time for little gains.

By elimination moving branchelim seems fine.

Change-Id: I869adf57e4d109948ee157cfc47144445146bafd
Reviewed-on: https://go-review.googlesource.com/c/go/+/685676
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-07-24 14:42:10 -07:00
Jake Bailey
27ff0f249c cmd/compile/internal/ssa: eliminate string copies for calls to unique.Make
unique.Make always copies strings passed into it, so it's safe to not
copy byte slices converted to strings either. Handle this just like map
accesses with string(b) as keys.

This CL only handles unique.Make(string(b)), not nested cases like
unique.Make([2]string{string(b1), string(b2)}); this could be done in a
followup CL but the map lookup code in walk is sufficiently different
than the call handling code that I didn't attempt it. (SSA is much
easier).

Fixes #71926

Change-Id: Ic2f82f2f91963d563b4ddb1282bd49fc40da8b85
Reviewed-on: https://go-review.googlesource.com/c/go/+/672135
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21 20:20:31 -07:00
Keith Randall
7d0cb2a2ad cmd/compile: constant fold 128-bit multiplies
The full 64x64->128 multiply comes up when using bits.Mul64.
The 64x64->64+overflow multiply comes up in unsafe.Slice when using
a constant length.

Change-Id: I298515162ca07d804b2d699d03bc957ca30a4ebc
Reviewed-on: https://go-review.googlesource.com/c/go/+/667175
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-22 10:24:18 -07:00
Alexander Musman
16a6b71f18 cmd/compile: improve store-to-load forwarding with compatible types
Improve the compiler's store-to-load forwarding optimization by relaxing the
type comparison condition. Instead of requiring exact type equality (CMPeq),
we now use copyCompatibleType which allows forwarding between compatible
types where safe.

Fix several size comparison bugs in the nested store patterns. Previously,
we were comparing the size of the outer store with the load type,
rather than comparing with the size of the actual store being forwarded
from.

Skip OpConvert in dead store elimination to help get rid of dead stores such
as zeroing slices. OpConvert, like OpInlMark, doesn't really use the memory.

This optimization is particularly beneficial for code that creates slices with
computed pointers, such as the runtime's heapBitsSlice function, where
intermediate calculations were previously causing the compiler to miss
store-to-load forwarding opportunities.

Local sweet run result on an x86_64 laptop:

                       │  Orig.res   │              Hopt.res              │
                       │   sec/op    │   sec/op     vs base               │
BiogoIgor-8               5.303 ± 1%    5.322 ± 1%       ~ (p=0.190 n=10)
BiogoKrishna-8            7.894 ± 1%    7.828 ± 2%       ~ (p=0.190 n=10)
BleveIndexBatch100-8      2.257 ± 1%    2.248 ± 2%       ~ (p=0.529 n=10)
EtcdPut-8                30.12m ± 1%   30.03m ± 1%       ~ (p=0.796 n=10)
EtcdSTM-8                127.1m ± 1%   126.2m ± 0%  -0.74% (p=0.023 n=10)
GoBuildKubelet-8          52.21 ± 0%    52.05 ± 1%       ~ (p=0.063 n=10)
GoBuildKubeletLink-8      4.342 ± 1%    4.305 ± 0%  -0.85% (p=0.000 n=10)
GoBuildIstioctl-8         43.33 ± 0%    43.24 ± 0%  -0.22% (p=0.015 n=10)
GoBuildIstioctlLink-8     4.604 ± 1%    4.598 ± 0%       ~ (p=0.063 n=10)
GoBuildFrontend-8         15.33 ± 0%    15.29 ± 0%       ~ (p=0.143 n=10)
GoBuildFrontendLink-8    740.0m ± 1%   737.7m ± 1%       ~ (p=0.912 n=10)
GopherLuaKNucleotide-8    9.590 ± 1%    9.656 ± 1%       ~ (p=0.165 n=10)
MarkdownRenderXHTML-8    96.97m ± 1%   97.26m ± 2%       ~ (p=0.105 n=10)
Tile38QueryLoad-8        335.9µ ± 1%   335.6µ ± 1%       ~ (p=0.481 n=10)
geomean                   1.336         1.333       -0.22%

Change-Id: I031552623e6d5a3b1b5be8325e6314706e45534f
Reviewed-on: https://go-review.googlesource.com/c/go/+/662075
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-04-04 08:25:47 -07:00
Jorropo
b60b9cf21f cmd/compile: add constant folding for bits.Add64
Change-Id: I0ed4ebeaaa68e274e5902485ccc1165c039440bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/656275
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
2025-03-11 20:17:53 -07:00
Jorropo
4ff70cf868 cmd/compile: add MakeTuple generic SSA op to remove duplicate Select[01] rules
Change-Id: Id94a5e503f02aa29dc1e334b521770107d4261db
Reviewed-on: https://go-review.googlesource.com/c/go/+/656615
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
2025-03-11 20:17:48 -07:00
Jorropo
3e033b7553 cmd/compile: add constant folding for PopCount
Change-Id: I6ea3f75ddd5c7af114ef77bc48f28c7f8570997b
Reviewed-on: https://go-review.googlesource.com/c/go/+/656156
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-03-11 13:50:52 -07:00
Mateusz Poliwczak
43e6525986 cmd/compile: load properly constant values from itabs
While looking at the SSA of following code, i noticed
that these rules do not work properly, and the types
are loaded indirectly through an itab, instead of statically.

type M interface{ M() }
type A interface{ A() }

type Impl struct{}
func (*Impl) M() {}
func (*Impl) A() {}

func main() {
        var a M = &Impl{}
        a.(A).A()
}

Change-Id: Ia275993f81a2e7302102d4ff87ac28586023d13c
GitHub-Last-Rev: 4bfc901917
GitHub-Pull-Request: golang/go#71784
Reviewed-on: https://go-review.googlesource.com/c/go/+/649500
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-19 13:39:00 -08:00
Keith Randall
89c2f282dc cmd/compile: move []byte->string map key optimization to ssa
If we call slicebytetostring immediately (with no intervening writes)
before calling map access or delete functions with the resulting
string as the key, then we can just use the ptr/len of the
slicebytetostring argument as the key. This avoids an allocation.

Fixes #44898
Update #71132

There's old code in cmd/compile/internal/walk/order.go that handles
some of these cases.

1. m[string(b)]
2. s := string(b); m[s]
3. m[[2]string{string(b1),string(b2)}]

The old code handled cases 1&3. The new code handles cases 1&2.
We'll leave the old code around to keep 3 working, although it seems
not terribly common.

Case 2 happens particularly after inlining, so it is pretty common.

Change-Id: I8913226ca79d2c65f4e2bd69a38ac8c976a57e43
Reviewed-on: https://go-review.googlesource.com/c/go/+/640656
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-13 13:03:07 -08:00
Keith Randall
072eea9b3b cmd/compile: avoid ifaceeq call if we know the interface is direct
We can just use == if the interface is direct.

Fixes #70738

Change-Id: Ia9a644791a370fec969c04c42d28a9b58f16911f
Reviewed-on: https://go-review.googlesource.com/c/go/+/635435
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-10 13:28:41 -08:00
Jakub Ciolek
f7dbbf2519 cmd/compile: distribute 8 and 16-bit multiplication
Expand the existing rule to cover 8 and 16 bit variants.

compilecmp linux/amd64:

time
time.parseStrictRFC3339.func1 80 -> 70  (-12.50%)
time.Time.appendStrictRFC3339.func1 80 -> 70  (-12.50%)
time.Time.appendStrictRFC3339 439 -> 428  (-2.51%)

time [cmd/compile]
time.parseStrictRFC3339.func1 80 -> 70  (-12.50%)
time.Time.appendStrictRFC3339.func1 80 -> 70  (-12.50%)
time.Time.appendStrictRFC3339 439 -> 428  (-2.51%)

linux/arm64:

time
time.parseStrictRFC3339.func1 changed
time.Time.appendStrictRFC3339.func1 changed
time.Time.appendStrictRFC3339 416 -> 400  (-3.85%)

time [cmd/compile]
time.Time.appendStrictRFC3339 416 -> 400  (-3.85%)
time.parseStrictRFC3339.func1 changed
time.Time.appendStrictRFC3339.func1 changed

Change-Id: I0ad3b2363a9fe8c322dd05fbc13bf151a146d8cb
Reviewed-on: https://go-review.googlesource.com/c/go/+/641756
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-02-03 08:41:08 -08:00
Keith Randall
f0b0109242 cmd/compile: pull multiple adds out of an unsafe.Pointer<->uintptr conversion
This came up in some swissmap code.

Change-Id: I3c6705a5cafec8cb4953dfa9535cf0b45255cc83
Reviewed-on: https://go-review.googlesource.com/c/go/+/629516
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
2024-11-21 22:57:04 +00:00
Keith Randall
7d2e3da243 cmd/compile: remove redundant nil checks
Optimize them away if we can.

If not, be more careful about splicing them out after scheduling.

Change-Id: I660e54649d753dc456d2e25d389d375a16d76940
Reviewed-on: https://go-review.googlesource.com/c/go/+/627418
Reviewed-by: Shengwei Zhao <wingrez@126.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-11-20 16:28:52 +00:00
Youlin Feng
32f4864216 cmd/compile: arithmetic optimization for shifts
Fixes #69635

Change-Id: I4f8d7dafb34ccfb943c29f96c982278ab7edcd05
Reviewed-on: https://go-review.googlesource.com/c/go/+/621357
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2024-10-23 15:14:28 +00:00
Cuong Manh Le
6d856a804c cmd/compile: generalize struct load/store
The SSA backend currently only handle struct with up to 4 fields. Thus,
there are different operations corresponding to number fields of the
struct.

This CL generalizes these with just one OpStructMake, allow struct types
with arbitrary number of fields.

However, the ssa.MaxStruct is still kept as-is, and future CL will
increase this value to optimize large structs.

Updates #24416

Change-Id: I192ffbea881186693584476b5639394e79be45c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/611075
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2024-09-26 13:18:08 +00:00
khr@golang.org
944a2ac3c7 cmd/compile: small cleanups to rewrite rule helpers
Change-Id: I50a19bd971176598bf8e4ef86ec98f008abe245c
Reviewed-on: https://go-review.googlesource.com/c/go/+/615198
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2024-09-24 21:02:34 +00:00
khr@golang.org
be86f09e01 cmd/compile: use generics for isPowerOfTwo predicates
Change-Id: I097b53e9f13de6ff6eb18ae2261842b097f26390
Reviewed-on: https://go-review.googlesource.com/c/go/+/615197
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2024-09-24 21:02:31 +00:00
Jorropo
090f03fd2f cmd/compile: do constant folding for BitLen*
Change-Id: I56c27d606b55ea882f4db264fd4735b0cccdf7c6
Reviewed-on: https://go-review.googlesource.com/c/go/+/604015
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2024-09-03 16:38:29 +00:00
Keith Randall
9f9dd2bfd8 cmd/compile: fix cmpstring rewrite rule
We need to ensure that the Select0 lives in the same block as
its argument. Divide up the rule into 2 so that we can put the
parts in the right places.

(This would be simpler if we could use @block syntax mid-rule, but
that feature currently only works at the top level.)

This fixes the ssacheck builder after CL 578835

Change-Id: Id26a01d9fac0684e0b732d35d0f7999f6de07825
Reviewed-on: https://go-review.googlesource.com/c/go/+/580815
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2024-04-22 18:24:47 +00:00
khr@golang.org
1a0b86375f cmd/compile: remove redundant calls to cmpstring
The results of cmpstring are reuseable if the second call has the
same arguments and memory.

Note that this gets rid of cmpstring, but we still generate a
redundant </<= test and branch afterwards, because the compiler
doesn't know that cmpstring only ever returns -1,0,1.

Update #61725

Change-Id: I93a0d1ccca50d90b1e1a888240ffb75a3b10b59b
Reviewed-on: https://go-review.googlesource.com/c/go/+/578835
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-04-19 16:31:02 +00:00
Jorropo
bbd0bc22ea cmd/compile: improve integer comparisons with numeric bounds
This do:
- Fold always false or always true comparisons for ints and uint.
- Reduce < and <= where the true set is only one value to == with such value.

Change-Id: Ie9e3f70efd1845bef62db56543f051a50ad2532e
Reviewed-on: https://go-review.googlesource.com/c/go/+/555135
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-01-23 00:02:36 +00:00
Keith Randall
962ccbef91 cmd/compile: ensure pointer arithmetic happens after the nil check
Have nil checks return a pointer that is known non-nil. Users of
that pointer can use the result, ensuring that they are ordered
after the nil check itself.

The order dependence goes away after scheduling, when we've fixed
an order. At that point we move uses back to the original pointer
so it doesn't change regalloc any.

This prevents pointer arithmetic on nil from being spilled to the
stack and then observed by a stack scan.

Fixes #63657

Change-Id: I1a5fa4f2e6d9000d672792b4f90dfc1b7b67f6ea
Reviewed-on: https://go-review.googlesource.com/c/go/+/537775
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-10-31 20:45:54 +00:00
Matthew Dempsky
45d3d10071 cmd/compile/internal/ssa: rename ssagen.TypeOK as CanSSA
No need to indirect through Frontend for this.

Change-Id: I5812eb4dadfda79267cabc9d13aeab126c1479e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/526517
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-09-08 19:03:54 +00:00
Keith Randall
d0964e172b cmd/compile: optimize s==s for strings
s==s is always true for strings. This comes up in NaN testing in
generic code, where we want x==x to compile completely away except for
float types.

Fixes #60777

Change-Id: I3ce054b5121354de2f9751b010fb409f148cb637
Reviewed-on: https://go-review.googlesource.com/c/go/+/503795
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
2023-07-26 17:17:28 +00:00
Keith Randall
bd3f44e4ff cmd/compile: constant-fold loads from constant dictionaries and types
Retrying the original CL with a small modification. The original CL
did not handle the case of reading an itab out of a dictionary
correctly.  When we read an itab out of a dictionary, we must treat
the type inside that itab as maybe being put in an interface.

Original CL: 486895
Revert CL: 490156

Change-Id: Id2dc1699d184cd8c63dac83986a70b60b4e6cbd7
Reviewed-on: https://go-review.googlesource.com/c/go/+/491495
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-19 18:10:11 +00:00