Commit graph

605 commits

Author SHA1 Message Date
David Chase
7056c71d32 cmd/compile: disable use of new saturating float-to-int conversions
The new conversions can be activated (or bisected) with
  -gcflags=all=-d=converthash=PATTERN

where PATTERN is either a hash string or n, qn, y, qy for
no, quietly no, yes, quietly yes.

This CL makes the default pattern be "qn" instead of the
default-default which is an efficient encoding of "qy".

Updates #75834

Change-Id: I88a9fd7880bc999132420c8d0a22a8fdc1e95a2a
Reviewed-on: https://go-review.googlesource.com/c/go/+/711845
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Bypass: David Chase <drchase@google.com>
2025-10-14 15:09:35 -07:00
Keith Randall
9b8742f2e7 cmd/compile: don't depend on arch-dependent conversions in the compiler
Leave those constant foldings for runtime, similar to how we do it
for NaN generation.

These are the only instances I could find in cmd/compile/..., using

objdump -d ../pkg/tool/darwin_arm64/compile| egrep "(fcvtz|>:)" | grep -B1 fcvt

(There are instances in other places, like runtime and reflect, but I don't
think those places would affect compiler output.)

Change-Id: I4113fe4570115e4765825cf442cb1fde97cf2f27
Reviewed-on: https://go-review.googlesource.com/c/go/+/711281
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-10-13 12:19:32 -07:00
Michael Matloob
19a30ea3f2 cmd/compile: call generated size-specialized malloc functions directly
This change creates calls to size-specialized malloc functions instead
of calls to newObject when we know the size of the allocation at
compilation time. Most of it is a matter of calling the newObject
function (which will create calls to the size-specialized functions)
rather then the newObjectNonSpecialized function (which won't). In the
newHeapaddr, small, non-pointer case, we'll create a non specialized
newObject and transform that into the appropriate size-specialized
function when we produce the mallocgc in flushPendingHeapAllocations.

We have to update some of the rewrites in generic.rules to also apply to
the size-specialized functions when they apply to newObject.

The messiest thing is we have to adjust the offset we use to save the
memory profiler stack, because the depth of the call to profilealloc is
two frames fewer in the size-specialized malloc functions compared to
when newObject calls mallocgc. A bunch of tests have been adjusted to
account for that.

Change-Id: I6a6a6964c9037fb6719e392c4a498ed700b617d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/707856
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Michael Matloob <matloob@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-10-09 14:59:40 -07:00
Michael Munday
97fd6bdecc cmd/compile: fuse NaN checks with other comparisons
NaN checks can often be merged into other comparisons by inverting them.
For example, `math.IsNaN(x) || x > 0` is equivalent to `!(x <= 0)`.

goos: linux
goarch: amd64
pkg: math
cpu: 12th Gen Intel(R) Core(TM) i7-12700T
            │         sec/op         │    sec/op     vs base                │
Acos                     4.315n ± 0%    4.314n ± 0%        ~ (p=0.642 n=10)
Acosh                    8.398n ± 0%    7.779n ± 0%   -7.37% (p=0.000 n=10)
Asin                     4.203n ± 0%    4.211n ± 0%   +0.20% (p=0.001 n=10)
Asinh                   10.150n ± 0%    9.562n ± 0%   -5.79% (p=0.000 n=10)
Atan                     2.363n ± 0%    2.363n ± 0%        ~ (p=0.801 n=10)
Atanh                    8.192n ± 2%    7.685n ± 0%   -6.20% (p=0.000 n=10)
Atan2                    4.013n ± 0%    4.010n ± 0%        ~ (p=0.073 n=10)
Cbrt                     4.858n ± 0%    4.755n ± 0%   -2.12% (p=0.000 n=10)
Cos                      4.596n ± 0%    4.357n ± 0%   -5.20% (p=0.000 n=10)
Cosh                     5.071n ± 0%    5.071n ± 0%        ~ (p=0.585 n=10)
Erf                      2.802n ± 1%    2.788n ± 0%   -0.54% (p=0.002 n=10)
Erfc                     3.087n ± 1%    3.071n ± 0%        ~ (p=0.320 n=10)
Erfinv                   3.981n ± 0%    3.965n ± 0%   -0.41% (p=0.000 n=10)
Erfcinv                  3.985n ± 0%    3.977n ± 0%   -0.20% (p=0.000 n=10)
ExpGo                    8.721n ± 2%    8.252n ± 0%   -5.38% (p=0.000 n=10)
Expm1                    4.378n ± 0%    4.228n ± 0%   -3.43% (p=0.000 n=10)
Exp2                     8.313n ± 0%    7.855n ± 0%   -5.52% (p=0.000 n=10)
Exp2Go                   8.498n ± 2%    7.921n ± 0%   -6.79% (p=0.000 n=10)
Mod                      15.16n ± 4%    12.20n ± 1%  -19.58% (p=0.000 n=10)
Frexp                    1.780n ± 2%    1.496n ± 0%  -15.96% (p=0.000 n=10)
Gamma                    4.378n ± 1%    4.013n ± 0%   -8.35% (p=0.000 n=10)
HypotGo                  2.655n ± 5%    2.427n ± 1%   -8.57% (p=0.000 n=10)
Ilogb                    1.912n ± 5%    1.749n ± 0%   -8.53% (p=0.000 n=10)
J0                       22.43n ± 9%    20.46n ± 0%   -8.76% (p=0.000 n=10)
J1                       21.03n ± 4%    19.96n ± 0%   -5.09% (p=0.000 n=10)
Jn                       45.40n ± 1%    42.59n ± 0%   -6.20% (p=0.000 n=10)
Ldexp                    2.312n ± 1%    1.944n ± 0%  -15.94% (p=0.000 n=10)
Lgamma                   4.617n ± 1%    4.584n ± 0%   -0.73% (p=0.000 n=10)
Log                      4.226n ± 0%    4.213n ± 0%   -0.31% (p=0.001 n=10)
Logb                     1.771n ± 0%    1.775n ± 0%        ~ (p=0.097 n=10)
Log1p                    5.102n ± 2%    5.001n ± 0%   -1.97% (p=0.000 n=10)
Log10                    4.407n ± 0%    4.408n ± 0%        ~ (p=1.000 n=10)
Log2                     2.416n ± 1%    2.138n ± 0%  -11.51% (p=0.000 n=10)
Modf                     1.669n ± 2%    1.611n ± 0%   -3.50% (p=0.000 n=10)
Nextafter32              2.186n ± 0%    2.185n ± 0%        ~ (p=0.051 n=10)
Nextafter64              2.182n ± 0%    2.184n ± 0%   +0.09% (p=0.016 n=10)
PowInt                   11.39n ± 6%    10.68n ± 2%   -6.24% (p=0.000 n=10)
PowFrac                  26.60n ± 2%    26.12n ± 0%   -1.80% (p=0.000 n=10)
Pow10Pos                0.5067n ± 4%   0.5003n ± 1%   -1.27% (p=0.001 n=10)
Pow10Neg                0.8552n ± 0%   0.8552n ± 0%        ~ (p=0.928 n=10)
Round                    1.181n ± 0%    1.182n ± 0%   +0.08% (p=0.001 n=10)
RoundToEven              1.709n ± 0%    1.710n ± 0%        ~ (p=0.053 n=10)
Remainder                12.54n ± 5%    11.99n ± 2%   -4.46% (p=0.000 n=10)
Sin                      3.933n ± 5%    3.926n ± 0%   -0.17% (p=0.000 n=10)
Sincos                   5.672n ± 0%    5.522n ± 0%   -2.65% (p=0.000 n=10)
Sinh                     5.447n ± 1%    5.444n ± 0%   -0.06% (p=0.029 n=10)
Tan                      4.061n ± 0%    4.058n ± 0%   -0.07% (p=0.005 n=10)
Tanh                     5.599n ± 0%    5.595n ± 0%   -0.06% (p=0.042 n=10)
Y0                       20.75n ± 5%    19.73n ± 1%   -4.92% (p=0.000 n=10)
Y1                       20.87n ± 2%    19.78n ± 1%   -5.20% (p=0.000 n=10)
Yn                       44.50n ± 2%    42.04n ± 2%   -5.53% (p=0.000 n=10)
geomean                  4.989n         4.791n        -3.96%

goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
                    │     sec/op     │    sec/op     vs base                  │
Acos                    159.9n ±  0%   159.9n ±  0%        ~ (p=0.269 n=10)
Acosh                   244.7n ±  0%   235.0n ±  0%   -3.98% (p=0.000 n=10)
Asin                    159.9n ±  0%   159.9n ±  0%        ~ (p=0.154 n=10)
Asinh                   270.8n ±  0%   261.1n ±  0%   -3.60% (p=0.000 n=10)
Atan                    119.1n ±  0%   119.1n ±  0%        ~ (p=0.347 n=10)
Atanh                   260.2n ±  0%   261.8n ±  4%        ~ (p=0.459 n=10)
Atan2                   186.8n ±  0%   186.8n ±  0%        ~ (p=0.487 n=10)
Cbrt                    203.5n ±  0%   198.2n ±  0%   -2.60% (p=0.000 n=10)
Ceil                    31.82n ±  0%   31.81n ±  0%        ~ (p=0.714 n=10)
Copysign                4.894n ±  0%   4.893n ±  0%        ~ (p=0.161 n=10)
Cos                     107.6n ±  0%   103.6n ±  0%   -3.76% (p=0.000 n=10)
Cosh                    259.0n ±  0%   252.8n ±  0%   -2.39% (p=0.000 n=10)
Erf                     133.7n ±  0%   133.7n ±  0%        ~ (p=0.720 n=10)
Erfc                    137.9n ±  0%   137.8n ±  0%   -0.04% (p=0.033 n=10)
Erfinv                  173.7n ±  0%   168.8n ±  0%   -2.82% (p=0.000 n=10)
Erfcinv                 173.7n ±  0%   168.8n ±  0%   -2.82% (p=0.000 n=10)
Exp                     215.3n ±  0%   208.1n ±  0%   -3.34% (p=0.000 n=10)
ExpGo                   226.7n ±  0%   220.6n ±  0%   -2.69% (p=0.000 n=10)
Expm1                   164.8n ±  0%   159.0n ±  0%   -3.52% (p=0.000 n=10)
Exp2                    185.0n ±  0%   182.7n ±  0%   -1.22% (p=0.000 n=10)
Exp2Go                  198.9n ±  0%   196.5n ±  0%   -1.21% (p=0.000 n=10)
Abs                     4.894n ±  0%   4.893n ±  0%        ~ (p=0.262 n=10)
Dim                     16.31n ±  0%   16.31n ±  0%        ~ (p=1.000 n=10)
Floor                   31.81n ±  0%   31.81n ±  0%        ~ (p=0.067 n=10)
Max                     26.11n ±  0%   26.10n ±  0%        ~ (p=0.080 n=10)
Min                     26.10n ±  0%   26.10n ±  0%        ~ (p=0.095 n=10)
Mod                     337.7n ±  0%   291.9n ±  0%  -13.56% (p=0.000 n=10)
Frexp                   50.57n ±  0%   42.41n ±  0%  -16.13% (p=0.000 n=10)
Gamma                   206.3n ±  0%   198.1n ±  0%   -4.00% (p=0.000 n=10)
Hypot                   94.62n ±  0%   94.61n ±  0%        ~ (p=0.437 n=10)
HypotGo                 109.3n ±  0%   109.3n ±  0%        ~ (p=1.000 n=10)
Ilogb                   44.05n ±  0%   44.04n ±  0%   -0.02% (p=0.025 n=10)
J0                      663.1n ±  0%   663.9n ±  0%   +0.13% (p=0.002 n=10)
J1                      663.9n ±  0%   666.4n ±  0%   +0.38% (p=0.000 n=10)
Jn                      1.404µ ±  0%   1.407µ ±  0%   +0.21% (p=0.000 n=10)
Ldexp                   57.10n ±  0%   48.93n ±  0%  -14.30% (p=0.000 n=10)
Lgamma                  185.1n ±  0%   187.6n ±  0%   +1.32% (p=0.000 n=10)
Log                     182.7n ±  0%   170.1n ±  0%   -6.87% (p=0.000 n=10)
Logb                    46.49n ±  0%   46.49n ±  0%        ~ (p=0.675 n=10)
Log1p                   184.3n ±  0%   179.4n ±  0%   -2.63% (p=0.000 n=10)
Log10                   184.3n ±  0%   171.2n ±  0%   -7.08% (p=0.000 n=10)
Log2                    66.05n ±  0%   57.90n ±  0%  -12.34% (p=0.000 n=10)
Modf                    34.25n ±  0%   34.24n ±  0%        ~ (p=0.163 n=10)
Nextafter32             49.33n ±  1%   48.93n ±  0%   -0.81% (p=0.002 n=10)
Nextafter64             43.64n ±  0%   43.23n ±  0%   -0.93% (p=0.000 n=10)
PowInt                  267.6n ±  0%   251.2n ±  0%   -6.11% (p=0.000 n=10)
PowFrac                 672.9n ±  0%   637.9n ±  0%   -5.19% (p=0.000 n=10)
Pow10Pos                13.87n ±  0%   13.87n ±  0%        ~ (p=1.000 n=10)
Pow10Neg                19.58n ± 62%   19.59n ± 62%        ~ (p=0.355 n=10)
Round                   23.65n ±  0%   23.65n ±  0%        ~ (p=1.000 n=10)
RoundToEven             27.73n ±  0%   27.73n ±  0%        ~ (p=0.635 n=10)
Remainder               309.9n ±  0%   280.5n ±  0%   -9.49% (p=0.000 n=10)
Signbit                 13.05n ±  0%   13.05n ±  0%        ~ (p=1.000 n=10) ¹
Sin                     120.7n ±  0%   120.7n ±  0%        ~ (p=1.000 n=10) ¹
Sincos                  148.4n ±  0%   143.5n ±  0%   -3.30% (p=0.000 n=10)
Sinh                    275.6n ±  0%   267.5n ±  0%   -2.94% (p=0.000 n=10)
SqrtIndirect            3.262n ±  0%   3.262n ±  0%        ~ (p=0.263 n=10)
SqrtLatency             19.57n ±  0%   19.57n ±  0%        ~ (p=0.582 n=10)
SqrtIndirectLatency     19.57n ±  0%   19.57n ±  0%        ~ (p=1.000 n=10)
SqrtGoLatency           203.2n ±  0%   197.6n ±  0%   -2.78% (p=0.000 n=10)
SqrtPrime               4.952µ ±  0%   4.952µ ±  0%   -0.01% (p=0.025 n=10)
Tan                     153.3n ±  0%   153.3n ±  0%        ~ (p=1.000 n=10)
Tanh                    280.5n ±  0%   272.4n ±  0%   -2.91% (p=0.000 n=10)
Trunc                   31.81n ±  0%   31.81n ±  0%        ~ (p=1.000 n=10)
Y0                      680.1n ±  0%   664.8n ±  0%   -2.25% (p=0.000 n=10)
Y1                      684.2n ±  0%   669.6n ±  0%   -2.14% (p=0.000 n=10)
Yn                      1.444µ ±  0%   1.410µ ±  0%   -2.35% (p=0.000 n=10)
Float64bits             5.709n ±  0%   5.708n ±  0%        ~ (p=0.573 n=10)
Float64frombits         4.893n ±  0%   4.893n ±  0%        ~ (p=0.734 n=10)
Float32bits             12.23n ±  0%   12.23n ±  0%        ~ (p=0.628 n=10)
Float32frombits         4.893n ±  0%   4.893n ±  0%        ~ (p=0.971 n=10)
FMA                     4.893n ±  0%   4.893n ±  0%        ~ (p=0.736 n=10)
geomean                 88.96n         87.05n         -2.15%
¹ all samples are equal

Change-Id: I8db8ac7b7b3430b946b89e88dd6c1546804125c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/697360
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Munday <mikemndy@gmail.com>
2025-10-08 08:11:17 -07:00
Cherry Mui
1d62e92567 test/codegen: make sure assignment results are used.
Some tests make assignments to an argument without reading it.
With CL 708865, they are treated as dead stores and are removed.
Make sure the results are used.

Fixes #75745.
Fixes #75746.

Change-Id: I05580beb1006505ec1550e5fa245b54dcefd10b9
Reviewed-on: https://go-review.googlesource.com/c/go/+/708916
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-10-06 14:51:23 -07:00
Cherry Mui
38b26f29f1 cmd/compile: remove stores to unread parameters
Currently, we remove stores to local variables that are not read.
We don't do that for arguments. But arguments and locals are
essentially the same. Arguments are passed by value, and are not
expected to be read in the caller's frame. So we can remove the
writes to them as well. One exception is the cgo_unsafe_arg
directive, which makes all the arguments effectively address-taken.
cgo_unsafe_arg implies ABI0, so we just skip ABI0 functions'
arguments.

Cherry-picked from the dev.simd branch. This CL is not
necessarily SIMD specific. Apply early to reduce risk.

Change-Id: I8999fc50da6a87f22c1ec23e9a0c15483b6f7df8
Reviewed-on: https://go-review.googlesource.com/c/go/+/705815
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-on: https://go-review.googlesource.com/c/go/+/708865
2025-10-03 12:31:20 -07:00
Joel Sing
4ff8a457db test/codegen: codify handling of floating point constants on arm64
While here, reorder Float32ConstantStore/Float64ConstantStore for
consistency.

Change-Id: Ic1b3e9f9474965d15bc94518d78d1a4a7bda93f3
Reviewed-on: https://go-review.googlesource.com/c/go/+/703756
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Auto-Submit: Joel Sing <joel@sing.id.au>
Reviewed-by: Keith Randall <khr@google.com>
2025-09-30 14:49:25 -07:00
Jake Bailey
97da068774 cmd/compile: eliminate nil checks on .dict arg
The first arg of a generic function is the dictionary. This dictionary
is never nil, but it gets a nil check becuase the dict arg is treated as
a slice during construction.

cmp.Compare[go.shape.int] was:

00006 (+41) TESTB AX, (AX)
00007 (+52) CMPQ CX, BX
00008 (52) JGT 14
00009 (+55) JGE 12
00010 (+56) MOVL $1, AX
00011 (56) RET
00012 (+58) XORL AX, AX
00013 (58) RET
00014 (+53) MOVQ $-1, AX
00015 (53) RET

Note how the function begins with a TESTB that loads the dict to perform
the nil check.

This CL eliminates that nil check.

For most generic functions, this doesn't matter too much, but not
infrequently are generic functions written which never actually use the
dictionary (like cmp.Compare), so I suspect this might help in hot code
to avoid repeatedly touching the dictionary in memory, and in cases
where the generic function is not inlined (and thus the dict dropped).

compilecmp shows these changes (deduped):

cmp.Compare[go.shape.float64] 73 -> 72  (-1.37%)
cmp.Compare[go.shape.int] 26 -> 24  (-7.69%)
cmp.Compare[go.shape.int32] 25 -> 23  (-8.00%)
cmp.Compare[go.shape.int64] 26 -> 24  (-7.69%)
cmp.Compare[go.shape.string] 142 -> 141  (-0.70%)
cmp.Compare[go.shape.uint16] 26 -> 24  (-7.69%)
cmp.Compare[go.shape.uint] 26 -> 24  (-7.69%)
cmp.Compare[go.shape.uint32] 25 -> 23  (-8.00%)
cmp.Compare[go.shape.uint64] 26 -> 24  (-7.69%)
cmp.Compare[go.shape.uint8] 25 -> 23  (-8.00%)
cmp.Compare[go.shape.uintptr] 26 -> 24  (-7.69%)
cmp.Less[go.shape.float64] 35 -> 34  (-2.86%)
cmp.Less[go.shape.int32] 8 -> 6  (-25.00%)
cmp.Less[go.shape.int64] 9 -> 7  (-22.22%)
cmp.Less[go.shape.int] 9 -> 7  (-22.22%)
cmp.Less[go.shape.string] 112 -> 110  (-1.79%)
cmp.Less[go.shape.uint16] 9 -> 7  (-22.22%)
cmp.Less[go.shape.uint32] 8 -> 6  (-25.00%)
cmp.Less[go.shape.uint64] 9 -> 7  (-22.22%)
internal/synctest.Associate[go.shape.struct 114 -> 113  (-0.88%)
internal/trace.(*dataTable[go.shape.uint64,go.shape.string]).insert 805 -> 791  (-1.74%)
internal/trace.(*dataTable[go.shape.uint64,go.shape.struct 858 -> 852  (-0.70%)
main.(*gState[go.shape.int64]).stop 2111 -> 2085  (-1.23%)
main.(*gState[go.shape.int64]).unblock 941 -> 923  (-1.91%)
runtime.fmax[go.shape.float32] 85 -> 83  (-2.35%)
runtime.fmax[go.shape.float64] 89 -> 87  (-2.25%)
runtime.fmin[go.shape.float32] 85 -> 83  (-2.35%)
runtime.fmin[go.shape.float64] 89 -> 87  (-2.25%)
slices.BinarySearch[go.shape.[]string,go.shape.string] 346 -> 337  (-2.60%)
slices.Concat[go.shape.[]uint8,go.shape.uint8] 462 -> 453  (-1.95%)
slices.ContainsFunc[go.shape.[]*cmd/vendor/github.com/google/pprof/profile.Sample,go.shape.*uint8] 170 -> 169  (-0.59%)
slices.ContainsFunc[go.shape.[]*debug/dwarf.StructField,go.shape.*uint8] 170 -> 169  (-0.59%)
slices.ContainsFunc[go.shape.[]*go/ast.Field,go.shape.*uint8] 170 -> 169  (-0.59%)
slices.ContainsFunc[go.shape.[]string,go.shape.string] 186 -> 181  (-2.69%)
slices.Contains[go.shape.[]*cmd/compile/internal/syntax.BranchStmt,go.shape.*cmd/compile/internal/syntax.BranchStmt] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]cmd/compile/internal/syntax.Type,go.shape.interface 223 -> 219  (-1.79%)
slices.Contains[go.shape.[]crypto/tls.CurveID,go.shape.uint16] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]crypto/tls.SignatureScheme,go.shape.uint16] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]*go/ast.BranchStmt,go.shape.*go/ast.BranchStmt] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]go/types.Type,go.shape.interface 223 -> 219  (-1.79%)
slices.Contains[go.shape.[]int,go.shape.int] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]string,go.shape.string] 223 -> 219  (-1.79%)
slices.Contains[go.shape.[]uint16,go.shape.uint16] 44 -> 42  (-4.55%)
slices.Contains[go.shape.[]uint8,go.shape.uint8] 44 -> 42  (-4.55%)
slices.Insert[go.shape.[]string,go.shape.string] 1189 -> 1170  (-1.60%)
slices.medianCmpFunc[go.shape.struct 1118 -> 1113  (-0.45%)
slices.medianCmpFunc[go.shape.struct 1214 -> 1209  (-0.41%)
slices.medianCmpFunc[go.shape.struct 889 -> 887  (-0.22%)
slices.medianCmpFunc[go.shape.struct 901 -> 874  (-3.00%)
slices.order2Ordered[go.shape.float64] 89 -> 87  (-2.25%)
slices.order2Ordered[go.shape.uint16] 75 -> 70  (-6.67%)
slices.partialInsertionSortOrdered[go.shape.string] 1115 -> 1110  (-0.45%)
slices.partialInsertionSortOrdered[go.shape.uint16] 358 -> 352  (-1.68%)
slices.partitionEqualOrdered[go.shape.int] 208 -> 203  (-2.40%)
slices.partitionEqualOrdered[go.shape.int32] 208 -> 198  (-4.81%)
slices.partitionEqualOrdered[go.shape.int64] 208 -> 203  (-2.40%)
slices.partitionEqualOrdered[go.shape.uint32] 208 -> 198  (-4.81%)
slices.partitionEqualOrdered[go.shape.uint64] 208 -> 203  (-2.40%)
slices.partitionOrdered[go.shape.float64] 538 -> 533  (-0.93%)
slices.partitionOrdered[go.shape.int] 437 -> 427  (-2.29%)
slices.partitionOrdered[go.shape.int64] 437 -> 427  (-2.29%)
slices.partitionOrdered[go.shape.uint16] 447 -> 442  (-1.12%)
slices.partitionOrdered[go.shape.uint64] 437 -> 427  (-2.29%)
slices.rotateCmpFunc[go.shape.struct 1045 -> 1029  (-1.53%)
slices.rotateCmpFunc[go.shape.struct 1205 -> 1163  (-3.49%)
slices.rotateCmpFunc[go.shape.struct 1226 -> 1176  (-4.08%)
slices.rotateCmpFunc[go.shape.struct 1322 -> 1272  (-3.78%)
slices.rotateCmpFunc[go.shape.struct 1419 -> 1400  (-1.34%)
slices.rotateCmpFunc[go.shape.*uint8] 549 -> 538  (-2.00%)
slices.rotateLeft[go.shape.string] 603 -> 588  (-2.49%)
slices.rotateLeft[go.shape.uint8] 255 -> 250  (-1.96%)
slices.siftDownOrdered[go.shape.int] 181 -> 171  (-5.52%)
slices.siftDownOrdered[go.shape.int32] 181 -> 171  (-5.52%)
slices.siftDownOrdered[go.shape.int64] 181 -> 171  (-5.52%)
slices.siftDownOrdered[go.shape.string] 614 -> 592  (-3.58%)
slices.siftDownOrdered[go.shape.uint32] 181 -> 171  (-5.52%)
slices.siftDownOrdered[go.shape.uint64] 181 -> 171  (-5.52%)
time.parseRFC3339[go.shape.string] 1774 -> 1758  (-0.90%)
unique.(*canonMap[go.shape.struct 280 -> 276  (-1.43%)
unique.clone[go.shape.struct 311 -> 293  (-5.79%)
weak.Make[go.shape.6880e4598856efac32416085c0172278cf0fb9e5050ce6518bd9b7f7d1662440] 136 -> 134  (-1.47%)
weak.Make[go.shape.struct 136 -> 134  (-1.47%)
weak.Make[go.shape.uint8] 136 -> 134  (-1.47%)

Change-Id: I43dcea5f2aa37372f773e5edc6a2ef1dee0a8db7
Reviewed-on: https://go-review.googlesource.com/c/go/+/706655
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-09-30 11:22:35 -07:00
limeidan
af6999e60d cmd/compile: implement jump table on loong64
Following CL 357330, use jump tables on Loong64.

goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
                                 │     old     │                 new                 │
                                 │   sec/op    │   sec/op     vs base                │
Switch8Predictable                 2.352n ± 0%   2.101n ± 0%  -10.65% (p=0.000 n=10)
Switch8Unpredictable               11.99n ± 0%   10.25n ± 0%  -14.51% (p=0.000 n=10)
Switch32Predictable                3.153n ± 0%   1.887n ± 1%  -40.14% (p=0.000 n=10)
Switch32Unpredictable              12.47n ± 0%   10.22n ± 0%  -18.00% (p=0.000 n=10)
SwitchStringPredictable            3.162n ± 0%   3.352n ± 0%   +6.01% (p=0.000 n=10)
SwitchStringUnpredictable          14.70n ± 0%   13.31n ± 0%   -9.46% (p=0.000 n=10)
SwitchTypePredictable              3.702n ± 0%   2.201n ± 0%  -40.55% (p=0.000 n=10)
SwitchTypeUnpredictable            16.18n ± 0%   14.48n ± 0%  -10.51% (p=0.000 n=10)
SwitchInterfaceTypePredictable     7.654n ± 0%   9.680n ± 0%  +26.47% (p=0.000 n=10)
SwitchInterfaceTypeUnpredictable   22.04n ± 0%   22.44n ± 0%   +1.81% (p=0.000 n=10)
geomean                            7.441n        6.469n       -13.07%

Change-Id: Id6f30fa73349c60fac17670084daee56973a955f
Reviewed-on: https://go-review.googlesource.com/c/go/+/705396
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2025-09-27 05:02:58 -07:00
Xiaolin Zhao
78ef487a6f cmd/compile: fix the issue of shift amount exceeding the valid range
Fixes #75479

Change-Id: I362d3e49090e94f91a840dd5a475978b59222a00
Reviewed-on: https://go-review.googlesource.com/c/go/+/704135
Reviewed-by: Mark Freeman <markfreeman@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2025-09-17 18:05:31 -07:00
Meng Zhuo
2469e92d8c cmd/compile: combine doubling with shift on riscv64
Change-Id: I4bee2770fedf97e35b5a5b9187a8ba3c41f9ec2e
Reviewed-on: https://go-review.googlesource.com/c/go/+/702697
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@google.com>
2025-09-15 17:31:56 -07:00
Meng Zhuo
e5ee1f2600 test/codegen: check zerobase for newobject on 0-sized types
This CL also adds riscv64 checks

Change-Id: I693e4e606f470615f6b49085592d6d5ca61473d3
Reviewed-on: https://go-review.googlesource.com/c/go/+/703716
Reviewed-by: Pengcheng Wang <wangpengcheng.pp@bytedance.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-09-15 07:47:55 -07:00
Jake Bailey
dc960d0bfe cmd/compile, reflect: further allow inlining of TypeFor
Previous CLs optimized direct use of abi.Type, but reflect.Type is
indirected, so was not benefiting.

For TypeFor, we can use toRType directly without a nil check because the
types are statically known.

Normally, I'd think SSA would remove the nil check, but due to some
oddity (specifically, late fuse being required to remove the nil check,
but opt doesn't run that late) means that the nil check persists and
gets in the way.

Manually writing the code in this instance seems to fix the problem.

It also exposed another problem; depending on the ordering, writeType
could get to a type symbol before SSA, thereby preventing Extra from
being created on the symbol for later lookups that don't go through
TypeLinksym directly. In writeType, for non-shape types, call
TypeLinksym to ensure that the type is set up for later callers. That
change itself passed toolstash -cmp.

All up, this stack put through compilecmp shows a lot of improvement in
various reflect-using packages, and reflect itself. It is too big to fit
in the commit message but here's some info:

compilecmp master -> HEAD
master (d767064170): cmd/compile: mark abi.PtrType.Elem sym as used
HEAD (846a94c568): cmd/compile, reflect: further allow inlining of TypeFor

file      before    after     Δ       %
addr2line 3735911   3735391   -520    -0.014%
asm       6382235   6382091   -144    -0.002%
buildid   3608568   3608360   -208    -0.006%
cgo       5951816   5951480   -336    -0.006%
compile   28362080  28339772  -22308  -0.079%
cover     6668686   6661414   -7272   -0.109%
dist      4311961   4311425   -536    -0.012%
fix       3771706   3771474   -232    -0.006%
link      8686073   8684993   -1080   -0.012%
nm        3715923   3715459   -464    -0.012%
objdump   6074366   6073774   -592    -0.010%
pack      3025653   3025277   -376    -0.012%
pprof     18269485  18261653  -7832   -0.043%
test2json 3442726   3438390   -4336   -0.126%
trace     16984831  16981767  -3064   -0.018%
vet       10701931  10696355  -5576   -0.052%
total     133693951 133639075 -54876  -0.041%

runtime
runtime.stkobjinit 240 -> 165  (-31.25%)

runtime [cmd/compile]
runtime.stkobjinit 240 -> 165  (-31.25%)

reflect
reflect.Value.Seq2.func3 309 -> 245  (-20.71%)
reflect.Value.Seq2.func1.1 281 -> 198  (-29.54%)
reflect.Value.Seq.func1.1 242 -> 165  (-31.82%)
reflect.Value.Seq2.func2 360 -> 285  (-20.83%)
reflect.Value.Seq.func4 281 -> 239  (-14.95%)
reflect.Value.Seq2.func4 399 -> 284  (-28.82%)
reflect.Value.Seq.func2 271 -> 230  (-15.13%)
reflect.TypeFor[go.shape.uint64] 33 -> 18  (-45.45%)
reflect.Value.Seq.func3 219 -> 178  (-18.72%)

reflect [cmd/compile]
reflect.Value.Seq2.func2 360 -> 285  (-20.83%)
reflect.Value.Seq.func4 281 -> 239  (-14.95%)
reflect.Value.Seq.func2 271 -> 230  (-15.13%)
reflect.Value.Seq.func1.1 242 -> 165  (-31.82%)
reflect.Value.Seq2.func1.1 281 -> 198  (-29.54%)
reflect.Value.Seq2.func3 309 -> 245  (-20.71%)
reflect.Value.Seq.func3 219 -> 178  (-18.72%)
reflect.TypeFor[go.shape.uint64] 33 -> 18  (-45.45%)
reflect.Value.Seq2.func4 399 -> 284  (-28.82%)

fmt
fmt.(*pp).fmtBytes 1723 -> 1691  (-1.86%)

database/sql/driver
reflect.TypeFor[go.shape.interface 33 -> 18  (-45.45%)
database/sql/driver.init 72 -> 57  (-20.83%)

Change-Id: I9eb750cf0b7ebf532589f939431feb0a899e42ff
Reviewed-on: https://go-review.googlesource.com/c/go/+/701301
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-09-12 09:34:43 -07:00
Keith Randall
80a2aae922 Revert "cmd/compile: improve stp merging for non-sequent cases"
This reverts commit 4c63d798cb.

Reason for revert: Causes miscompilations. See issue 75365.

Change-Id: Icd1fcfeb23d2ec524b16eb556030f43875e1c90d
Reviewed-on: https://go-review.googlesource.com/c/go/+/702455
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
2025-09-10 11:11:11 -07:00
Youlin Feng
a5fa5ea51c cmd/compile/internal/ssa: expand runtime.memequal for length {3,5,6,7}
This CL slightly speeds up strings.HasPrefix when testing constant
prefixes of length {3,5,6,7}.

goos: linux
goarch: amd64
cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz
                │      old     │                 new                 │
                │    sec/op    │   sec/op     vs base                │
StringPrefix3-8   11.125n ± 2%   8.539n ± 1%  -23.25% (p=0.000 n=20)
StringPrefix5-8   11.170n ± 2%   8.700n ± 1%  -22.11% (p=0.000 n=20)
StringPrefix6-8   11.190n ± 2%   8.655n ± 1%  -22.65% (p=0.000 n=20)
StringPrefix7-8   11.095n ± 1%   8.878n ± 1%  -19.98% (p=0.000 n=20)

Change-Id: I510a80d59cf78680b57d68780d35d212d24030e2
Reviewed-on: https://go-review.googlesource.com/c/go/+/700816
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-09-09 12:10:07 -07:00
Melnikov Denis
4c63d798cb cmd/compile: improve stp merging for non-sequent cases
Original algorithm merges stores with the first
mergeable store in the chain, but it misses some
cases. Additional reordering stores in increasing order
of memory access in the chain allows merging in these cases.

Fixes #71987

There are the results of sweet benchmarks and
the difference between sizes of sections .text

                        │ old.results │            new.results             │
                        │   sec/op    │   sec/op     vs base               │
BleveIndexBatch100-4      7.614 ± 2%    7.548 ± 1%       ~ (p=0.190 n=10)
ESBuildThreeJS-4         821.3m ± 0%   819.0m ± 1%       ~ (p=0.165 n=10)
ESBuildRomeTS-4          206.2m ± 1%   204.4m ± 1%  -0.90% (p=0.023 n=10)
EtcdPut-4                64.89m ± 1%   64.94m ± 2%       ~ (p=0.684 n=10)
EtcdSTM-4                318.4m ± 0%   319.2m ± 1%       ~ (p=0.631 n=10)
GoBuildKubelet-4          157.4 ± 0%    157.6 ± 0%       ~ (p=0.105 n=10)
GoBuildKubeletLink-4      12.42 ± 2%    12.41 ± 1%       ~ (p=0.529 n=10)
GoBuildIstioctl-4         124.4 ± 0%    124.4 ± 0%       ~ (p=0.579 n=10)
GoBuildIstioctlLink-4     8.700 ± 1%    8.693 ± 1%       ~ (p=0.912 n=10)
GoBuildFrontend-4         46.52 ± 0%    46.50 ± 0%       ~ (p=0.971 n=10)
GoBuildFrontendLink-4     2.282 ± 1%    2.272 ± 1%       ~ (p=0.529 n=10)
GoBuildTsgo-4             75.02 ± 1%    75.31 ± 1%       ~ (p=0.436 n=10)
GoBuildTsgoLink-4         1.229 ± 1%    1.219 ± 1%  -0.82% (p=0.035 n=10)
GopherLuaKNucleotide-4    34.77 ± 5%    34.31 ± 1%  -1.33% (p=0.015 n=10)
MarkdownRenderXHTML-4    286.6m ± 0%   285.7m ± 1%       ~ (p=0.315 n=10)
Tile38QueryLoad-4        657.2µ ± 1%   660.3µ ± 0%       ~ (p=0.436 n=10)
geomean                   2.570         2.563       -0.24%

Executable            Old .text  New .text     Change
-------------------------------------------------------
benchmark               6504820    6504020     -0.01%
bleve-index-bench       3903860    3903636     -0.01%
esbuild                 4801012    4801172     +0.00%
esbuild-bench           1256404    1256340     -0.01%
etcd                    9188148    9187076     -0.01%
etcd-bench              6462228    6461524     -0.01%
go                      5924468    5923892     -0.01%
go-build-bench          1282004    1281940     -0.00%
gopher-lua-bench        1639540    1639348     -0.01%
markdown-bench          1478452    1478356     -0.01%
tile38-bench            2753524    2753300     -0.01%
tile38-server          10241380   10240068     -0.01%

Change-Id: Ieb4fdfd656aca458f65fc45938de70550632bd13
Reviewed-on: https://go-review.googlesource.com/c/go/+/698097
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <markfreeman@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-09-09 12:10:01 -07:00
Xiaolin Zhao
f5b20689e9 cmd/compile: optimize loads from readonly globals into constants on loong64
Ref: CL 141118
Update #26498

Change-Id: I9c4ad2bedc4d50bd273bbe9119a898d4fca95e45
Reviewed-on: https://go-review.googlesource.com/c/go/+/700875
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-09-05 08:42:28 -07:00
Xiaolin Zhao
3492e4262b cmd/compile: simplify specific addition operations using the ADDV16 instruction
On loong64, the addi.d instruction can only directly handle 12-bit
immediate numbers. If a larger immediate number needs to be processed,
it must first be placed in a register, and then the add.d instruction
is used to complete the processing of the larger immediate number.
If a larger immediate number c satisfies is32Bit(c) && c&0xffff == 0,
then the ADDV16 instruction can be used to complete the addition operation.

Removes 164 instructions from the go binary on loong64.

Change-Id: I404de93cc4eaaa12fe424f5a0d61b03231215d1a
Reviewed-on: https://go-review.googlesource.com/c/go/+/700536
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
2025-09-05 08:18:04 -07:00
Youlin Feng
df29038486 cmd/compile/internal/ssa: load constant values from abi.PtrType.Elem
This CL makes the generated code for reflect.TypeFor as simple as an
intrinsic function.

Fixes #75203

Change-Id: I7bb48787101f07e77ab5c583292e834c28a028d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/700336
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-09-04 07:25:26 -07:00
limeidan
bd71b94659 cmd/compile/internal: optimizing add+sll rule using ALSLV instruction on loong64
Reduce the number of go toolchain instructions on loong64 as follows:

	file	    before	after	    Δ 	     %
	go	    1573148	1571708	   -1,440  -0.0915%
	gofmt	    320578	320090	   -488    -0.1522%
	asm	    555066	554406	   -660    -0.1189%
	cgo	    481566	480926	   -640    -0.1329%
	compile	    2475962	2473880	   -2,082  -0.0841%
	cover	    516536	515920	   -616    -0.1193%
	link	    702172	701404	   -768    -0.1094%
	preprofile  238626	238274	   -352    -0.1475%
	vet	    792928	792100	   -828    -0.1044%

Change-Id: I61e462726835959c60e1b4e5256d4020202418ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/693877
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2025-08-25 12:30:16 -07:00
Xiaolin Zhao
44c5956bf7 test/codegen: add Mul2 and DivPow2 test for loong64
Change-Id: I29ccd105c5418955146a3f4873162963da489a70
Reviewed-on: https://go-review.googlesource.com/c/go/+/697935
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-08-24 18:14:28 -07:00
Xiaolin Zhao
0aa8019e94 test/codegen: add Mul* test for loong64
Change-Id: Ica285212e4884a96fe9738b53cdc789b223bf2e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/697895
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2025-08-24 18:14:22 -07:00
Xiaolin Zhao
83420974b7 test/codegen: add sqrt* abs and copysign test for loong64
Change-Id: I645396fc4b00242f36a06f01550906805c0c1f73
Reviewed-on: https://go-review.googlesource.com/c/go/+/697955
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-08-24 18:14:13 -07:00
limeidan
1843f1e9c0 cmd/compile: use zero register instead of specialized *zero instructions on loong64
Refer to CL 633075, loong64 has a zero(R0) register that can be used to do this.

Change-Id: I846c6bdfcfd6dbfa18338afc13e34e350580ead4
Reviewed-on: https://go-review.googlesource.com/c/go/+/693876
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
2025-08-21 11:23:05 -07:00
Xiaolin Zhao
9632ba8160 cmd/compile: optimize some patterns into revb2h/revb4h instruction on loong64
Pattern1: (the type of c is uint16)
    c>>8 | c<<8
To:
    revb2h c

Pattern2: (the type of c is uint32)
    (c & 0xff00ff00)>>8 | (c & 0x00ff00ff)<<8
To:
    revb2h c

Pattern3: (the type of c is uint64)
    (c & 0xff00ff00ff00ff00)>>8 | (c & 0x00ff00ff00ff00ff)<<8
To:
    revb4h c

Change-Id: Ic6231a3f476cbacbea4bd00e31193d107cb86cda
Reviewed-on: https://go-review.googlesource.com/c/go/+/696335
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-21 11:19:34 -07:00
Xiaolin Zhao
fa706ea50f cmd/compile: optimize rule (x + x) << c to x << c+1 on loong64
Change-Id: I782f93510bba92ba60b298c1c1cde456c8bcec38
Reviewed-on: https://go-review.googlesource.com/c/go/+/697956
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2025-08-21 11:16:49 -07:00
Michael Munday
320df537cc cmd/compile: emit classify instructions for infinity tests on riscv64
The 'classify' instruction on RISC-V sets a bit in a mask to indicate
the class a floating point value belongs to (e.g. whether the value is
an infinity, a normal number, a subnormal number and so on). There are
other places this instruction is useful but for now I've just used it
for infinity tests.

The gains are relatively small (~1-2 instructions per IsInf call) but
using FCLASSD does potentially unlock further optimizations. It also
reduces the number of loads from memory and the number of moves
between general purpose and floating point register files.

goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
                    │        sec/op        │   sec/op     vs base                │
Acos                           159.9n ± 0%   173.7n ± 0%   +8.66% (p=0.000 n=10)
Acosh                          249.8n ± 0%   254.4n ± 0%   +1.86% (p=0.000 n=10)
Asin                           159.9n ± 0%   173.7n ± 0%   +8.66% (p=0.000 n=10)
Asinh                          292.2n ± 0%   283.0n ± 0%   -3.15% (p=0.000 n=10)
Atan                           119.1n ± 0%   119.0n ± 0%   -0.08% (p=0.036 n=10)
Atanh                          265.1n ± 0%   271.6n ± 0%   +2.43% (p=0.000 n=10)
Atan2                          194.9n ± 0%   186.7n ± 0%   -4.23% (p=0.000 n=10)
Cbrt                           216.3n ± 0%   203.1n ± 0%   -6.10% (p=0.000 n=10)
Ceil                           31.82n ± 0%   31.81n ± 0%        ~ (p=0.063 n=10)
Copysign                       4.897n ± 0%   4.893n ± 3%   -0.08% (p=0.038 n=10)
Cos                            123.9n ± 0%   107.7n ± 1%  -13.03% (p=0.000 n=10)
Cosh                           293.0n ± 0%   264.6n ± 0%   -9.68% (p=0.000 n=10)
Erf                            150.0n ± 0%   133.8n ± 0%  -10.80% (p=0.000 n=10)
Erfc                           151.8n ± 0%   137.9n ± 0%   -9.16% (p=0.000 n=10)
Erfinv                         173.8n ± 0%   173.8n ± 0%        ~ (p=0.820 n=10)
Erfcinv                        173.8n ± 0%   173.8n ± 0%        ~ (p=1.000 n=10)
Exp                            247.7n ± 0%   220.4n ± 0%  -11.04% (p=0.000 n=10)
ExpGo                          261.4n ± 0%   232.5n ± 0%  -11.04% (p=0.000 n=10)
Expm1                          176.2n ± 0%   164.9n ± 0%   -6.41% (p=0.000 n=10)
Exp2                           220.4n ± 0%   190.2n ± 0%  -13.70% (p=0.000 n=10)
Exp2Go                         232.5n ± 0%   204.0n ± 0%  -12.22% (p=0.000 n=10)
Abs                            4.897n ± 0%   4.897n ± 0%        ~ (p=0.726 n=10)
Dim                            16.32n ± 0%   16.31n ± 0%        ~ (p=0.770 n=10)
Floor                          31.84n ± 0%   31.83n ± 0%        ~ (p=0.677 n=10)
Max                            26.11n ± 0%   26.13n ± 0%        ~ (p=0.290 n=10)
Min                            26.10n ± 0%   26.11n ± 0%        ~ (p=0.424 n=10)
Mod                            416.2n ± 0%   337.8n ± 0%  -18.83% (p=0.000 n=10)
Frexp                          63.65n ± 0%   50.60n ± 0%  -20.50% (p=0.000 n=10)
Gamma                          218.8n ± 0%   206.4n ± 0%   -5.62% (p=0.000 n=10)
Hypot                          92.20n ± 0%   94.69n ± 0%   +2.70% (p=0.000 n=10)
HypotGo                        107.7n ± 0%   109.3n ± 0%   +1.49% (p=0.000 n=10)
Ilogb                          59.54n ± 0%   44.04n ± 0%  -26.04% (p=0.000 n=10)
J0                             708.9n ± 0%   674.5n ± 0%   -4.86% (p=0.000 n=10)
J1                             707.6n ± 0%   676.1n ± 0%   -4.44% (p=0.000 n=10)
Jn                             1.513µ ± 0%   1.427µ ± 0%   -5.68% (p=0.000 n=10)
Ldexp                          70.20n ± 0%   57.09n ± 0%  -18.68% (p=0.000 n=10)
Lgamma                         201.5n ± 0%   185.3n ± 1%   -8.01% (p=0.000 n=10)
Log                            201.5n ± 0%   182.7n ± 0%   -9.35% (p=0.000 n=10)
Logb                           59.54n ± 0%   46.53n ± 0%  -21.86% (p=0.000 n=10)
Log1p                          178.8n ± 0%   173.9n ± 6%   -2.74% (p=0.021 n=10)
Log10                          201.4n ± 0%   184.3n ± 0%   -8.49% (p=0.000 n=10)
Log2                           79.17n ± 0%   66.07n ± 0%  -16.54% (p=0.000 n=10)
Modf                           34.27n ± 0%   34.25n ± 0%        ~ (p=0.559 n=10)
Nextafter32                    49.34n ± 0%   49.37n ± 0%   +0.05% (p=0.040 n=10)
Nextafter64                    43.66n ± 0%   43.66n ± 0%        ~ (p=0.869 n=10)
PowInt                         309.1n ± 0%   267.4n ± 0%  -13.49% (p=0.000 n=10)
PowFrac                        769.6n ± 0%   677.3n ± 0%  -11.98% (p=0.000 n=10)
Pow10Pos                       13.88n ± 0%   13.88n ± 0%        ~ (p=0.811 n=10)
Pow10Neg                       19.58n ± 0%   19.57n ± 0%        ~ (p=0.993 n=10)
Round                          23.65n ± 0%   23.66n ± 0%        ~ (p=0.354 n=10)
RoundToEven                    27.75n ± 0%   27.75n ± 0%        ~ (p=0.971 n=10)
Remainder                      380.0n ± 0%   309.9n ± 0%  -18.45% (p=0.000 n=10)
Signbit                        13.06n ± 0%   13.06n ± 0%        ~ (p=1.000 n=10)
Sin                            133.8n ± 0%   120.8n ± 0%   -9.75% (p=0.000 n=10)
Sincos                         160.7n ± 0%   147.7n ± 0%   -8.12% (p=0.000 n=10)
Sinh                           305.9n ± 0%   277.9n ± 0%   -9.17% (p=0.000 n=10)
SqrtIndirect                   3.265n ± 0%   3.264n ± 0%        ~ (p=0.546 n=10)
SqrtLatency                    19.58n ± 0%   19.58n ± 0%        ~ (p=0.973 n=10)
SqrtIndirectLatency            19.59n ± 0%   19.58n ± 0%        ~ (p=0.370 n=10)
SqrtGoLatency                  205.7n ± 0%   202.7n ± 0%   -1.46% (p=0.000 n=10)
SqrtPrime                      4.953µ ± 0%   4.954µ ± 0%        ~ (p=0.477 n=10)
Tan                            163.2n ± 0%   150.2n ± 0%   -7.99% (p=0.000 n=10)
Tanh                           312.4n ± 0%   284.2n ± 0%   -9.01% (p=0.000 n=10)
Trunc                          31.83n ± 0%   31.83n ± 0%        ~ (p=0.663 n=10)
Y0                             701.0n ± 0%   669.2n ± 0%   -4.54% (p=0.000 n=10)
Y1                             704.5n ± 0%   672.4n ± 0%   -4.55% (p=0.000 n=10)
Yn                             1.490µ ± 0%   1.422µ ± 0%   -4.60% (p=0.000 n=10)
Float64bits                    5.713n ± 0%   5.710n ± 0%        ~ (p=0.926 n=10)
Float64frombits                4.896n ± 0%   4.896n ± 0%        ~ (p=0.663 n=10)
Float32bits                    12.25n ± 0%   12.25n ± 0%        ~ (p=0.571 n=10)
Float32frombits                4.898n ± 0%   4.896n ± 0%        ~ (p=0.754 n=10)
FMA                            4.895n ± 0%   4.895n ± 0%        ~ (p=0.745 n=10)
geomean                        94.40n        89.43n        -5.27%

Change-Id: I4fe0f2e9f609e38d79463f9ba2519a3f9427432e
Reviewed-on: https://go-review.googlesource.com/c/go/+/348389
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-08-13 20:33:56 -07:00
limeidan
90b7d7aaa2 cmd/compile/internal: optimize multiplication use new operation 'ADDshiftLLV' on loong64
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
                  │     old      │                 new                  │
                  │    sec/op    │    sec/op     vs base                │
MulconstI32/3       0.8004n ± 0%   0.4247n ± 2%  -46.94% (p=0.000 n=10)
MulconstI32/5       0.8005n ± 0%   0.4256n ± 1%  -46.83% (p=0.000 n=10)
MulconstI32/12      1.2010n ± 0%   0.8005n ± 0%  -33.35% (p=0.000 n=10)
MulconstI32/120     0.8090n ± 0%   0.8067n ± 0%   -0.28% (p=0.007 n=10)
MulconstI32/-120    0.8109n ± 0%   0.8072n ± 0%   -0.47% (p=0.000 n=10)
MulconstI32/65537   0.8004n ± 0%   0.8004n ± 0%        ~ (p=1.000 n=10)
MulconstI32/65538   0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.265 n=10)
MulconstI64/3       0.8005n ± 0%   0.4241n ± 1%  -47.02% (p=0.000 n=10)
MulconstI64/5       0.8004n ± 0%   0.4249n ± 1%  -46.91% (p=0.000 n=10)
MulconstI64/12      1.2010n ± 0%   0.8004n ± 0%  -33.36% (p=0.000 n=10)
MulconstI64/120     0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.635 n=10)
MulconstI64/-120    0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.837 n=10)
MulconstI64/65537   0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.837 n=10)
MulconstI64/65538   0.8096n ± 0%   0.8004n ± 0%   -1.14% (p=0.000 n=10)
MulconstU32/3       0.8004n ± 0%   0.4263n ± 1%  -46.75% (p=0.000 n=10)
MulconstU32/5       0.8005n ± 0%   0.4262n ± 1%  -46.76% (p=0.000 n=10)
MulconstU32/12      1.2010n ± 0%   0.8005n ± 0%  -33.35% (p=0.000 n=10)
MulconstU32/120     0.8105n ± 0%   0.8096n ± 0%        ~ (p=0.183 n=10)
MulconstU32/65537   0.8004n ± 0%   0.8004n ± 0%        ~ (p=1.000 n=10)
MulconstU32/65538   0.8005n ± 0%   0.8005n ± 0%        ~ (p=1.000 n=10)
MulconstU64/3       0.8004n ± 0%   0.4265n ± 4%  -46.71% (p=0.000 n=10)
MulconstU64/5       0.8004n ± 0%   0.4256n ± 0%  -46.82% (p=0.000 n=10)
MulconstU64/12      1.2010n ± 0%   0.8004n ± 0%  -33.36% (p=0.000 n=10)
MulconstU64/120     0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.387 n=10)
MulconstU64/65537   0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.265 n=10)
MulconstU64/65538   0.8080n ± 0%   0.8004n ± 0%   -0.93% (p=0.000 n=10)
geomean             0.8539n        0.6597n       -22.74%

Change-Id: Ie33e88985d7639f481bbba540bc917b9f185c357
Reviewed-on: https://go-review.googlesource.com/c/go/+/693855
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-12 23:01:49 -07:00
Keith Randall
f04421ea9a cmd/compile: soften test for 74788
We now (as of CL 678620) use float registers other than X0 for copying.

Change-Id: Ifdecd5df7519663742eed0f292c98453754d4b25
Reviewed-on: https://go-review.googlesource.com/c/go/+/695275
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
2025-08-12 10:05:55 -07:00
Michael Munday
084c0f8494 cmd/compile: allow InlMark operations to be speculatively executed
Although InlMark takes a memory argument it ultimately becomes a
NOP and therefore is safe to speculatively execute.

Fixes #74915

Change-Id: I64317dd433e300ac28de2bcf201845083ec2ac82
Reviewed-on: https://go-review.googlesource.com/c/go/+/693795
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-08-11 00:52:23 -07:00
Xiaolin Zhao
a552737418 cmd/compile: fold negation into multiplication on loong64
This change also add corresponding benchmark tests and codegen tests.
The performance improvement on CPU Loongson-3A6000-HV is as follows:

goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
        |  bench.old   |              bench.new              |
        |    sec/op    |   sec/op     vs base                |
MulNeg     828.4n ± 0%   655.9n ± 0%  -20.82% (p=0.000 n=10)
Mul2Neg   1062.0n ± 0%   826.8n ± 0%  -22.15% (p=0.000 n=10)
geomean    938.0n        736.4n       -21.49%

Change-Id: Ia999732880ec65be0c66cddc757a4868847e5b15
Reviewed-on: https://go-review.googlesource.com/c/go/+/682535
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
2025-08-05 18:02:06 -07:00
Michael Munday
fcc036f03b cmd/compile: optimise float <-> int register moves on riscv64
Use the FMV* instructions to move values between the floating point and
integer register files.

Note: I'm unsure why there is a slowdown in the Float32bits benchmark,
I've checked and an FMVXS instruction is being used as expected. There
are multiple loads and other instructions in the main loop.

goos: linux
goarch: riscv64
pkg: math
cpu: Spacemit(R) X60
                    │ fmv-before.txt │            fmv-after.txt            │
                    │     sec/op     │   sec/op     vs base                │
Acos                     122.7n ± 0%   122.7n ± 0%        ~ (p=1.000 n=10)
Acosh                    197.2n ± 0%   191.5n ± 0%   -2.89% (p=0.000 n=10)
Asin                     122.7n ± 0%   122.7n ± 0%        ~ (p=0.474 n=10)
Asinh                    231.0n ± 0%   224.1n ± 0%   -2.99% (p=0.000 n=10)
Atan                     91.39n ± 0%   91.41n ± 0%        ~ (p=0.465 n=10)
Atanh                    210.3n ± 0%   203.4n ± 0%   -3.26% (p=0.000 n=10)
Atan2                    149.6n ± 0%   149.6n ± 0%        ~ (p=0.721 n=10)
Cbrt                     176.5n ± 0%   165.9n ± 0%   -6.01% (p=0.000 n=10)
Ceil                     25.67n ± 0%   24.42n ± 0%   -4.87% (p=0.000 n=10)
Copysign                 3.756n ± 0%   3.756n ± 0%        ~ (p=0.149 n=10)
Cos                      95.15n ± 0%   95.15n ± 0%        ~ (p=0.374 n=10)
Cosh                     228.6n ± 0%   224.7n ± 0%   -1.71% (p=0.000 n=10)
Erf                      115.2n ± 0%   115.2n ± 0%        ~ (p=0.474 n=10)
Erfc                     116.4n ± 0%   116.4n ± 0%        ~ (p=0.628 n=10)
Erfinv                   133.3n ± 0%   133.3n ± 0%        ~ (p=1.000 n=10)
Erfcinv                  133.3n ± 0%   133.3n ± 0%        ~ (p=1.000 n=10)
Exp                      194.1n ± 0%   190.3n ± 0%   -1.93% (p=0.000 n=10)
ExpGo                    204.7n ± 0%   200.3n ± 0%   -2.15% (p=0.000 n=10)
Expm1                    137.7n ± 0%   135.2n ± 0%   -1.82% (p=0.000 n=10)
Exp2                     173.4n ± 0%   169.0n ± 0%   -2.54% (p=0.000 n=10)
Exp2Go                   182.8n ± 0%   178.4n ± 0%   -2.41% (p=0.000 n=10)
Abs                      3.756n ± 0%   3.756n ± 0%        ~ (p=0.157 n=10)
Dim                      12.52n ± 0%   12.52n ± 0%        ~ (p=0.737 n=10)
Floor                    25.67n ± 0%   24.42n ± 0%   -4.87% (p=0.000 n=10)
Max                      21.29n ± 0%   20.03n ± 0%   -5.92% (p=0.000 n=10)
Min                      21.28n ± 0%   20.04n ± 0%   -5.85% (p=0.000 n=10)
Mod                      344.9n ± 0%   319.2n ± 0%   -7.45% (p=0.000 n=10)
Frexp                    55.71n ± 0%   48.85n ± 0%  -12.30% (p=0.000 n=10)
Gamma                    165.9n ± 0%   167.8n ± 0%   +1.15% (p=0.000 n=10)
Hypot                    73.24n ± 0%   70.74n ± 0%   -3.41% (p=0.000 n=10)
HypotGo                  84.50n ± 0%   82.63n ± 0%   -2.21% (p=0.000 n=10)
Ilogb                    49.45n ± 0%   45.70n ± 0%   -7.59% (p=0.000 n=10)
J0                       556.5n ± 0%   544.0n ± 0%   -2.25% (p=0.000 n=10)
J1                       555.3n ± 0%   542.8n ± 0%   -2.24% (p=0.000 n=10)
Jn                       1.181µ ± 0%   1.156µ ± 0%   -2.12% (p=0.000 n=10)
Ldexp                    59.47n ± 0%   53.84n ± 0%   -9.47% (p=0.000 n=10)
Lgamma                   167.2n ± 0%   154.6n ± 0%   -7.51% (p=0.000 n=10)
Log                      160.9n ± 0%   154.6n ± 0%   -3.92% (p=0.000 n=10)
Logb                     49.45n ± 0%   45.70n ± 0%   -7.58% (p=0.000 n=10)
Log1p                    147.1n ± 0%   137.1n ± 0%   -6.80% (p=0.000 n=10)
Log10                    162.1n ± 1%   154.6n ± 0%   -4.63% (p=0.000 n=10)
Log2                     66.99n ± 0%   60.72n ± 0%   -9.36% (p=0.000 n=10)
Modf                     29.42n ± 0%   26.29n ± 0%  -10.64% (p=0.000 n=10)
Nextafter32              41.95n ± 0%   37.88n ± 0%   -9.70% (p=0.000 n=10)
Nextafter64              38.82n ± 0%   33.49n ± 0%  -13.73% (p=0.000 n=10)
PowInt                   252.3n ± 0%   237.3n ± 0%   -5.95% (p=0.000 n=10)
PowFrac                  615.5n ± 0%   589.7n ± 0%   -4.19% (p=0.000 n=10)
Pow10Pos                 10.64n ± 0%   10.64n ± 0%        ~ (p=1.000 n=10)
Pow10Neg                 24.42n ± 0%   15.02n ± 0%  -38.49% (p=0.000 n=10)
Round                    21.91n ± 0%   18.16n ± 0%  -17.12% (p=0.000 n=10)
RoundToEven              24.42n ± 0%   21.29n ± 0%  -12.84% (p=0.000 n=10)
Remainder                308.0n ± 0%   291.2n ± 0%   -5.44% (p=0.000 n=10)
Signbit                  10.02n ± 0%   10.02n ± 0%        ~ (p=1.000 n=10)
Sin                      102.7n ± 0%   102.7n ± 0%        ~ (p=0.211 n=10)
Sincos                   124.0n ± 1%   123.3n ± 0%   -0.56% (p=0.002 n=10)
Sinh                     239.1n ± 0%   234.7n ± 0%   -1.84% (p=0.000 n=10)
SqrtIndirect             2.504n ± 0%   2.504n ± 0%        ~ (p=0.303 n=10)
SqrtLatency              15.03n ± 0%   15.02n ± 0%        ~ (p=0.598 n=10)
SqrtIndirectLatency      15.02n ± 0%   15.02n ± 0%        ~ (p=0.907 n=10)
SqrtGoLatency            165.3n ± 0%   157.2n ± 0%   -4.90% (p=0.000 n=10)
SqrtPrime                3.801µ ± 0%   3.802µ ± 0%        ~ (p=1.000 n=10)
Tan                      125.2n ± 0%   125.2n ± 0%        ~ (p=0.458 n=10)
Tanh                     244.2n ± 0%   239.9n ± 0%   -1.76% (p=0.000 n=10)
Trunc                    25.67n ± 0%   24.42n ± 0%   -4.87% (p=0.000 n=10)
Y0                       550.2n ± 0%   538.1n ± 0%   -2.21% (p=0.000 n=10)
Y1                       552.8n ± 0%   540.6n ± 0%   -2.21% (p=0.000 n=10)
Yn                       1.168µ ± 0%   1.143µ ± 0%   -2.14% (p=0.000 n=10)
Float64bits              8.139n ± 0%   4.385n ± 0%  -46.13% (p=0.000 n=10)
Float64frombits          7.512n ± 0%   3.759n ± 0%  -49.96% (p=0.000 n=10)
Float32bits              8.138n ± 0%   9.393n ± 0%  +15.42% (p=0.000 n=10)
Float32frombits          7.513n ± 0%   3.757n ± 0%  -49.98% (p=0.000 n=10)
FMA                      3.756n ± 0%   3.756n ± 0%        ~ (p=0.246 n=10)
geomean                  77.43n        72.42n        -6.47%

Change-Id: I8dac69b1d17cb3d2af78d1c844d2b5d80000d667
Reviewed-on: https://go-review.googlesource.com/c/go/+/599235
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Michael Munday <mikemndy@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-08-05 08:27:15 -07:00
Xiaolin Zhao
e071617222 cmd/compile: optimize multiplication rules on loong64
Improve multiplication strength reduction, refer to CL 626998,
add additional 3 linear combination instructions for loong64.

goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
                  |  bench.old   |              bench.new               |
                  |    sec/op    |    sec/op     vs base                |
MulconstI32/3       1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstI32/5       1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstI32/12       1.601n ± 0%    1.201n ± 0%  -24.98% (p=0.000 n=10)
MulconstI32/120     1.6010n ± 0%   0.8130n ± 0%  -49.22% (p=0.000 n=10)
MulconstI32/-120    1.6010n ± 0%   0.8109n ± 0%  -49.35% (p=0.000 n=10)
MulconstI32/65537   1.6275n ± 0%   0.8005n ± 0%  -50.81% (p=0.000 n=10)
MulconstI32/65538   1.6290n ± 0%   0.8004n ± 0%  -50.87% (p=0.000 n=10)
MulconstI64/3       1.6010n ± 0%   0.8004n ± 0%  -50.01% (p=0.000 n=10)
MulconstI64/5       1.6010n ± 0%   0.8004n ± 0%  -50.01% (p=0.000 n=10)
MulconstI64/12       1.601n ± 0%    1.201n ± 0%  -24.98% (p=0.000 n=10)
MulconstI64/120     1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstI64/-120    1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstI64/65537   1.6270n ± 0%   0.8005n ± 0%  -50.80% (p=0.000 n=10)
MulconstI64/65538   1.6290n ± 0%   0.8071n ± 1%  -50.45% (p=0.000 n=10)
MulconstU32/3       1.6010n ± 0%   0.8004n ± 0%  -50.01% (p=0.000 n=10)
MulconstU32/5       1.6010n ± 0%   0.8004n ± 0%  -50.01% (p=0.000 n=10)
MulconstU32/12       1.601n ± 0%    1.201n ± 0%  -24.98% (p=0.000 n=10)
MulconstU32/120     1.6010n ± 0%   0.8066n ± 0%  -49.62% (p=0.000 n=10)
MulconstU32/65537   1.6290n ± 0%   0.8005n ± 0%  -50.86% (p=0.000 n=10)
MulconstU32/65538   1.6280n ± 0%   0.8005n ± 0%  -50.83% (p=0.000 n=10)
MulconstU64/3       1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstU64/5       1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstU64/12       1.601n ± 0%    1.201n ± 0%  -24.98% (p=0.000 n=10)
MulconstU64/120     1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstU64/65537   1.6290n ± 0%   0.8005n ± 0%  -50.86% (p=0.000 n=10)
MulconstU64/65538   1.6300n ± 0%   0.8067n ± 0%  -50.51% (p=0.000 n=10)
geomean              1.609n        0.8537n       -46.95%

goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A5000 @ 2500.00MHz
                  |  bench.old   |              bench.new               |
                  |    sec/op    |    sec/op     vs base                |
MulconstI32/3       1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstI32/5       1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstI32/12       1.601n ± 0%    1.202n ± 0%  -24.92% (p=0.000 n=10)
MulconstI32/120     1.6020n ± 0%   0.8012n ± 0%  -49.99% (p=0.000 n=10)
MulconstI32/-120    1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstI32/65537   1.6020n ± 0%   0.8007n ± 0%  -50.02% (p=0.000 n=10)
MulconstI32/65538   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstI64/3       1.6015n ± 0%   0.8007n ± 0%  -50.00% (p=0.000 n=10)
MulconstI64/5       1.6020n ± 0%   0.8007n ± 0%  -50.02% (p=0.000 n=10)
MulconstI64/12       1.602n ± 0%    1.202n ± 0%  -25.00% (p=0.000 n=10)
MulconstI64/120     1.6030n ± 0%   0.8011n ± 0%  -50.02% (p=0.000 n=10)
MulconstI64/-120    1.6020n ± 0%   0.8007n ± 0%  -50.02% (p=0.000 n=10)
MulconstI64/65537   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstI64/65538   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstU32/3       1.6010n ± 0%   0.8006n ± 0%  -49.99% (p=0.000 n=10)
MulconstU32/5       1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstU32/12       1.601n ± 0%    1.202n ± 0%  -24.92% (p=0.000 n=10)
MulconstU32/120     1.6010n ± 0%   0.8006n ± 0%  -49.99% (p=0.000 n=10)
MulconstU32/65537   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstU32/65538   1.6020n ± 0%   0.8009n ± 0%  -50.01% (p=0.000 n=10)
MulconstU64/3       1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstU64/5       1.6010n ± 0%   0.8007n ± 0%  -49.98% (p=0.000 n=10)
MulconstU64/12       1.601n ± 0%    1.201n ± 0%  -24.98% (p=0.000 n=10)
MulconstU64/120     1.6020n ± 0%   0.8007n ± 0%  -50.02% (p=0.000 n=10)
MulconstU64/65537   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstU64/65538   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
geomean              1.601n        0.8523n       -46.77%

Change-Id: I9fb0e47ca57875da171a347bf4828adfab41b875
Reviewed-on: https://go-review.googlesource.com/c/go/+/675455
Reviewed-by: Mark Freeman <mark@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-08-01 08:42:40 -07:00
Keith Randall
eb7f515c4d cmd/compile: use generated loops instead of DUFFZERO on amd64
goarch: amd64
cpu: 12th Gen Intel(R) Core(TM) i7-12700
                        │     base      │                 exp                 │
                        │    sec/op     │   sec/op     vs base                │
MemclrKnownSize112-20      1.270n ± 14%   1.006n ± 0%  -20.72% (p=0.000 n=10)
MemclrKnownSize128-20      1.266n ±  0%   1.005n ± 0%  -20.58% (p=0.000 n=10)
MemclrKnownSize192-20      1.771n ±  0%   1.579n ± 1%  -10.84% (p=0.000 n=10)
MemclrKnownSize248-20      4.034n ±  0%   3.520n ± 0%  -12.75% (p=0.000 n=10)
MemclrKnownSize256-20      2.269n ±  0%   2.014n ± 0%  -11.26% (p=0.000 n=10)
MemclrKnownSize512-20      4.280n ±  0%   4.030n ± 0%   -5.84% (p=0.000 n=10)
MemclrKnownSize1024-20     8.309n ±  1%   8.057n ± 0%   -3.03% (p=0.000 n=10)

Change-Id: I8f1627e2a1e981ff351dc7178932b32a2627f765
Reviewed-on: https://go-review.googlesource.com/c/go/+/678937
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-07-31 17:12:39 -07:00
Michael Munday
cedf63616a cmd/compile: add floating point min/max intrinsics on s390x
Add the VECTOR FP (MINIMUM|MAXIMUM) instructions to the assembler and
use them in the compiler to implement min and max.

Note: I've allowed floating point registers to be used with the single
element instructions (those with the W instead of V prefix) to allow
easier integration into the compiler.

Change-Id: I5f80a510bd248cf483cce95f1979bf63fbae7de6
Reviewed-on: https://go-review.googlesource.com/c/go/+/684715
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <mark@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-07-30 12:29:15 -07:00
Youlin Feng
cc571dab91 cmd/compile: deduplicate instructions when rewrite func results
After CL 628075, do not rely on the memory arg of an OpLocalAddr.

Fixes #74788

Change-Id: I4e893241e3949bb8f2d93c8b88cc102e155b725d
Reviewed-on: https://go-review.googlesource.com/c/go/+/691275
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Reviewed-by: Mark Freeman <mark@golang.org>
2025-07-30 09:38:10 -07:00
Cuong Manh Le
bd94ae8903 cmd/compile: use unsigned power-of-two detector for unsigned mod
Same as CL 689815, but for modulus instead of division.

Updates #74485

Change-Id: I73000231c886a987a1093669ff207fd9117a8160
Reviewed-on: https://go-review.googlesource.com/c/go/+/689895
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-07-29 16:22:40 -07:00
Cuong Manh Le
f3582fc80e cmd/compile: add unsigned power-of-two detector
Fixes #74485

Change-Id: Ia22a58ac43bdc36c8414d555672a3a3eafc749ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/689815
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-07-29 16:22:37 -07:00
Michael Munday
46b5839231 test/codegen: fix failing condmove wasm tests
These recently added tests failed when using the -all_codgen flag.

Fixes #74770

Change-Id: Idea1ea02af2bd9f45c7d0a28d633c7442328e6df
Reviewed-on: https://go-review.googlesource.com/c/go/+/690715
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
Run-TryBot: Michael Munday <mikemndy@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Mark Freeman <mark@golang.org>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
TryBot-Bypass: Michael Knyszek <mknyszek@google.com>
2025-07-28 11:01:53 -07:00
Jorropo
ce05ad448f cmd/compile: rewrite condselects into doublings and halvings
For performance see CL 685676.

This allows something like:
  if y { x *= 2 }

To be compiled to:
  SHLXQ BX, AX, AX

Instead of:
  MOVQ    AX, CX
  SHLQ    $1, CX
  MOVBLZX BL, DX
  TESTQ   DX, DX
  CMOVQNE CX, AX

While ./make.bash uniqued per LOC, there is 2 doublings and 4 halvings.

Change-Id: Ic0727cbf429528a2dbf17cbfc3b0121db8387444
Reviewed-on: https://go-review.googlesource.com/c/go/+/685695
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-07-24 14:42:15 -07:00
Jorropo
fcd28070fe cmd/compile: add opt branchelim to rewrite some CondSelect into math
This allows something like:
  if y { x++ }

To be compiled to:
  MOVBLZX BX, CX
  ADDQ CX, AX

Instead of:
  LEAQ    1(AX), CX
  MOVBLZX BL, DX
  TESTQ   DX, DX
  CMOVQNE CX, AX

While ./make.bash uniqued per LOC, there is 100 additions and 75 substractions.

See benchmark here: https://go.dev/play/p/DJf5COjwhd_s

Either it's a performance no-op or it is faster:

  goos: linux
  goarch: amd64
  cpu: AMD Ryzen 5 3600 6-Core Processor
                                          │ /tmp/old.logs │            /tmp/new.logs             │
                                          │    sec/op     │    sec/op     vs base                │
  CmovInlineConditionAddLatency-12           0.5443n ± 5%   0.5339n ± 3%   -1.90% (p=0.004 n=10)
  CmovInlineConditionAddThroughputBy6-12      1.492n ± 1%    1.494n ± 1%        ~ (p=0.955 n=10)
  CmovInlineConditionSubLatency-12           0.5419n ± 3%   0.5282n ± 3%   -2.52% (p=0.019 n=10)
  CmovInlineConditionSubThroughputBy6-12      1.587n ± 1%    1.584n ± 2%        ~ (p=0.492 n=10)
  CmovOutlineConditionAddLatency-12          0.5223n ± 1%   0.2639n ± 4%  -49.47% (p=0.000 n=10)
  CmovOutlineConditionAddThroughputBy6-12     1.159n ± 1%    1.097n ± 2%   -5.35% (p=0.000 n=10)
  CmovOutlineConditionSubLatency-12          0.5271n ± 3%   0.2654n ± 2%  -49.66% (p=0.000 n=10)
  CmovOutlineConditionSubThroughputBy6-12     1.053n ± 1%    1.050n ± 1%        ~ (p=1.000 n=10)
  geomean

There are other benefits not tested by this benchmark:
- the math form is usually a couple bytes shorter (ICACHE)
- the math form is usually 0~2 uops shorter (UCACHE)
- the math form has usually less register pressure*
- the math form can sometimes be optimized further

*regalloc rarely find how it can use less registers

As far as pass ordering goes there are many possible options,
I've decided to reorder branchelim before late opt since:
- unlike running exclusively the CondSelect rules after branchelim,
  some extra optimizations might trigger on the adds or subs.
- I don't want to maintain a second generic.rules file of only the stuff,
  that can trigger after branchelim.
- rerunning all of opt a third time increase compilation time for little gains.

By elimination moving branchelim seems fine.

Change-Id: I869adf57e4d109948ee157cfc47144445146bafd
Reviewed-on: https://go-review.googlesource.com/c/go/+/685676
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-07-24 14:42:10 -07:00
Alexander Musman
bd80f74bc1 cmd/compile: fold shift through AND for slice operations
Fold a shift through AND when the AND gets a zero-or-one operand (e.g.
from arithmetic shift by 63 of a 64-bit value) for a common case with
slice operations:

    ASR     $63, R2, R2
    AND     R3<<3, R2, R2
    ADD     R2, R0, R2

As the operands are 64-bit, we can transform it to:

    AND     R2->63, R3, R2
    ADD     R2<<3, R0, R2

Code size improvement:
compile: .text:     9088004 ->  9086292 (-0.02%)
etcd:    .text:    10500276 -> 10498964 (-0.01%)

Change-Id: Ibcd5e67173da39b77ceff77ca67812fb8be5a7b5
Reviewed-on: https://go-review.googlesource.com/c/go/+/679895
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Mark Freeman <mark@golang.org>
Auto-Submit: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-07-24 13:47:20 -07:00
Alexander Musman
dcb479c2f9 cmd/compile: optimize slice bounds checking with SUB/SUBconst comparisons
Optimize ARM64 code generation for slice bounds checking by recognizing
patterns where comparisons to zero involve SUB or SUBconst operations.
This change adds SSA opt rules to simplify:
 (CMPconst [0] (SUB x y)) => (CMP x y)

The optimizations apply to EQ, NE, ULE, and UGT comparisons, enabling
more efficient bounds checking for slice operations.

Code size improvement:
compile: .text:    9088004  ->  9065988 (-0.24%)
etcd:    .text:    10500276 -> 10497092 (-0.03%)
Change-Id: I467cb27674351652bcacc52b87e1f19677bd46a8
Reviewed-on: https://go-review.googlesource.com/c/go/+/679915
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-07-24 12:39:53 -07:00
Paul Murphy
ee7bfbdbcc cmd/compile/internal/ssa: fix PPC64 merging of (AND (S[RL]Dconst ...)
CL 622236 forgot to check the mask was also a 32 bit rotate mask. Add
a modified version of isPPC64WordRotateMask which valids the mask is
contiguous and fits inside a uint32.

I don't this is possible when merging SRDconst, the first check should
always reject such combines. But, be extra careful and do it there
too.

Fixes #73153

Change-Id: Ie95f74ec5e7d89dc761511126db814f886a7a435
Reviewed-on: https://go-review.googlesource.com/c/go/+/679775
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-06-09 20:33:27 -07:00
Jake Bailey
27ff0f249c cmd/compile/internal/ssa: eliminate string copies for calls to unique.Make
unique.Make always copies strings passed into it, so it's safe to not
copy byte slices converted to strings either. Handle this just like map
accesses with string(b) as keys.

This CL only handles unique.Make(string(b)), not nested cases like
unique.Make([2]string{string(b1), string(b2)}); this could be done in a
followup CL but the map lookup code in walk is sufficiently different
than the call handling code that I didn't attempt it. (SSA is much
easier).

Fixes #71926

Change-Id: Ic2f82f2f91963d563b4ddb1282bd49fc40da8b85
Reviewed-on: https://go-review.googlesource.com/c/go/+/672135
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21 20:20:31 -07:00
thepudds
f4de2ecffb cmd/compile/internal/walk: convert composite literals to interfaces without allocating
Today, this interface conversion causes the struct literal
to be heap allocated:

    var sink any

    func example1() {
        sink = S{1, 1}
    }

For basic literals like integers that are directly used in
an interface conversion that would otherwise allocate, the compiler
is able to use read-only global storage (see #18704).

This CL extends that to struct and array literals as well by creating
read-only global storage that is able to represent for example S{1, 1},
and then using a pointer to that storage in the interface
when the interface conversion happens.

A more challenging example is:

    func example2() {
        v := S{1, 1}
        sink = v
    }

In this case, the struct literal is not directly part of the
interface conversion, but is instead assigned to a local variable.

To still avoid heap allocation in cases like this, in walk we
construct a cache that maps from expressions used in interface
conversions to earlier expressions that can be used to represent the
same value (via ir.ReassignOracle.StaticValue). This is somewhat
analogous to how we avoided heap allocation for basic literals in
CL 649077 earlier in our stack, though here we also need to do a
little more work to create the read-only global.

CL 649076 (also earlier in our stack) added most of the tests
along with debug diagnostics in convert.go to make it easier
to test this change.

See the writeup in #71359 for details.

Fixes #71359
Fixes #71323
Updates #62653
Updates #53465
Updates #8618

Change-Id: I8924f0c69ff738ea33439bd6af7b4066af493b90
Reviewed-on: https://go-review.googlesource.com/c/go/+/649555
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-05-21 12:23:26 -07:00
Junyang Shao
d6c29c7156 cmd/compile: fix offset calculation error in memcombine
Fixes #73812

Change-Id: If7a6e103ae9e1442a2cf4a3c6b1270b6a1887196
Reviewed-on: https://go-review.googlesource.com/c/go/+/675175
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-21 12:17:08 -07:00
Xiaolin Zhao
4ce1c8e9e1 cmd/compile: add rules about ORN and ANDN
Reduce the number of go toolchain instructions on loong64 as follows.

    file      before    after     Δ       %
    addr2line 279880    279776  -104   -0.0372%
    asm       556638    556410  -228   -0.0410%
    buildid   272272    272072  -200   -0.0735%
    cgo       481522    481318  -204   -0.0424%
    compile   2457788   2457580 -208   -0.0085%
    covdata   323384    323280  -104   -0.0322%
    cover     518450    518234  -216   -0.0417%
    dist      340790    340686  -104   -0.0305%
    distpack  282456    282252  -204   -0.0722%
    doc       789932    789688  -244   -0.0309%
    fix       324332    324228  -104   -0.0321%
    link      704622    704390  -232   -0.0329%
    nm        277132    277028  -104   -0.0375%
    objdump   507862    507758  -104   -0.0205%
    pack      221774    221674  -100   -0.0451%
    pprof     1469816   1469552 -264   -0.0180%
    test2json 254836    254732  -104   -0.0408%
    trace     1100002   1099738 -264   -0.0240%
    vet       781078    780874  -204   -0.0261%
    go        1529116   1528848 -268   -0.0175%
    gofmt     318556    318448  -108   -0.0339%
    total     13792238 13788566 -3672  -0.0266%

Change-Id: I23fb3ebd41309252c7075e57ea7094e79f8c4fef
Reviewed-on: https://go-review.googlesource.com/c/go/+/674335
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
2025-05-21 08:28:37 -07:00
Xiaolin Zhao
d37a1bdd48 cmd/compile: fix the implementation of NORconst on loong64
In the loong64 instruction set, there is no NORI instruction,
so the immediate value in NORconst need to be stored in register
and then use the three-register NOR instruction.

Change-Id: I5ef697450619317218cb3ef47fc07e238bdc2139
Reviewed-on: https://go-review.googlesource.com/c/go/+/673836
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20 20:24:09 -07:00
Junyang Shao
113b25774e cmd/compile: memcombine different size stores
This CL implements the TODO in combineStores to allow combining
stores of different sizes, as long as the total size aligns to
2, 4, 8.

Fixes #72832.

Change-Id: I6d1d471335da90d851ad8f3b5a0cf10bdcfa17c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/661855
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Junyang Shao <shaojunyang@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-20 13:00:16 -07:00