Commit graph

307 commits

Keith Randall
ca66f907dd cmd/compile: use generated loops instead of DUFFCOPY on amd64
This reverts commit 4e182db5fc (CL 695196),
which is itself a revert of
ec9e1176c3 (CL 678620).

So this CL is exactly the same as CL 678620, but with a regalloc fix
(CL 696035) submitted first.

Change-Id: I743ab32fa3aa6ef3e1b2b6751a2ef4519139057c
Reviewed-on: https://go-review.googlesource.com/c/go/+/696016
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-13 15:57:33 -07:00
Keith Randall
4e182db5fc Revert "cmd/compile: use generated loops instead of DUFFCOPY on amd64"
This reverts commit ec9e1176c3 (CL 678620).

Reason for revert: causing regalloc to get into an infinite loop

Change-Id: Ie53c58c6126804af6d6883ea4acdcfb632a172bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/695196
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
2025-08-12 16:04:18 -07:00
Keith Randall
ec9e1176c3 cmd/compile: use generated loops instead of DUFFCOPY on amd64
goarch: amd64
cpu: 12th Gen Intel(R) Core(TM) i7-12700
                        │     base      │                 exp                 │
                        │    sec/op     │   sec/op     vs base                │
MemmoveKnownSize112-20     1.764n ±  0%   1.247n ± 0%  -29.31% (p=0.000 n=10)
MemmoveKnownSize128-20     1.891n ±  0%   1.405n ± 1%  -25.72% (p=0.000 n=10)
MemmoveKnownSize192-20     2.521n ±  0%   2.114n ± 3%  -16.16% (p=0.000 n=10)
MemmoveKnownSize248-20     4.028n ±  0%   3.877n ± 1%   -3.75% (p=0.000 n=10)
MemmoveKnownSize256-20     3.272n ±  0%   2.961n ± 2%   -9.53% (p=0.000 n=10)
MemmoveKnownSize512-20     6.733n ±  3%   5.936n ± 4%  -11.83% (p=0.000 n=10)
MemmoveKnownSize1024-20   13.905n ±  5%   9.798n ± 9%  -29.54% (p=0.000 n=10)

Change-Id: Icc01cec0d8b072300d749a5ce76f53b3725b5c65
Reviewed-on: https://go-review.googlesource.com/c/go/+/678620
Reviewed-by: Jorropo <jorropo.pgm@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
2025-08-12 09:15:08 -07:00
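
Illustrative benchmark sketch for the change above (not from CL 678620): the copy size is a compile-time constant, so the compiler can lower the copy inline — a generated loop on amd64 after this change — rather than calling runtime.memmove.

package p

import "testing"

// Buffers whose size is known at compile time.
var src, dst [256]byte

// BenchmarkCopyKnownSize256 copies a fixed 256-byte value; the constant size
// lets the compiler emit the copy inline instead of calling memmove.
func BenchmarkCopyKnownSize256(b *testing.B) {
	for i := 0; i < b.N; i++ {
		dst = src
	}
}
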
Keith Randall
74421a305b Revert "cmd/compile: allow multi-field structs to be stored directly in interfaces"
This reverts commit cd55f86b8d (CL 681937)

Reason for revert: still causing compiler failures on Google test code

Change-Id: I5cd482fd607fd060a523257082d48821b5f965d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/695016
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-08-11 22:59:52 -07:00
qiulaidongfeng
4ee0df8c46 cmd: remove dead code
Fixes #74076

Change-Id: Icc67b3d4e342f329584433bd1250c56ae8f5a73d
Reviewed-on: https://go-review.googlesource.com/c/go/+/690635
Reviewed-by: Alan Donovan <adonovan@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Commit-Queue: Alan Donovan <adonovan@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Alan Donovan <adonovan@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-08-05 10:31:25 -07:00
Keith Randall
cd55f86b8d cmd/compile: allow multi-field structs to be stored directly in interfaces
If the struct is a bunch of 0-sized fields and one pointer field.

Fixes #74092

Change-Id: I87c5d162c8c9fdba812420d7f9d21de97295b62c
Reviewed-on: https://go-review.googlesource.com/c/go/+/681937
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-08-05 09:18:31 -07:00
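
Illustrative example of the struct shape the change above is about (not from CL 681937): only zero-sized fields plus exactly one pointer field, so the pointer itself can serve as the interface data word and no allocation is needed when boxing.

type unit struct{} // zero-sized

// handle has only zero-sized fields plus a single pointer field, so a value
// of this type can be stored directly in an interface.
type handle struct {
	_ unit
	p *int
}

// box converts a handle to an interface; with this optimization the pointer
// field is used directly as the interface's data word.
func box(h handle) any { return h }
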
Cuong Manh Le
2b622b05a9 cmd/compile: remove isUintXPowerOfTwo functions
And use the generic version instead.

While at it, also correct the corresponding rules to use the logXu variants
instead of logX, following the discussion in CL 689815.

Change-Id: Iba85d14ff0e26d45a126764e7bd5702586358d23
Reviewed-on: https://go-review.googlesource.com/c/go/+/692917
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2025-08-05 08:37:45 -07:00
Cuong Manh Le
72147ffa75 cmd/compile: simplify isUintXPowerOfTwo implementation
By calling isUnsignedPowerOfTwo instead of duplicating the same logic.

Change-Id: I1e29d3b7eda1bc8773fcd25728d8f508ae633ac9
Reviewed-on: https://go-review.googlesource.com/c/go/+/692916
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-05 08:34:05 -07:00
Cuong Manh Le
26da1199eb cmd/compile: make isUint{32,64}PowerOfTwo implementations clearer
Since these functions cast the input to uint64, the result is always
non-negative. The condition should therefore be a comparison with zero,
which makes the intent clearer to the reader and opens room for a future
simplification using the generic isUnsignedPowerOfTwo function.

This change is separated out so it is easier to bisect if any problems
turn up.

Change-Id: Ibec28c2590f4c52caa36384b710d526459725e49
Reviewed-on: https://go-review.googlesource.com/c/go/+/692915
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-08-05 08:32:51 -07:00
Keith Randall
eb7f515c4d cmd/compile: use generated loops instead of DUFFZERO on amd64
goarch: amd64
cpu: 12th Gen Intel(R) Core(TM) i7-12700
                        │     base      │                 exp                 │
                        │    sec/op     │   sec/op     vs base                │
MemclrKnownSize112-20      1.270n ± 14%   1.006n ± 0%  -20.72% (p=0.000 n=10)
MemclrKnownSize128-20      1.266n ±  0%   1.005n ± 0%  -20.58% (p=0.000 n=10)
MemclrKnownSize192-20      1.771n ±  0%   1.579n ± 1%  -10.84% (p=0.000 n=10)
MemclrKnownSize248-20      4.034n ±  0%   3.520n ± 0%  -12.75% (p=0.000 n=10)
MemclrKnownSize256-20      2.269n ±  0%   2.014n ± 0%  -11.26% (p=0.000 n=10)
MemclrKnownSize512-20      4.280n ±  0%   4.030n ± 0%   -5.84% (p=0.000 n=10)
MemclrKnownSize1024-20     8.309n ±  1%   8.057n ± 0%   -3.03% (p=0.000 n=10)

Change-Id: I8f1627e2a1e981ff351dc7178932b32a2627f765
Reviewed-on: https://go-review.googlesource.com/c/go/+/678937
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-07-31 17:12:39 -07:00
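
The zeroing counterpart, as a sketch under the same assumptions as the memmove example earlier (not from CL 678937): a clear whose size is a compile-time constant, which the compiler can lower to an inline generated loop instead of jumping into DUFFZERO.

var buf [512]byte

// reset zeroes a fixed 512-byte buffer; the constant size lets the compiler
// emit the clear inline rather than using DUFFZERO or a memclr call.
func reset() {
	buf = [512]byte{}
}
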
Cuong Manh Le
880ca333d7 cmd/compile: removing log2uint32 function
Just using isUnsignedPowerOfTwo and log32u is enough.

Change-Id: I93d49ab71c6245d05f6507adbcb9ef2a696e75d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/691476
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2025-07-29 16:22:48 -07:00
Cuong Manh Le
1513661dc3 cmd/compile: simplify logX implementations
By calling the logXu variants instead of duplicating the same logic.

Change-Id: Ide7a3ce072a6abafe1979f0158000457d90645c3
Reviewed-on: https://go-review.googlesource.com/c/go/+/691475
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-07-29 16:22:45 -07:00
Cuong Manh Le
f3582fc80e cmd/compile: add unsigned power-of-two detector
Fixes #74485

Change-Id: Ia22a58ac43bdc36c8414d555672a3a3eafc749ca
Reviewed-on: https://go-review.googlesource.com/c/go/+/689815
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
2025-07-29 16:22:37 -07:00
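
A minimal sketch of what such a generic unsigned power-of-two detector can look like; the real helper lives in cmd/compile/internal/ssa, and the exact name and constraint set here are assumptions.

// isUnsignedPowerOfTwo reports whether x is a power of two. Because the
// inputs are unsigned, no sign handling is needed: x&(x-1) clears the lowest
// set bit, so the expression is zero exactly when x has a single bit set.
func isUnsignedPowerOfTwo[T uint8 | uint16 | uint32 | uint64](x T) bool {
	return x != 0 && x&(x-1) == 0
}
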
Keith Randall
5045fdd8ff cmd/compile: fix containsUnavoidableCall computation
The previous algorithm was incorrect, as it reused the dominatedByCall
slice without resetting it. It also used the depth fields even though
they were not yet calculated.

Also, clean up a lot of the loop detector code that we never use.

Always compute depths. It is cheap.

Update #71868

Not really sure how to test this. As it is just an advisory bit,
nothing goes really wrong when the result is incorrect.

Change-Id: Ic0ae87a4d3576554831252d88b05b058ca68af41
Reviewed-on: https://go-review.googlesource.com/c/go/+/680775
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
2025-07-25 13:52:00 -07:00
Keith Randall
3024785b92 cmd/compile,runtime: remember idx+len for bounds check failure with less code
Currently we must put the index and length into specific registers so
we can call into the runtime to report a bounds check failure.

So a typical bounds check call is something like:

MOVD  R3, R0
MOVD  R7, R1
CALL  runtime.panicIndex

or, if for instance the index is constant,

MOVD  $7, R0
MOVD  R9, R1
CALL  runtime.panicIndex

Sometimes the MOVD can be avoided, if the value happens to be in the
right register already. But that's not terribly common, and doesn't
work at all for constants.

Let's get rid of those MOVD instructions. They pollute the instruction
cache and are almost never executed.

Instead, we'll encode in a PCDATA table where the runtime should find
the index and length. The table encodes, for each index and length,
whether it is a constant or in a register, and which register or
constant it is.

That way, we can avoid all those useless MOVDs. Instead, we can figure
out the index and length at runtime. This makes the bounds panic path
slower, but that's a good tradeoff.

We can encode registers 0-15 and constants 0-31. Anything outside that
range still needs to use an explicit instruction.

This CL is the foundation, followon CLs will move each architecture
to the new strategy.

Change-Id: I705c511e546e6aac59fed922a8eaed4585e96820
Reviewed-on: https://go-review.googlesource.com/c/go/+/682396
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-07-24 16:05:59 -07:00
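
A hypothetical sketch of a compact per-operand encoding along the lines described above: one flag bit for constant-vs-register plus a small payload, enough for registers 0-15 and constants 0-31. The actual PCDATA format used by CL 682396 may differ.

const constFlag = 1 << 5 // set when the operand is a small constant

// encodeOperand packs either a register number (0-15) or a small constant
// (0-31) into one byte, mirroring the limits mentioned in the commit message.
func encodeOperand(isConst bool, v uint8) uint8 {
	if isConst {
		return constFlag | (v & 0x1f)
	}
	return v & 0x0f
}

// decodeOperand reverses encodeOperand.
func decodeOperand(b uint8) (isConst bool, v uint8) {
	if b&constFlag != 0 {
		return true, b & 0x1f
	}
	return false, b & 0x0f
}
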
Keith Randall
2ddf542e4c cmd/compile: use ,ok return idiom for sparsemap.get
Change-Id: I89719b94de74a32402d02309515dffc4989484db
Reviewed-on: https://go-review.googlesource.com/c/go/+/681575
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@google.com>
2025-07-24 09:04:29 -07:00
Paul Murphy
ee7bfbdbcc cmd/compile/internal/ssa: fix PPC64 merging of (AND (S[RL]Dconst ...)
CL 622236 forgot to check that the mask was also a 32-bit rotate mask. Add
a modified version of isPPC64WordRotateMask which validates that the mask
is contiguous and fits inside a uint32.

I don't think this is possible when merging SRDconst; the first check
should always reject such combines. But be extra careful and do it there
too.

Fixes #73153

Change-Id: Ie95f74ec5e7d89dc761511126db814f886a7a435
Reviewed-on: https://go-review.googlesource.com/c/go/+/679775
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-06-09 20:33:27 -07:00
Keith Randall
d681270714 cmd/compile: allow load-op merging in additional situations
x += *p

We want to do this with a single load+add operation on amd64.
The tricky part is that we don't want to combine if there are
other uses of x after this instruction.

Implement a simple detector that seems to capture a common situation -
x += *p is in a loop, and the other use of x is after loop exit.
In that case, it does not hurt to do the load+add combo.

Change-Id: I466174cce212e78bde83f908cc1f2752b560c49c
Reviewed-on: https://go-review.googlesource.com/c/go/+/672957
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-15 15:21:36 -07:00
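
A sketch of the pattern described above (illustrative Go, not compiler code): x += *p sits in a loop and the only other use of x comes after the loop exits, so folding the load into the add is safe and profitable.

// sum accumulates x += p[i] in a loop; on amd64 each iteration's load and add
// can merge into a single memory-operand ADD, since the remaining use of x
// occurs only after loop exit.
func sum(p []int64) int64 {
	var x int64
	for i := range p {
		x += p[i]
	}
	return x
}
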
Keith Randall
11fa0de475 cmd/compile: use OpMove instead of memmove more on arm64
OpMove is faster for small moves of fixed size.

For safety, we have to rewrite the Move rewrite rules a bit so that
all the loads are done before any stores happen.

Also use an 8-byte move instead of a 16-byte move if the tail is
at most 8 bytes.

Change-Id: I7f6c7496ac6d5eb2e0706fd59ca4b5d797c51101
Reviewed-on: https://go-review.googlesource.com/c/go/+/672997
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2025-05-15 14:06:16 -07:00
Xiaolin Zhao
c31a5c571f cmd/compile: fold negation into addition/subtraction on loong64
This change also avoids double negation and adds loong64 codegen for arithmetic tests.
It reduces the number of Go toolchain instructions on loong64 as follows.

    file      before    after     Δ       %
    addr2line 279972    279896  -76    -0.0271%
    asm       556390    556310  -80    -0.0144%
    buildid   272376    272300  -76    -0.0279%
    cgo       481534    481550  +16    +0.0033%
    compile   2457992   2457396 -596   -0.0242%
    covdata   323488    323404  -84    -0.0260%
    cover     518630    518490  -140   -0.0270%
    dist      340894    340814  -80    -0.0235%
    distpack  282568    282484  -84    -0.0297%
    doc       790224    789984  -240   -0.0304%
    fix       324408    324348  -60    -0.0185%
    link      704910    704666  -244   -0.0346%
    nm        277220    277144  -76    -0.0274%
    objdump   508026    507878  -148   -0.0291%
    pack      221810    221786  -24    -0.0108%
    pprof     1470284   1469880 -404   -0.0275%
    test2json 254896    254852  -44    -0.0173%
    trace     1100390   1100074 -316   -0.0287%
    vet       781398    781142  -256   -0.0328%
    go        1529668   1529128 -540   -0.0353%
    gofmt     318668    318568  -100   -0.0314%
    total     13795746 13792094 -3652  -0.0265%

Change-Id: I88d1f12cfc4be0e92687c48e06a57213aa484aca
Reviewed-on: https://go-review.googlesource.com/c/go/+/672555
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-14 17:46:58 -07:00
khr@golang.org
6729fbe93e cmd/compile: on amd64, use flag result of x instead of doing (TEST x x)
So we can avoid using a TEST where it isn't needed.

Currently only implemented for ADD{Q,L}const.

Change-Id: Ia9c4c69bb6033051a45cfd3d191376c7cec9d423
Reviewed-on: https://go-review.googlesource.com/c/go/+/669875
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-05-05 13:07:39 -07:00
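
A sketch of the shape of code that benefits from the change above (hypothetical example, not from CL 669875): the value produced by an add of a constant is immediately compared against zero, so the flags already set by ADDQconst can drive the comparison without a separate TEST.

// dec decrements n and reports whether it reached zero.
func dec(n int64) (int64, bool) {
	n--              // lowers to ADDQconst $-1 on amd64
	return n, n == 0 // can reuse the flags from the add instead of TEST n, n
}
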
Keith Randall
12110c3f7e cmd/compile: improve multiplication strength reduction
Use an automatic algorithm to generate strength reduction code.
You give it all the linear combination (a*x+b*y) instructions in your
architecture, it figures out the rest.

Just amd64 and arm64 for now.

Fixes #67575

Change-Id: I35c69382bebb1d2abf4bb4e7c43fd8548c6c59a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/626998
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-01 09:33:31 -07:00
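
A rough illustration of the idea (hand-written Go mimicking the rewrite, not compiler output): a constant multiply is expressed as a short chain of linear-combination steps of the form a*x + b*y, each of which maps to a single instruction such as LEA on amd64 or a shifted add on arm64.

// mul10 computes 10*x without a general multiply: 10*x = (x + 4*x) * 2,
// i.e. two linear-combination steps.
func mul10(x int64) int64 {
	t := x + 4*x // x + x<<2 -> 5*x
	return 2 * t // t<<1     -> 10*x
}
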
Keith Randall
7d0cb2a2ad cmd/compile: constant fold 128-bit multiplies
The full 64x64->128 multiply comes up when using bits.Mul64.
The 64x64->64+overflow multiply comes up in unsafe.Slice when using
a constant length.

Change-Id: I298515162ca07d804b2d699d03bc957ca30a4ebc
Reviewed-on: https://go-review.googlesource.com/c/go/+/667175
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-22 10:24:18 -07:00
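
Illustrative example (not a test from CL 667175): with both operands constant, the 64x64->128 multiply from bits.Mul64 can be folded at compile time into two constants.

package p

import "math/bits"

// foldedMul returns the 128-bit product of two compile-time constants; the
// call can be constant-folded away entirely.
func foldedMul() (hi, lo uint64) {
	return bits.Mul64(1<<40, 1<<40)
}
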
Alexander Musman
16a6b71f18 cmd/compile: improve store-to-load forwarding with compatible types
Improve the compiler's store-to-load forwarding optimization by relaxing the
type comparison condition. Instead of requiring exact type equality (CMPeq),
we now use copyCompatibleType which allows forwarding between compatible
types where safe.

Fix several size comparison bugs in the nested store patterns. Previously,
we were comparing the size of the outer store with the load type,
rather than comparing with the size of the actual store being forwarded
from.

Skip OpConvert in dead store elimination to help get rid of dead stores such
as zeroing slices. OpConvert, like OpInlMark, doesn't really use the memory.

This optimization is particularly beneficial for code that creates slices with
computed pointers, such as the runtime's heapBitsSlice function, where
intermediate calculations were previously causing the compiler to miss
store-to-load forwarding opportunities.

Local sweet run result on an x86_64 laptop:

                       │  Orig.res   │              Hopt.res              │
                       │   sec/op    │   sec/op     vs base               │
BiogoIgor-8               5.303 ± 1%    5.322 ± 1%       ~ (p=0.190 n=10)
BiogoKrishna-8            7.894 ± 1%    7.828 ± 2%       ~ (p=0.190 n=10)
BleveIndexBatch100-8      2.257 ± 1%    2.248 ± 2%       ~ (p=0.529 n=10)
EtcdPut-8                30.12m ± 1%   30.03m ± 1%       ~ (p=0.796 n=10)
EtcdSTM-8                127.1m ± 1%   126.2m ± 0%  -0.74% (p=0.023 n=10)
GoBuildKubelet-8          52.21 ± 0%    52.05 ± 1%       ~ (p=0.063 n=10)
GoBuildKubeletLink-8      4.342 ± 1%    4.305 ± 0%  -0.85% (p=0.000 n=10)
GoBuildIstioctl-8         43.33 ± 0%    43.24 ± 0%  -0.22% (p=0.015 n=10)
GoBuildIstioctlLink-8     4.604 ± 1%    4.598 ± 0%       ~ (p=0.063 n=10)
GoBuildFrontend-8         15.33 ± 0%    15.29 ± 0%       ~ (p=0.143 n=10)
GoBuildFrontendLink-8    740.0m ± 1%   737.7m ± 1%       ~ (p=0.912 n=10)
GopherLuaKNucleotide-8    9.590 ± 1%    9.656 ± 1%       ~ (p=0.165 n=10)
MarkdownRenderXHTML-8    96.97m ± 1%   97.26m ± 2%       ~ (p=0.105 n=10)
Tile38QueryLoad-8        335.9µ ± 1%   335.6µ ± 1%       ~ (p=0.481 n=10)
geomean                   1.336         1.333       -0.22%

Change-Id: I031552623e6d5a3b1b5be8325e6314706e45534f
Reviewed-on: https://go-review.googlesource.com/c/go/+/662075
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-04-04 08:25:47 -07:00
Mateusz Poliwczak
eec3745bd7 cmd/compile/internal/ssa: replace uses of interface{} with Sym/Aux
Change-Id: I0a3ce2e823697eee5bb5e7d5ea0ef025132c0689
Reviewed-on: https://go-review.googlesource.com/c/go/+/661655
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-03-31 08:20:16 -07:00
Jorropo
b60b9cf21f cmd/compile: add constant folding for bits.Add64
Change-Id: I0ed4ebeaaa68e274e5902485ccc1165c039440bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/656275
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
2025-03-11 20:17:53 -07:00
Andrey Bokhanko
11f7ea8ce0 cmd/compile: add type-based alias analysis
Make ssa.disjoint call ssa.disjointTypes to disambiguate Values based on
their types. Only one type-based rule is employed: a Type can't alias
with a pointer (https://pkg.go.dev/unsafe#Pointer).

Fixes #70488

Change-Id: I5a7e75292c2b6b5a01fb9048e3e2360e31dbcdd9
Reviewed-on: https://go-review.googlesource.com/c/go/+/632176
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-14 15:32:55 -08:00
Keith Randall
89c2f282dc cmd/compile: move []byte->string map key optimization to ssa
If we call slicebytetostring immediately (with no intervening writes)
before calling map access or delete functions with the resulting
string as the key, then we can just use the ptr/len of the
slicebytetostring argument as the key. This avoids an allocation.

Fixes #44898
Update #71132

There's old code in cmd/compile/internal/walk/order.go that handles
some of these cases.

1. m[string(b)]
2. s := string(b); m[s]
3. m[[2]string{string(b1),string(b2)}]

The old code handled cases 1&3. The new code handles cases 1&2.
We'll leave the old code around to keep 3 working, although it seems
not terribly common.

Case 2 happens particularly after inlining, so it is pretty common.

Change-Id: I8913226ca79d2c65f4e2bd69a38ac8c976a57e43
Reviewed-on: https://go-review.googlesource.com/c/go/+/640656
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-13 13:03:07 -08:00
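
The three shapes listed in the commit message, written out as Go (variable names are hypothetical): cases 1 and 3 were already handled by the code in walk/order.go, while the new SSA rule also covers case 2, which is common after inlining.

func lookups(m map[string]int, m2 map[[2]string]int, b, b1, b2 []byte) (int, int, int) {
	v1 := m[string(b)] // case 1: direct conversion as the key

	s := string(b) // case 2: conversion stored in a variable first,
	v2 := m[s]     //         now also allocation-free via the SSA rule

	v3 := m2[[2]string{string(b1), string(b2)}] // case 3: array key, old code path

	return v1, v2, v3
}
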
Keith Randall
072eea9b3b cmd/compile: avoid ifaceeq call if we know the interface is direct
We can just use == if the interface is direct.

Fixes #70738

Change-Id: Ia9a644791a370fec969c04c42d28a9b58f16911f
Reviewed-on: https://go-review.googlesource.com/c/go/+/635435
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-10 13:28:41 -08:00
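
A sketch of a case the optimization covers (illustrative, not from CL 635435): both operands hold a pointer-shaped ("direct") type, so the comparison can use plain word equality instead of calling runtime.ifaceeq.

// samePtr boxes two *int values into interfaces and compares them; since *int
// is stored directly in the interface data word, the == can compare the two
// words without an ifaceeq call.
func samePtr(p, q *int) bool {
	var a, b any = p, q
	return a == b
}
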
Paul E. Murphy
1846dd5a31 cmd/compile/internal/ssa: fix PPC64 shift codegen regression
CL 621357 introduced new generic lowering rules which caused
several shift related codegen test failures.

Add new rules to fix the test regressions, and clean up tests
which are changed but not regressed. Some CLRLSLDI tests are
removed as they no longer test CLRLSLDI rules.

Fixes #70003

Change-Id: I1ecc5a7e63ab709a4a0cebf11fa078d5cf164034
Reviewed-on: https://go-review.googlesource.com/c/go/+/622236
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-24 17:32:18 +00:00
Xiaolin Zhao
91d07ac71c cmd/compile: inline constant sized memclrNoHeapPointers calls on loong64
Testing on loong64 shows the optimization has a negative effect for
constant-size cases greater than 512, so inlining is only enabled for
constant-size cases less than 512.

goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
                      |  bench.old   |              bench.new               |
                      |    sec/op    |    sec/op     vs base                |
MemclrKnownSize1        2.4070n ± 0%   0.4004n ± 0%  -83.37% (p=0.000 n=20)
MemclrKnownSize2        2.1365n ± 0%   0.4004n ± 0%  -81.26% (p=0.000 n=20)
MemclrKnownSize4        2.4445n ± 0%   0.4004n ± 0%  -83.62% (p=0.000 n=20)
MemclrKnownSize8        2.4200n ± 0%   0.4004n ± 0%  -83.45% (p=0.000 n=20)
MemclrKnownSize16       2.8030n ± 0%   0.8007n ± 0%  -71.43% (p=0.000 n=20)
MemclrKnownSize32        2.803n ± 0%    1.602n ± 0%  -42.85% (p=0.000 n=20)
MemclrKnownSize64        3.250n ± 0%    2.402n ± 0%  -26.08% (p=0.000 n=20)
MemclrKnownSize112       6.006n ± 0%    2.819n ± 0%  -53.06% (p=0.000 n=20)
MemclrKnownSize128       6.006n ± 0%    3.240n ± 0%  -46.05% (p=0.000 n=20)
MemclrKnownSize192       6.807n ± 0%    5.205n ± 0%  -23.53% (p=0.000 n=20)
MemclrKnownSize248       7.608n ± 0%    6.301n ± 0%  -17.19% (p=0.000 n=20)
MemclrKnownSize256       7.608n ± 0%    6.707n ± 0%  -11.84% (p=0.000 n=20)
MemclrKnownSize512       13.61n ± 0%    13.61n ± 0%        ~ (p=0.374 n=20)
MemclrKnownSize1024      26.43n ± 0%    26.43n ± 0%        ~ (p=0.826 n=20)
MemclrKnownSize4096      103.3n ± 0%    103.3n ± 0%        ~ (p=1.000 n=20)
MemclrKnownSize512KiB    26.29µ ± 0%    26.29µ ± 0%   -0.00% (p=0.012 n=20)
geomean                  10.05n         5.006n       -50.18%

                      |  bench.old   |               bench.new                |
                      |     B/s      |      B/s       vs base                 |
MemclrKnownSize1        396.2Mi ± 0%   2381.9Mi ± 0%  +501.21% (p=0.000 n=20)
MemclrKnownSize2        892.8Mi ± 0%   4764.0Mi ± 0%  +433.59% (p=0.000 n=20)
MemclrKnownSize4        1.524Gi ± 0%    9.305Gi ± 0%  +510.56% (p=0.000 n=20)
MemclrKnownSize8        3.079Gi ± 0%   18.609Gi ± 0%  +504.42% (p=0.000 n=20)
MemclrKnownSize16       5.316Gi ± 0%   18.609Gi ± 0%  +250.05% (p=0.000 n=20)
MemclrKnownSize32       10.63Gi ± 0%    18.61Gi ± 0%   +75.00% (p=0.000 n=20)
MemclrKnownSize64       18.34Gi ± 0%    24.81Gi ± 0%   +35.27% (p=0.000 n=20)
MemclrKnownSize112      17.37Gi ± 0%    37.01Gi ± 0%  +113.08% (p=0.000 n=20)
MemclrKnownSize128      19.85Gi ± 0%    36.80Gi ± 0%   +85.39% (p=0.000 n=20)
MemclrKnownSize192      26.27Gi ± 0%    34.35Gi ± 0%   +30.77% (p=0.000 n=20)
MemclrKnownSize248      30.36Gi ± 0%    36.66Gi ± 0%   +20.75% (p=0.000 n=20)
MemclrKnownSize256      31.34Gi ± 0%    35.55Gi ± 0%   +13.43% (p=0.000 n=20)
MemclrKnownSize512      35.02Gi ± 0%    35.03Gi ± 0%    +0.00% (p=0.030 n=20)
MemclrKnownSize1024     36.09Gi ± 0%    36.09Gi ± 0%         ~ (p=0.101 n=20)
MemclrKnownSize4096     36.93Gi ± 0%    36.93Gi ± 0%    +0.00% (p=0.003 n=20)
MemclrKnownSize512KiB   18.57Gi ± 0%    18.57Gi ± 0%    +0.00% (p=0.041 n=20)
geomean                 10.13Gi         20.33Gi       +100.72%

Change-Id: I460a56f7ccc9f820ca2c1934c1c517b9614809ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/621355
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Pratt <mpratt@google.com>
2024-10-24 08:55:31 +00:00
Cuong Manh Le
6d856a804c cmd/compile: generalize struct load/store
The SSA backend currently only handles structs with up to 4 fields, so
there are separate operations corresponding to the number of fields in
the struct.

This CL generalizes these into a single OpStructMake, allowing struct
types with an arbitrary number of fields.

However, ssa.MaxStruct is kept as-is; a future CL will increase this
value to optimize large structs.

Updates #24416

Change-Id: I192ffbea881186693584476b5639394e79be45c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/611075
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2024-09-26 13:18:08 +00:00
khr@golang.org
944a2ac3c7 cmd/compile: small cleanups to rewrite rule helpers
Change-Id: I50a19bd971176598bf8e4ef86ec98f008abe245c
Reviewed-on: https://go-review.googlesource.com/c/go/+/615198
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2024-09-24 21:02:34 +00:00
khr@golang.org
be86f09e01 cmd/compile: use generics for isPowerOfTwo predicates
Change-Id: I097b53e9f13de6ff6eb18ae2261842b097f26390
Reviewed-on: https://go-review.googlesource.com/c/go/+/615197
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2024-09-24 21:02:31 +00:00
khr@golang.org
b92f3f29c1 cmd/compile: simplify naming for arm64 bitfield accessors
They are already methods on an arm64-specific type, so they don't
need to have arm64-specific names.

Change-Id: I2be29907f9892891d88d52cced043ca248aa4e08
Reviewed-on: https://go-review.googlesource.com/c/go/+/615196
Auto-Submit: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
2024-09-24 21:02:28 +00:00
Xiaolin Zhao
2c5b707b3b cmd/compile: optimize RotateLeft8/16 on loong64
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
             │  bench.old   │              bench.new               │
             │    sec/op    │    sec/op     vs base                │
RotateLeft8     1.401n ± 0%    1.201n ± 0%  -14.28% (p=0.000 n=20)
RotateLeft16   1.4010n ± 0%   0.8032n ± 0%  -42.67% (p=0.000 n=20)
geomean         1.401n        0.9822n       -29.90%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
             │  bench.old  │              bench.new              │
             │   sec/op    │   sec/op     vs base                │
RotateLeft8    1.576n ± 0%   1.310n ± 0%  -16.88% (p=0.000 n=20)
RotateLeft16   1.576n ± 0%   1.166n ± 0%  -26.02% (p=0.000 n=20)
geomean        1.576n        1.236n       -21.58%

Change-Id: I39c18306be0b8fd31b57bd0911714abd1783b50e
Reviewed-on: https://go-review.googlesource.com/c/go/+/604738
Auto-Submit: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Tim King <taking@google.com>
2024-09-13 17:15:09 +00:00
Keith Randall
f90f7e90b3 cmd: use built-in min/max instead of bespoke versions
Now that we're bootstrapping from a toolchain that has min/max builtins.

Update #64751

Change-Id: I63eedf3cca00f56f62ca092949cb2dc61db03361
Reviewed-on: https://go-review.googlesource.com/c/go/+/610355
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2024-09-03 22:26:52 +00:00
Paul E. Murphy
2b0a157d68 cmd/compile: intrinsify math.MulUintptr on PPC64
This can be done efficiently with few instructions.

This also adds MULHDUCC for further codegen improvement.

Change-Id: I06320ba4383a679341b911a237a360ef07b19168
Reviewed-on: https://go-review.googlesource.com/c/go/+/605975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Archana Ravindar <aravinda@redhat.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-08-26 17:02:43 +00:00
Keith Randall
b2cdaf7346 cmd/compile: improve unneeded zeroing removal
After newobject, we don't need to write zeroes to initialize the
object.  It has already been zeroed by the allocator.

This is already handled in most cases, but because we run builtin
decomposition after the opt pass, we don't handle cases where the zero
of a compound builtin is being written. Improve the zero detector to
handle those cases.

Fixes #68845

Change-Id: If3dde2e304a05e5a6a6723565191d5444b334bcc
Reviewed-on: https://go-review.googlesource.com/c/go/+/605255
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-08-14 18:16:29 +00:00
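
Illustrative only (not from CL 605255): explicit zero stores of "compound" built-in types (slice, map, interface) after an allocation. Since newobject already returns zeroed memory, the improved detector lets the compiler drop the decomposed zero writes.

type T struct {
	s []byte
	m map[string]int
	e error
}

// newT allocates a T and then writes zero values into its fields; each write
// decomposes into several stores, all of which are redundant with the
// already-zeroed allocation and can now be removed.
func newT() *T {
	t := new(T)
	t.s = nil
	t.m = nil
	t.e = nil
	return t
}
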
Keith Randall
b538e953ee cmd/compile: clean up some unused code in prove pass
Change-Id: Ib695064c5a77a3f86d1d2a74f96823e65199b8e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/603956
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
2024-08-12 16:59:38 +00:00
khr@golang.org
3b96eebcbd cmd/compile: rewrite the constant parts of the prove pass
Handles a lot more cases where constant ranges can eliminate
various (mostly bounds failure) paths.

Fixes #66826
Fixes #66692
Fixes #48213
Update #57959

TODO: remove constant logic from poset code, no longer needed.

Change-Id: Id196436fcd8a0c84c7d59c04f93bd92e26a0fd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/599096
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-08-07 16:07:33 +00:00
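
Illustrative example of the sort of bounds-failure path that constant-range reasoning can eliminate (not a test from CL 599096):

// pick indexes a fixed-size array with a masked index.
func pick(a *[8]int, x int) int {
	i := x & 3  // the prove pass can establish i is in [0, 3]
	return a[i] // so the bounds check against the array length can be dropped
}
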
Keith Randall
7f90b960a9 cmd/compile: don't elide zero extension on top of signed values
v = ... compute some value, which zeros top 32 bits ...
w = zero-extend v

We want to remove the zero-extension operation, as it doesn't do anything.
But if v is typed as a signed value, and it gets spilled/restored, it
might be re-sign-extended upon restore. So the zero-extend isn't actually
a NOP when there might be calls or other reasons to spill in between v and w.

Fixes #68227

Change-Id: I3b30b8e56c7d70deac1fb09d2becc7395acbadf8
Reviewed-on: https://go-review.googlesource.com/c/go/+/595675
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Joedian Reid <joedian@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-06-28 15:25:43 +00:00
Paul E. Murphy
d5e5b14305 cmd/compile/ssa: fix (MOVWZreg (RLWINM)) folding on PPC64
RLIWNM does not clear the upper 32 bits of the target register if
the mask wraps around (e.g 0xF000000F). Don't elide MOVWZreg for
such masks. All other usage clears the upper 32 bits.

Fixes #67844.

Change-Id: I11b89f1da9ae077624369bfe2bf25e9b7c9b79bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/590896
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-06-07 19:02:52 +00:00
Paul E. Murphy
dca577d882 cmd/compile/internal/ssa: reintroduce ANDconst opcode on PPC64
This allows more effective conversion of rotate and mask opcodes
into their CC equivalents, while simplifying the first lowering
pass.

This was removed before the latelower pass was introduced to fold
more cases of compare against zero. Add ANDconst to push the
conversion of ANDconst to ANDCCconst into latelower with the other
CC opcodes.

This also requires introducing RLDICLCC to prevent regressions
when ANDconst is converted to RLDICL then to RLDICLCC and back
to ANDCCconst when possible.

Change-Id: I9e5f9c99fbefa334db18c6c152c5f967f3ff2590
Reviewed-on: https://go-review.googlesource.com/c/go/+/586160
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-05-22 19:59:38 +00:00
Paul E. Murphy
c6d142c4a7 cmd/compile/internal/ssa: fix ppc64 merging of (CLRLSLDI (SRD ...))
The rotate value was not correctly converted from a 64 bit to 32
bit rotate. This caused a miscompile of
golang.org/x/text/unicode/runenames.Names.

Fixes #67526

Change-Id: Ief56fbab27ccc71cd4c01117909bfee7f60a2ea1
Reviewed-on: https://go-review.googlesource.com/c/go/+/586915
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-05-21 18:53:43 +00:00
Paul E. Murphy
0222a028f1 cmd/compile/internal/ssa: combine more shift and masking on PPC64
Investigating binaries, these patterns seem to show up frequently.

Change-Id: I987251e4070e35c25e98da321e444ccaa1526912
Reviewed-on: https://go-review.googlesource.com/c/go/+/583302
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-05-15 13:27:41 +00:00
Paul E. Murphy
7994da4cc1 cmd/compile/internal/ssa: on PPC64, try combining CLRLSLDI and SRDconst into RLWINM
This provides a small performance bump to crc64 as measured on ppc64le/power10:

name              old time/op    new time/op    delta
Crc64/ISO64KB       49.6µs ± 0%    46.6µs ± 0%  -6.18%
Crc64/ISO4KB        3.16µs ± 0%    2.97µs ± 0%  -5.83%
Crc64/ISO1KB         840ns ± 0%     794ns ± 0%  -5.46%
Crc64/ECMA64KB      49.6µs ± 0%    46.5µs ± 0%  -6.20%
Crc64/Random64KB    53.1µs ± 0%    49.9µs ± 0%  -6.04%
Crc64/Random16KB    15.9µs ± 1%    15.0µs ± 0%  -5.73%

Change-Id: I302b5431c7dc46dfd2d211545c483bdcdfe011f1
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/581937
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Eli Bendersky <eliben@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2024-05-03 21:12:29 +00:00
Than McIntosh
c686783cab cmd/compile/internal/ssa: delay rewrite cycle detection for huge funcs
The SSA rewrite pass has some logic that looks to see whether a
suspiciously large number of rewrites is happening, and if so, turns
on logic to try to detect rewrite cycles. The cycle detection logic is
quite expensive (hashes the entire function), meaning that for very
large functions we might get a successful compilation in a minute or
two with no cycle detection, but take a couple of hours once cycle
detection kicks in.

This patch moves from a fixed limit of 1000 iterations to a limit set
partially based on the size of the function (meaning that we'll wait
longer before turning on cycle detection for a large func).

Fixes #66773.

Change-Id: I72f8524d706f15b3f0150baf6abeab2a5d3e15c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/578215
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-04-17 19:39:19 +00:00
Keith Randall
6bf8b76b95 cmd/compile: don't assume args are always zero-extended
On amd64, we always zero-extend when loading arguments from the stack.
On arm64, we extend based on the type. This causes problems with
zeroUpper*Bits, which reports the top bits are zero when they aren't.

Fix it to use the type to decide if the top bits are really zero.

For tests, only f32 currently fails on arm64. Added other tests
just for future-proofing.

Update #66066

Change-Id: I2f13fb47198e139ef13c9a34eb1edc932eea3ee3
Reviewed-on: https://go-review.googlesource.com/c/go/+/571135
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-20 17:35:29 +00:00
Keith Randall
a46ecdca36 cmd/compile: fix sign/zero-extension removal
When an opcode generates a known high bit state (typically, a sub-word
operation that zeros the high bits), we can remove any subsequent
extension operation that would be a no-op.

x = (OP ...)
y = (ZeroExt32to64 x)

If OP zeros the high 32 bits, then we can replace y with x, as the
zero extension doesn't do anything.

However, x in this situation normally has a sub-word-sized type.  The
semantics of values in registers is typically that the high bits
beyond the value's type size are junk. So although the opcode
generating x *currently* zeros the high bits, after x is rewritten to
another opcode it may not - rewrites of sub-word-sized values can
trash the high bits.

To fix, move the extension-removing rules to late lower. That ensures
that their arguments won't be rewritten to change their high bits.

I am also worried about spilling and restoring. Spilling and restoring
doesn't preserve the high bits, but instead sets them to a known value
(often 0, but in some cases it could be sign-extended).  I am unable
to come up with a case that would cause a problem here, so leaving for
another time.

Fixes #66066

Change-Id: I3b5c091b3b3278ccbb7f11beda8b56f4b6d3fde7
Reviewed-on: https://go-review.googlesource.com/c/go/+/568616
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-12 19:38:41 +00:00