Commit graph

289 commits

Author SHA1 Message Date
Keith Randall
11fa0de475 cmd/compile: use OpMove instead of memmove more on arm64
OpMove is faster for small moves of fixed size.

For safety, we have to rewrite the Move rewrite rules a bit so that
all the loads are done before any stores happen.

Also use an 8-byte move instead of a 16-byte move if the tail is
at most 8 bytes.

Change-Id: I7f6c7496ac6d5eb2e0706fd59ca4b5d797c51101
Reviewed-on: https://go-review.googlesource.com/c/go/+/672997
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2025-05-15 14:06:16 -07:00
Xiaolin Zhao
c31a5c571f cmd/compile: fold negation into addition/subtraction on loong64
This change also avoid double negation, and add loong64 codegen for arithmetic tests.
Reduce the number of go toolchain instructions on loong64 as follows.

    file      before    after     Δ       %
    addr2line 279972    279896  -76    -0.0271%
    asm       556390    556310  -80    -0.0144%
    buildid   272376    272300  -76    -0.0279%
    cgo       481534    481550  +16    +0.0033%
    compile   2457992   2457396 -596   -0.0242%
    covdata   323488    323404  -84    -0.0260%
    cover     518630    518490  -140   -0.0270%
    dist      340894    340814  -80    -0.0235%
    distpack  282568    282484  -84    -0.0297%
    doc       790224    789984  -240   -0.0304%
    fix       324408    324348  -60    -0.0185%
    link      704910    704666  -244   -0.0346%
    nm        277220    277144  -76    -0.0274%
    objdump   508026    507878  -148   -0.0291%
    pack      221810    221786  -24    -0.0108%
    pprof     1470284   1469880 -404   -0.0275%
    test2json 254896    254852  -44    -0.0173%
    trace     1100390   1100074 -316   -0.0287%
    vet       781398    781142  -256   -0.0328%
    go        1529668   1529128 -540   -0.0353%
    gofmt     318668    318568  -100   -0.0314%
    total     13795746 13792094 -3652  -0.0265%

Change-Id: I88d1f12cfc4be0e92687c48e06a57213aa484aca
Reviewed-on: https://go-review.googlesource.com/c/go/+/672555
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2025-05-14 17:46:58 -07:00
khr@golang.org
6729fbe93e cmd/compile: on amd64, use flag result of x instead of doing (TEST x x)
So we can avoid using a TEST where it isn't needed.

Currently only implemented for ADD{Q,L}const.

Change-Id: Ia9c4c69bb6033051a45cfd3d191376c7cec9d423
Reviewed-on: https://go-review.googlesource.com/c/go/+/669875
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-05-05 13:07:39 -07:00
Keith Randall
12110c3f7e cmd/compile: improve multiplication strength reduction
Use an automatic algorithm to generate strength reduction code.
You give it all the linear combination (a*x+b*y) instructions in your
architecture, it figures out the rest.

Just amd64 and arm64 for now.

Fixes #67575

Change-Id: I35c69382bebb1d2abf4bb4e7c43fd8548c6c59a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/626998
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-01 09:33:31 -07:00
Keith Randall
7d0cb2a2ad cmd/compile: constant fold 128-bit multiplies
The full 64x64->128 multiply comes up when using bits.Mul64.
The 64x64->64+overflow multiply comes up in unsafe.Slice when using
a constant length.

Change-Id: I298515162ca07d804b2d699d03bc957ca30a4ebc
Reviewed-on: https://go-review.googlesource.com/c/go/+/667175
Reviewed-by: Junyang Shao <shaojunyang@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-04-22 10:24:18 -07:00
Alexander Musman
16a6b71f18 cmd/compile: improve store-to-load forwarding with compatible types
Improve the compiler's store-to-load forwarding optimization by relaxing the
type comparison condition. Instead of requiring exact type equality (CMPeq),
we now use copyCompatibleType which allows forwarding between compatible
types where safe.

Fix several size comparison bugs in the nested store patterns. Previously,
we were comparing the size of the outer store with the load type,
rather than comparing with the size of the actual store being forwarded
from.

Skip OpConvert in dead store elimination to help get rid of dead stores such
as zeroing slices. OpConvert, like OpInlMark, doesn't really use the memory.

This optimization is particularly beneficial for code that creates slices with
computed pointers, such as the runtime's heapBitsSlice function, where
intermediate calculations were previously causing the compiler to miss
store-to-load forwarding opportunities.

Local sweet run result on an x86_64 laptop:

                       │  Orig.res   │              Hopt.res              │
                       │   sec/op    │   sec/op     vs base               │
BiogoIgor-8               5.303 ± 1%    5.322 ± 1%       ~ (p=0.190 n=10)
BiogoKrishna-8            7.894 ± 1%    7.828 ± 2%       ~ (p=0.190 n=10)
BleveIndexBatch100-8      2.257 ± 1%    2.248 ± 2%       ~ (p=0.529 n=10)
EtcdPut-8                30.12m ± 1%   30.03m ± 1%       ~ (p=0.796 n=10)
EtcdSTM-8                127.1m ± 1%   126.2m ± 0%  -0.74% (p=0.023 n=10)
GoBuildKubelet-8          52.21 ± 0%    52.05 ± 1%       ~ (p=0.063 n=10)
GoBuildKubeletLink-8      4.342 ± 1%    4.305 ± 0%  -0.85% (p=0.000 n=10)
GoBuildIstioctl-8         43.33 ± 0%    43.24 ± 0%  -0.22% (p=0.015 n=10)
GoBuildIstioctlLink-8     4.604 ± 1%    4.598 ± 0%       ~ (p=0.063 n=10)
GoBuildFrontend-8         15.33 ± 0%    15.29 ± 0%       ~ (p=0.143 n=10)
GoBuildFrontendLink-8    740.0m ± 1%   737.7m ± 1%       ~ (p=0.912 n=10)
GopherLuaKNucleotide-8    9.590 ± 1%    9.656 ± 1%       ~ (p=0.165 n=10)
MarkdownRenderXHTML-8    96.97m ± 1%   97.26m ± 2%       ~ (p=0.105 n=10)
Tile38QueryLoad-8        335.9µ ± 1%   335.6µ ± 1%       ~ (p=0.481 n=10)
geomean                   1.336         1.333       -0.22%

Change-Id: I031552623e6d5a3b1b5be8325e6314706e45534f
Reviewed-on: https://go-review.googlesource.com/c/go/+/662075
Reviewed-by: Carlos Amedee <carlos@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Carlos Amedee <carlos@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-04-04 08:25:47 -07:00
Mateusz Poliwczak
eec3745bd7 cmd/compile/internal/ssa: replace uses of interface{} with Sym/Aux
Change-Id: I0a3ce2e823697eee5bb5e7d5ea0ef025132c0689
Reviewed-on: https://go-review.googlesource.com/c/go/+/661655
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2025-03-31 08:20:16 -07:00
Jorropo
b60b9cf21f cmd/compile: add constant folding for bits.Add64
Change-Id: I0ed4ebeaaa68e274e5902485ccc1165c039440bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/656275
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Auto-Submit: Jorropo <jorropo.pgm@gmail.com>
2025-03-11 20:17:53 -07:00
Andrey Bokhanko
11f7ea8ce0 cmd/compile: add type-based alias analysis
Make ssa.disjoint call ssa.disjointTypes to disambiguate Values based on
their types. Only one type-based rule is employed: a Type can't alias
with a pointer (https://pkg.go.dev/unsafe#Pointer).

Fixes #70488

Change-Id: I5a7e75292c2b6b5a01fb9048e3e2360e31dbcdd9
Reviewed-on: https://go-review.googlesource.com/c/go/+/632176
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2025-02-14 15:32:55 -08:00
Keith Randall
89c2f282dc cmd/compile: move []byte->string map key optimization to ssa
If we call slicebytetostring immediately (with no intervening writes)
before calling map access or delete functions with the resulting
string as the key, then we can just use the ptr/len of the
slicebytetostring argument as the key. This avoids an allocation.

Fixes #44898
Update #71132

There's old code in cmd/compile/internal/walk/order.go that handles
some of these cases.

1. m[string(b)]
2. s := string(b); m[s]
3. m[[2]string{string(b1),string(b2)}]

The old code handled cases 1&3. The new code handles cases 1&2.
We'll leave the old code around to keep 3 working, although it seems
not terribly common.

Case 2 happens particularly after inlining, so it is pretty common.

Change-Id: I8913226ca79d2c65f4e2bd69a38ac8c976a57e43
Reviewed-on: https://go-review.googlesource.com/c/go/+/640656
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-13 13:03:07 -08:00
Keith Randall
072eea9b3b cmd/compile: avoid ifaceeq call if we know the interface is direct
We can just use == if the interface is direct.

Fixes #70738

Change-Id: Ia9a644791a370fec969c04c42d28a9b58f16911f
Reviewed-on: https://go-review.googlesource.com/c/go/+/635435
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-10 13:28:41 -08:00
Paul E. Murphy
1846dd5a31 cmd/compile/internal/ssa: fix PPC64 shift codegen regression
CL 621357 introduced new generic lowering rules which caused
several shift related codegen test failures.

Add new rules to fix the test regressions, and cleanup tests
which are changed but not regressed. Some CLRLSLDI tests are
removed as they are no test CLRLSLDI rules.

Fixes #70003

Change-Id: I1ecc5a7e63ab709a4a0cebf11fa078d5cf164034
Reviewed-on: https://go-review.googlesource.com/c/go/+/622236
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-24 17:32:18 +00:00
Xiaolin Zhao
91d07ac71c cmd/compile: inline constant sized memclrNoHeapPointers calls on loong64
Tested that on loong64, the optimization effect is negative for
constant size cases greater than 512.
So only enable inlining for constant size cases less than 512.

goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A6000 @ 2500.00MHz
                      |  bench.old   |              bench.new               |
                      |    sec/op    |    sec/op     vs base                |
MemclrKnownSize1        2.4070n ± 0%   0.4004n ± 0%  -83.37% (p=0.000 n=20)
MemclrKnownSize2        2.1365n ± 0%   0.4004n ± 0%  -81.26% (p=0.000 n=20)
MemclrKnownSize4        2.4445n ± 0%   0.4004n ± 0%  -83.62% (p=0.000 n=20)
MemclrKnownSize8        2.4200n ± 0%   0.4004n ± 0%  -83.45% (p=0.000 n=20)
MemclrKnownSize16       2.8030n ± 0%   0.8007n ± 0%  -71.43% (p=0.000 n=20)
MemclrKnownSize32        2.803n ± 0%    1.602n ± 0%  -42.85% (p=0.000 n=20)
MemclrKnownSize64        3.250n ± 0%    2.402n ± 0%  -26.08% (p=0.000 n=20)
MemclrKnownSize112       6.006n ± 0%    2.819n ± 0%  -53.06% (p=0.000 n=20)
MemclrKnownSize128       6.006n ± 0%    3.240n ± 0%  -46.05% (p=0.000 n=20)
MemclrKnownSize192       6.807n ± 0%    5.205n ± 0%  -23.53% (p=0.000 n=20)
MemclrKnownSize248       7.608n ± 0%    6.301n ± 0%  -17.19% (p=0.000 n=20)
MemclrKnownSize256       7.608n ± 0%    6.707n ± 0%  -11.84% (p=0.000 n=20)
MemclrKnownSize512       13.61n ± 0%    13.61n ± 0%        ~ (p=0.374 n=20)
MemclrKnownSize1024      26.43n ± 0%    26.43n ± 0%        ~ (p=0.826 n=20)
MemclrKnownSize4096      103.3n ± 0%    103.3n ± 0%        ~ (p=1.000 n=20)
MemclrKnownSize512KiB    26.29µ ± 0%    26.29µ ± 0%   -0.00% (p=0.012 n=20)
geomean                  10.05n         5.006n       -50.18%

                      |  bench.old   |               bench.new                |
                      |     B/s      |      B/s       vs base                 |
MemclrKnownSize1        396.2Mi ± 0%   2381.9Mi ± 0%  +501.21% (p=0.000 n=20)
MemclrKnownSize2        892.8Mi ± 0%   4764.0Mi ± 0%  +433.59% (p=0.000 n=20)
MemclrKnownSize4        1.524Gi ± 0%    9.305Gi ± 0%  +510.56% (p=0.000 n=20)
MemclrKnownSize8        3.079Gi ± 0%   18.609Gi ± 0%  +504.42% (p=0.000 n=20)
MemclrKnownSize16       5.316Gi ± 0%   18.609Gi ± 0%  +250.05% (p=0.000 n=20)
MemclrKnownSize32       10.63Gi ± 0%    18.61Gi ± 0%   +75.00% (p=0.000 n=20)
MemclrKnownSize64       18.34Gi ± 0%    24.81Gi ± 0%   +35.27% (p=0.000 n=20)
MemclrKnownSize112      17.37Gi ± 0%    37.01Gi ± 0%  +113.08% (p=0.000 n=20)
MemclrKnownSize128      19.85Gi ± 0%    36.80Gi ± 0%   +85.39% (p=0.000 n=20)
MemclrKnownSize192      26.27Gi ± 0%    34.35Gi ± 0%   +30.77% (p=0.000 n=20)
MemclrKnownSize248      30.36Gi ± 0%    36.66Gi ± 0%   +20.75% (p=0.000 n=20)
MemclrKnownSize256      31.34Gi ± 0%    35.55Gi ± 0%   +13.43% (p=0.000 n=20)
MemclrKnownSize512      35.02Gi ± 0%    35.03Gi ± 0%    +0.00% (p=0.030 n=20)
MemclrKnownSize1024     36.09Gi ± 0%    36.09Gi ± 0%         ~ (p=0.101 n=20)
MemclrKnownSize4096     36.93Gi ± 0%    36.93Gi ± 0%    +0.00% (p=0.003 n=20)
MemclrKnownSize512KiB   18.57Gi ± 0%    18.57Gi ± 0%    +0.00% (p=0.041 n=20)
geomean                 10.13Gi         20.33Gi       +100.72%

Change-Id: I460a56f7ccc9f820ca2c1934c1c517b9614809ac
Reviewed-on: https://go-review.googlesource.com/c/go/+/621355
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Pratt <mpratt@google.com>
2024-10-24 08:55:31 +00:00
Cuong Manh Le
6d856a804c cmd/compile: generalize struct load/store
The SSA backend currently only handle struct with up to 4 fields. Thus,
there are different operations corresponding to number fields of the
struct.

This CL generalizes these with just one OpStructMake, allow struct types
with arbitrary number of fields.

However, the ssa.MaxStruct is still kept as-is, and future CL will
increase this value to optimize large structs.

Updates #24416

Change-Id: I192ffbea881186693584476b5639394e79be45c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/611075
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
2024-09-26 13:18:08 +00:00
khr@golang.org
944a2ac3c7 cmd/compile: small cleanups to rewrite rule helpers
Change-Id: I50a19bd971176598bf8e4ef86ec98f008abe245c
Reviewed-on: https://go-review.googlesource.com/c/go/+/615198
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2024-09-24 21:02:34 +00:00
khr@golang.org
be86f09e01 cmd/compile: use generics for isPowerOfTwo predicates
Change-Id: I097b53e9f13de6ff6eb18ae2261842b097f26390
Reviewed-on: https://go-review.googlesource.com/c/go/+/615197
Auto-Submit: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2024-09-24 21:02:31 +00:00
khr@golang.org
b92f3f29c1 cmd/compile: simplify naming for arm64 bitfield accessors
They are already methods on an arm64-specific type, so they don't
need to have arm64-specific names.

Change-Id: I2be29907f9892891d88d52cced043ca248aa4e08
Reviewed-on: https://go-review.googlesource.com/c/go/+/615196
Auto-Submit: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
2024-09-24 21:02:28 +00:00
Xiaolin Zhao
2c5b707b3b cmd/compile: optimize RotateLeft8/16 on loong64
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
             │  bench.old   │              bench.new               │
             │    sec/op    │    sec/op     vs base                │
RotateLeft8     1.401n ± 0%    1.201n ± 0%  -14.28% (p=0.000 n=20)
RotateLeft16   1.4010n ± 0%   0.8032n ± 0%  -42.67% (p=0.000 n=20)
geomean         1.401n        0.9822n       -29.90%

goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
             │  bench.old  │              bench.new              │
             │   sec/op    │   sec/op     vs base                │
RotateLeft8    1.576n ± 0%   1.310n ± 0%  -16.88% (p=0.000 n=20)
RotateLeft16   1.576n ± 0%   1.166n ± 0%  -26.02% (p=0.000 n=20)
geomean        1.576n        1.236n       -21.58%

Change-Id: I39c18306be0b8fd31b57bd0911714abd1783b50e
Reviewed-on: https://go-review.googlesource.com/c/go/+/604738
Auto-Submit: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Tim King <taking@google.com>
2024-09-13 17:15:09 +00:00
Keith Randall
f90f7e90b3 cmd: use built-in min/max instead of bespoke versions
Now that we're bootstrapping from a toolchain that has min/max builtins.

Update #64751

Change-Id: I63eedf3cca00f56f62ca092949cb2dc61db03361
Reviewed-on: https://go-review.googlesource.com/c/go/+/610355
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org>
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2024-09-03 22:26:52 +00:00
Paul E. Murphy
2b0a157d68 cmd/compile: intrinsify math.MulUintptr on PPC64
This can be done efficiently with few instructions.

This also adds MULHDUCC for further codegen improvement.

Change-Id: I06320ba4383a679341b911a237a360ef07b19168
Reviewed-on: https://go-review.googlesource.com/c/go/+/605975
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Archana Ravindar <aravinda@redhat.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-08-26 17:02:43 +00:00
Keith Randall
b2cdaf7346 cmd/compile: improve unneeded zeroing removal
After newobject, we don't need to write zeroes to initialize the
object.  It has already been zeroed by the allocator.

This is already handled in most cases, but because we run builtin
decomposition after the opt pass, we don't handle cases where the zero
of a compound builtin is being written. Improve the zero detector to
handle those cases.

Fixes #68845

Change-Id: If3dde2e304a05e5a6a6723565191d5444b334bcc
Reviewed-on: https://go-review.googlesource.com/c/go/+/605255
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-08-14 18:16:29 +00:00
Keith Randall
b538e953ee cmd/compile: clean up some unused code in prove pass
Change-Id: Ib695064c5a77a3f86d1d2a74f96823e65199b8e9
Reviewed-on: https://go-review.googlesource.com/c/go/+/603956
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
2024-08-12 16:59:38 +00:00
khr@golang.org
3b96eebcbd cmd/compile: rewrite the constant parts of the prove pass
Handles a lot more cases where constant ranges can eliminate
various (mostly bounds failure) paths.

Fixes #66826
Fixes #66692
Fixes #48213
Update #57959

TODO: remove constant logic from poset code, no longer needed.

Change-Id: Id196436fcd8a0c84c7d59c04f93bd92e26a0fd7e
Reviewed-on: https://go-review.googlesource.com/c/go/+/599096
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2024-08-07 16:07:33 +00:00
Keith Randall
7f90b960a9 cmd/compile: don't elide zero extension on top of signed values
v = ... compute some value, which zeros top 32 bits ...
w = zero-extend v

We want to remove the zero-extension operation, as it doesn't do anything.
But if v is typed as a signed value, and it gets spilled/restored, it
might be re-sign-extended upon restore. So the zero-extend isn't actually
a NOP when there might be calls or other reasons to spill in between v and w.

Fixes #68227

Change-Id: I3b30b8e56c7d70deac1fb09d2becc7395acbadf8
Reviewed-on: https://go-review.googlesource.com/c/go/+/595675
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Joedian Reid <joedian@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-06-28 15:25:43 +00:00
Paul E. Murphy
d5e5b14305 cmd/compile/ssa: fix (MOVWZreg (RLWINM)) folding on PPC64
RLIWNM does not clear the upper 32 bits of the target register if
the mask wraps around (e.g 0xF000000F). Don't elide MOVWZreg for
such masks. All other usage clears the upper 32 bits.

Fixes #67844.

Change-Id: I11b89f1da9ae077624369bfe2bf25e9b7c9b79bc
Reviewed-on: https://go-review.googlesource.com/c/go/+/590896
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-06-07 19:02:52 +00:00
Paul E. Murphy
dca577d882 cmd/compile/internal/ssa: reintroduce ANDconst opcode on PPC64
This allows more effective conversion of rotate and mask opcodes
into their CC equivalents, while simplifying the first lowering
pass.

This was removed before the latelower pass was introduced to fold
more cases of compare against zero. Add ANDconst to push the
conversion of ANDconst to ANDCCconst into latelower with the other
CC opcodes.

This also requires introducing RLDICLCC to prevent regressions
when ANDconst is converted to RLDICL then to RLDICLCC and back
to ANDCCconst when possible.

Change-Id: I9e5f9c99fbefa334db18c6c152c5f967f3ff2590
Reviewed-on: https://go-review.googlesource.com/c/go/+/586160
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-05-22 19:59:38 +00:00
Paul E. Murphy
c6d142c4a7 cmd/compile/internal/ssa: fix ppc64 merging of (CLRLSLDI (SRD ...))
The rotate value was not correctly converted from a 64 bit to 32
bit rotate. This caused a miscompile of
golang.org/x/text/unicode/runenames.Names.

Fixes #67526

Change-Id: Ief56fbab27ccc71cd4c01117909bfee7f60a2ea1
Reviewed-on: https://go-review.googlesource.com/c/go/+/586915
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
2024-05-21 18:53:43 +00:00
Paul E. Murphy
0222a028f1 cmd/compile/internal/ssa: combine more shift and masking on PPC64
Investigating binaries, these patterns seem to show up frequently.

Change-Id: I987251e4070e35c25e98da321e444ccaa1526912
Reviewed-on: https://go-review.googlesource.com/c/go/+/583302
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
2024-05-15 13:27:41 +00:00
Paul E. Murphy
7994da4cc1 cmd/compile/internal/ssa: on PPC64, try combining CLRLSLDI and SRDconst into RLWINM
This provides a small performance bump to crc64 as measured on ppc64le/power10:

name              old time/op    new time/op    delta
Crc64/ISO64KB       49.6µs ± 0%    46.6µs ± 0%  -6.18%
Crc64/ISO4KB        3.16µs ± 0%    2.97µs ± 0%  -5.83%
Crc64/ISO1KB         840ns ± 0%     794ns ± 0%  -5.46%
Crc64/ECMA64KB      49.6µs ± 0%    46.5µs ± 0%  -6.20%
Crc64/Random64KB    53.1µs ± 0%    49.9µs ± 0%  -6.04%
Crc64/Random16KB    15.9µs ± 1%    15.0µs ± 0%  -5.73%

Change-Id: I302b5431c7dc46dfd2d211545c483bdcdfe011f1
Cq-Include-Trybots: luci.golang.try:gotip-linux-ppc64_power10,gotip-linux-ppc64_power8,gotip-linux-ppc64le_power8,gotip-linux-ppc64le_power9,gotip-linux-ppc64le_power10
Reviewed-on: https://go-review.googlesource.com/c/go/+/581937
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Eli Bendersky <eliben@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2024-05-03 21:12:29 +00:00
Than McIntosh
c686783cab cmd/compile/internal/ssa: delay rewrite cycle detection for huge funcs
The SSA rewrite pass has some logic that looks to see whether a
suspiciously large number of rewrites is happening, and if so, turns
on logic to try to detect rewrite cycles. The cycle detection logic is
quite expensive (hashes the entire function), meaning that for very
large functions we might get a successful compilation in a minute or
two with no cycle detection, but take a couple of hours once cycle
detection kicks in.

This patch moves from a fixed limit of 1000 iterations to a limit set
partially based on the size of the function (meaning that we'll wait
longer before turning cycle detection for a large func).

Fixes #66773.

Change-Id: I72f8524d706f15b3f0150baf6abeab2a5d3e15c4
Reviewed-on: https://go-review.googlesource.com/c/go/+/578215
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-04-17 19:39:19 +00:00
Keith Randall
6bf8b76b95 cmd/compile: don't assume args are always zero-extended
On amd64, we always zero-extend when loading arguments from the stack.
On arm64, we extend based on the type. This causes problems with
zeroUpper*Bits, which reports the top bits are zero when they aren't.

Fix it to use the type to decide if the top bits are really zero.

For tests, only f32 currently fails on arm64. Added other tests
just for future-proofing.

Update #66066

Change-Id: I2f13fb47198e139ef13c9a34eb1edc932eea3ee3
Reviewed-on: https://go-review.googlesource.com/c/go/+/571135
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-20 17:35:29 +00:00
Keith Randall
a46ecdca36 cmd/compile: fix sign/zero-extension removal
When an opcode generates a known high bit state (typically, a sub-word
operation that zeros the high bits), we can remove any subsequent
extension operation that would be a no-op.

x = (OP ...)
y = (ZeroExt32to64 x)

If OP zeros the high 32 bits, then we can replace y with x, as the
zero extension doesn't do anything.

However, x in this situation normally has a sub-word-sized type.  The
semantics of values in registers is typically that the high bits
beyond the value's type size are junk. So although the opcode
generating x *currently* zeros the high bits, after x is rewritten to
another opcode it may not - rewrites of sub-word-sized values can
trash the high bits.

To fix, move the extension-removing rules to late lower. That ensures
that their arguments won't be rewritten to change their high bits.

I am also worried about spilling and restoring. Spilling and restoring
doesn't preserve the high bits, but instead sets them to a known value
(often 0, but in some cases it could be sign-extended).  I am unable
to come up with a case that would cause a problem here, so leaving for
another time.

Fixes #66066

Change-Id: I3b5c091b3b3278ccbb7f11beda8b56f4b6d3fde7
Reviewed-on: https://go-review.googlesource.com/c/go/+/568616
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-03-12 19:38:41 +00:00
Cherry Mui
2908352980 cmd/compile: make jump table symbol static
The jump table symbol is accessed only from the function symbol
(in the same package), so it can be static. Also, if the function
is DUPOK and it is, somehow, compiled differently in two different
packages, the linker must choose the jump table symbol associated
to the function symbol it chose. Currently the jump table symbol
is DUPOK, so that is not guaranteed. Making it static will
guarantee that, as each copy of the function symbol refers to its
own jump table symbol.

For #65783.

Change-Id: I27e051d01ef585d07700b75d4dfac5768f16441e
Reviewed-on: https://go-review.googlesource.com/c/go/+/565535
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2024-02-20 22:40:04 +00:00
Joel Sing
daa58db486 cmd/compile: improve rotations for riscv64
Enable canRotate for riscv64, enable rotation intrinsics and provide
better rewrite implementations for rotations. By avoiding Lsh*x64
and Rsh*Ux64 we can produce better code, especially for 32 and 64
bit rotations. By enabling canRotate we also benefit from the generic
rotation rewrite rules.

Benchmark on a StarFive VisionFive 2:

               │   rotate.1   │              rotate.2               │
               │    sec/op    │   sec/op     vs base                │
RotateLeft-4     14.700n ± 0%   8.016n ± 0%  -45.47% (p=0.000 n=10)
RotateLeft8-4     14.70n ± 0%   10.69n ± 0%  -27.28% (p=0.000 n=10)
RotateLeft16-4    14.70n ± 0%   12.02n ± 0%  -18.23% (p=0.000 n=10)
RotateLeft32-4   13.360n ± 0%   8.016n ± 0%  -40.00% (p=0.000 n=10)
RotateLeft64-4   13.360n ± 0%   8.016n ± 0%  -40.00% (p=0.000 n=10)
geomean           14.15n        9.208n       -34.92%

Change-Id: I1a2036fdc57cf88ebb6617eb8d92e1d187e183b2
Reviewed-on: https://go-review.googlesource.com/c/go/+/560315
Reviewed-by: M Zhuo <mengzhuo1203@gmail.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2024-02-16 11:59:07 +00:00
Paul E. Murphy
773039ed5c cmd/compile/internal/ssa: on PPC64, merge (CMPconst [0] (op ...)) more aggressively
Generate the CC version of many opcodes whose result is compared against
signed 0. The approach taken here works even if the opcode result is used in
multiple places too.

Add support for ADD, ADDconst, ANDN, SUB, NEG, CNTLZD, NOR conversions
to their CC opcode variant. These are the most commonly used variants.

Also, do not set clobberFlags of CNTLZD and CNTLZW, they do not clobber
flags.

This results in about 1% smaller text sections in kubernetes binaries,
and no regressions in the crypto benchmarks.

Change-Id: I9e0381944869c3774106bf348dead5ecb96dffda
Reviewed-on: https://go-review.googlesource.com/c/go/+/538636
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
2023-11-13 22:12:32 +00:00
Keith Randall
962ccbef91 cmd/compile: ensure pointer arithmetic happens after the nil check
Have nil checks return a pointer that is known non-nil. Users of
that pointer can use the result, ensuring that they are ordered
after the nil check itself.

The order dependence goes away after scheduling, when we've fixed
an order. At that point we move uses back to the original pointer
so it doesn't change regalloc any.

This prevents pointer arithmetic on nil from being spilled to the
stack and then observed by a stack scan.

Fixes #63657

Change-Id: I1a5fa4f2e6d9000d672792b4f90dfc1b7b67f6ea
Reviewed-on: https://go-review.googlesource.com/c/go/+/537775
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-10-31 20:45:54 +00:00
Keith Randall
43b57b8516 cmd/compile: handle constant pointer offsets in dead store elimination
Update #63657
Update #45573

Change-Id: I163c6038c13d974dc0ca9f02144472bc05331826
Reviewed-on: https://go-review.googlesource.com/c/go/+/538595
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2023-10-31 20:42:56 +00:00
Keith Randall
657c885fb9 cmd/compile: when combining stores, use line number of first store
var p *[2]uint32 = ...
p[0] = 0
p[1] = 0

When we combine these two 32-bit stores into a single 64-bit store,
use the line number of the first store, not the second one.
This differs from the default behavior because usually with the combining
that the compiler does, we use the line number of the last instruction
in the combo (e.g. load+add, we use the line number of the add).

This is the same behavior that gcc does in C (picking the line
number of the first of a set of combined stores).

Change-Id: Ie70bf6151755322d33ecd50e4d9caf62f7881784
Reviewed-on: https://go-review.googlesource.com/c/go/+/521678
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
2023-10-12 18:09:26 +00:00
Paul E. Murphy
c8caad423c cmd/compile/internal/ssa: optimize (AND (MOVDconst [-1] x)) on PPC64
This sequence can show up in the lowering pass on PPC64. If it
makes it to the latelower pass, it will cause an error because
it looks like it can be turned into RLDICL, but -1 isn't an
accepted mask.

Also, print more debug info if panic is called from
encodePPC64RotateMask.

Fixes #62698

Change-Id: I0f3322e2205357abe7fc28f96e05e3f7ad65567c
Reviewed-on: https://go-review.googlesource.com/c/go/+/529195
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-09-22 14:09:29 +00:00
Paul E. Murphy
5cdb132228 cmd/compile/internal/ssa: improve masking codegen on PPC64
Generate RLDIC[LR] instead of MOVD mask, Rx; AND Rx, Ry, Rz.
This helps reduce code size, and reduces the latency caused
by the constant load.

Similarly, for smaller-than-register values, truncate constants
which exceed the range of the value's type to avoid needing to
load a constant.

Change-Id: I6019684795eb8962d4fd6d9585d08b17c15e7d64
Reviewed-on: https://go-review.googlesource.com/c/go/+/515576
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-09-06 16:34:20 +00:00
Alexander Yastrebov
8ffc931eae all: fix spelling errors
Fix spelling errors discovered using https://github.com/codespell-project/codespell. Errors in data files and vendored packages are ignored.

Change-Id: I83c7818222f2eea69afbd270c15b7897678131dc
GitHub-Last-Rev: 3491615b1b
GitHub-Pull-Request: golang/go#60758
Reviewed-on: https://go-review.googlesource.com/c/go/+/502576
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-06-14 00:03:57 +00:00
Keith Randall
bd3f44e4ff cmd/compile: constant-fold loads from constant dictionaries and types
Retrying the original CL with a small modification. The original CL
did not handle the case of reading an itab out of a dictionary
correctly.  When we read an itab out of a dictionary, we must treat
the type inside that itab as maybe being put in an interface.

Original CL: 486895
Revert CL: 490156

Change-Id: Id2dc1699d184cd8c63dac83986a70b60b4e6cbd7
Reviewed-on: https://go-review.googlesource.com/c/go/+/491495
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-05-19 18:10:11 +00:00
Lynn Boger
4481042c43 cmd/compile: update rules to generate more prefixed instructions
This modifies some existing rules to allow more prefixed instructions
to be generated when using GOPPC64=power10. Some rules also check
if PCRel is available, which is currently supported for linux/ppc64le
and linux/ppc64 (internal linking only).

Prior to p10, DS-offset loads and stores had a 16 bit size limit for
the offset field. If the offset of the data for load or store was
beyond this range then an indexed load or store would be selected by
the rules.

In p10 the assembler can generate prefixed instructions in this case,
but does not if an indexed instruction was selected during the lowering
pass.

This allows many more cases to use prefixed loads or stores, reducing
function sizes and improving performance in some cases where the code
change happens in key loops.

For example in strconv BenchmarkAppendQuoteRune before:

  12c5e4:       15 00 10 06     pla     r10,1425660
  12c5e8:       fc c0 40 39
  12c5ec:       00 00 6a e8     ld      r3,0(r10)
  12c5f0:       10 00 aa e8     ld      r5,16(r10)

After this change:

  12a828:       15 00 10 04     pld     r3,1433272
  12a82c:       b8 de 60 e4
  12a830:       15 00 10 04     pld     r5,1433280
  12a834:       c0 de a0 e4

Performs better in the second case.

A testcase was added to verify that the rules correctly select a load or
store based on the offset and whether power10 or earlier.

Change-Id: I4335fed0bd9b8aba8a4f84d69b89f819cc464846
Reviewed-on: https://go-review.googlesource.com/c/go/+/477398
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Paul Murphy <murp@ibm.com>
2023-05-15 18:20:54 +00:00
Chressie Himpel
f046180890 Revert "cmd/compile: constant-fold loads from constant dictionaries and types"
This reverts CL 486895.

Reason for revert: This breaks internal tests at Google, see b/280035614.

Change-Id: I48772a44f5f6070a7f06b5704e9f9aa272b5f978
Reviewed-on: https://go-review.googlesource.com/c/go/+/490156
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Stapelberg <stapelberg@google.com>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Michael Stapelberg <stapelberg@google.com>
Reviewed-by: David Chase <drchase@google.com>
2023-04-28 11:15:01 +00:00
Keith Randall
635839a17a cmd/compile: constant-fold loads from constant dictionaries and types
Update #59591

Change-Id: Id250a7779c5b53776fff73f3e678fec54d92a8e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/486895
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-04-27 21:12:07 +00:00
Michael Pratt
598cf5e6ac cmd/compile: expose ir.Func to ssa
ssagen.ssafn already holds the ir.Func, and ssa.Frontend.SetWBPos and
ssa.Frontend.Lsym are simple wrappers around parts of the ir.Func.

Expose the ir.Func through ssa.Frontend, allowing us to remove these
wrapper methods and allowing future access to additional features of the
ir.Func if needed.

While we're here, drop ssa.Frontend.Line, which is unused.

For #58298.

Change-Id: I30c4cbd2743e9ad991d8c6b388484a7d1e95f3ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/484215
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
2023-04-20 21:51:46 +00:00
Keith Randall
a3f3868c7a cmd/compile: replace isSigned(t) with t.IsSigned()
No change in semantics, just removing an unneeded helper.

Also align rules a bit.

Change-Id: Ie4dabb99392315a7700c645b3d0931eb8766a5fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/483439
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2023-04-10 17:07:24 +00:00
Cuong Manh Le
07559ceb72 cmd/compile: mark negative size memclr non-inlineable
Fixes #59174

Change-Id: I72b2b068830b90d42a0186addd004fb3175b9126
Reviewed-on: https://go-review.googlesource.com/c/go/+/478375
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-03-22 16:43:10 +00:00
ruinan
4d180f71dc cmd/compile: omit redundant sign/unsign extension on arm64
On Arm64, all 32-bit instructions will ignore the upper 32 bits and
clear them to zero for the result. No need to do an unsign extend before
a 32 bit op.

This CL removes the redundant unsign extension only for the existing
32-bit opcodes, and also omits the sign extension when the upper bit of
the result can be predicted.

Fixes #42162

Change-Id: I61e6670bfb8982572430e67a4fa61134a3ea240a
CustomizedGitHooks: yes
Reviewed-on: https://go-review.googlesource.com/c/go/+/427454
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Eric Fang <eric.fang@arm.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-02-28 03:16:44 +00:00
Lynn Boger
ebe49f98c8 cmd/compile: inline constant sized memclrNoHeapPointers calls on PPC64
Update the function isInlinableMemclr for ppc64 and ppc64le
to enable inlining for the constant sized cases < 512.

Larger cases can use dcbz which performs better but requires
alignment checking so it is best to continue using memclrNoHeapPointers
for those cases.

Results on p10:

MemclrKnownSize1         2.07ns ± 0%     0.57ns ± 0%   -72.59%
MemclrKnownSize2         2.56ns ± 5%     0.57ns ± 0%   -77.82%
MemclrKnownSize4         5.15ns ± 0%     0.57ns ± 0%   -89.00%
MemclrKnownSize8         2.23ns ± 0%     0.57ns ± 0%   -74.57%
MemclrKnownSize16        2.23ns ± 0%     0.50ns ± 0%   -77.74%
MemclrKnownSize32        2.28ns ± 0%     0.56ns ± 0%   -75.28%
MemclrKnownSize64        2.49ns ± 0%     0.72ns ± 0%   -70.95%
MemclrKnownSize112       2.97ns ± 2%     1.14ns ± 0%   -61.72%
MemclrKnownSize128       4.64ns ± 6%     2.45ns ± 1%   -47.17%
MemclrKnownSize192       5.45ns ± 5%     2.79ns ± 0%   -48.87%
MemclrKnownSize248       4.51ns ± 0%     2.83ns ± 0%   -37.12%
MemclrKnownSize256       6.34ns ± 1%     3.58ns ± 0%   -43.53%
MemclrKnownSize512       3.64ns ± 0%     3.64ns ± 0%    -0.03%
MemclrKnownSize1024      4.73ns ± 0%     4.73ns ± 0%    +0.01%
MemclrKnownSize4096      17.1ns ± 0%     17.1ns ± 0%    +0.07%
MemclrKnownSize512KiB    2.12µs ± 0%     2.12µs ± 0%      ~     (all equal)

Change-Id: If1abf5749f4802c64523a41fe0058bd144d0ea46
Reviewed-on: https://go-review.googlesource.com/c/go/+/464340
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
Reviewed-by: Archana Ravindar <aravind5@in.ibm.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Reviewed-by: Carlos Eduardo Seo <carlos.seo@linaro.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
2023-02-23 18:57:27 +00:00