192 commits

Author SHA1 Message Date
Julian Zhu
bdd51e7855 cmd/compile: use constant zero register instead of specialized zero instructions on mips64x
Following CL 633075: mips64x has a constant zero register that can be used for this.

Change-Id: I7b60f9a9fe0015299f48b9219ba0eddd3c02e07a
Reviewed-on: https://go-review.googlesource.com/c/go/+/700935
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Mark Freeman <markfreeman@google.com>
2025-09-09 12:08:27 -07:00
limeidan
90b7d7aaa2 cmd/compile/internal: optimize multiplication using new operation 'ADDshiftLLV' on loong64
goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
                  │     old      │                 new                  │
                  │    sec/op    │    sec/op     vs base                │
MulconstI32/3       0.8004n ± 0%   0.4247n ± 2%  -46.94% (p=0.000 n=10)
MulconstI32/5       0.8005n ± 0%   0.4256n ± 1%  -46.83% (p=0.000 n=10)
MulconstI32/12      1.2010n ± 0%   0.8005n ± 0%  -33.35% (p=0.000 n=10)
MulconstI32/120     0.8090n ± 0%   0.8067n ± 0%   -0.28% (p=0.007 n=10)
MulconstI32/-120    0.8109n ± 0%   0.8072n ± 0%   -0.47% (p=0.000 n=10)
MulconstI32/65537   0.8004n ± 0%   0.8004n ± 0%        ~ (p=1.000 n=10)
MulconstI32/65538   0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.265 n=10)
MulconstI64/3       0.8005n ± 0%   0.4241n ± 1%  -47.02% (p=0.000 n=10)
MulconstI64/5       0.8004n ± 0%   0.4249n ± 1%  -46.91% (p=0.000 n=10)
MulconstI64/12      1.2010n ± 0%   0.8004n ± 0%  -33.36% (p=0.000 n=10)
MulconstI64/120     0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.635 n=10)
MulconstI64/-120    0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.837 n=10)
MulconstI64/65537   0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.837 n=10)
MulconstI64/65538   0.8096n ± 0%   0.8004n ± 0%   -1.14% (p=0.000 n=10)
MulconstU32/3       0.8004n ± 0%   0.4263n ± 1%  -46.75% (p=0.000 n=10)
MulconstU32/5       0.8005n ± 0%   0.4262n ± 1%  -46.76% (p=0.000 n=10)
MulconstU32/12      1.2010n ± 0%   0.8005n ± 0%  -33.35% (p=0.000 n=10)
MulconstU32/120     0.8105n ± 0%   0.8096n ± 0%        ~ (p=0.183 n=10)
MulconstU32/65537   0.8004n ± 0%   0.8004n ± 0%        ~ (p=1.000 n=10)
MulconstU32/65538   0.8005n ± 0%   0.8005n ± 0%        ~ (p=1.000 n=10)
MulconstU64/3       0.8004n ± 0%   0.4265n ± 4%  -46.71% (p=0.000 n=10)
MulconstU64/5       0.8004n ± 0%   0.4256n ± 0%  -46.82% (p=0.000 n=10)
MulconstU64/12      1.2010n ± 0%   0.8004n ± 0%  -33.36% (p=0.000 n=10)
MulconstU64/120     0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.387 n=10)
MulconstU64/65537   0.8005n ± 0%   0.8005n ± 0%        ~ (p=0.265 n=10)
MulconstU64/65538   0.8080n ± 0%   0.8004n ± 0%   -0.93% (p=0.000 n=10)
geomean             0.8539n        0.6597n       -22.74%
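
The large gains for the constants 3, 5 and 12 are consistent with replacing a multiply by shift-and-add. A minimal Go sketch of the arithmetic identities involved, assuming the new operation computes `x + (y << k)`; the helper names here are illustrative and are not compiler code:

```go
package main

import "fmt"

// addShift models a shift-and-add instruction: x + (y << k).
// The semantics are an assumption made for illustration only.
func addShift(x, y int64, k uint) int64 { return x + y<<k }

func mul3(x int64) int64  { return addShift(x, x, 1) }      // x + 2x
func mul5(x int64) int64  { return addShift(x, x, 2) }      // x + 4x
func mul12(x int64) int64 { return addShift(x, x, 1) << 2 } // (x + 2x) * 4

func main() {
	// Each identity is checked against an ordinary multiply.
	for _, x := range []int64{0, 1, -7, 1 << 40} {
		fmt.Println(mul3(x) == 3*x, mul5(x) == 5*x, mul12(x) == 12*x)
	}
}
```

Under this model, 3 and 5 need one operation while 12 needs two, which matches the shape of the timings above.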

Change-Id: Ie33e88985d7639f481bbba540bc917b9f185c357
Reviewed-on: https://go-review.googlesource.com/c/go/+/693855
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-08-12 23:01:49 -07:00
Xiaolin Zhao
e071617222 cmd/compile: optimize multiplication rules on loong64
Improve multiplication strength reduction (see CL 626998) by adding
3 additional linear combination instructions for loong64.

goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A6000-HV @ 2500.00MHz
                  |  bench.old   |              bench.new               |
                  |    sec/op    |    sec/op     vs base                |
MulconstI32/3       1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstI32/5       1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstI32/12       1.601n ± 0%    1.201n ± 0%  -24.98% (p=0.000 n=10)
MulconstI32/120     1.6010n ± 0%   0.8130n ± 0%  -49.22% (p=0.000 n=10)
MulconstI32/-120    1.6010n ± 0%   0.8109n ± 0%  -49.35% (p=0.000 n=10)
MulconstI32/65537   1.6275n ± 0%   0.8005n ± 0%  -50.81% (p=0.000 n=10)
MulconstI32/65538   1.6290n ± 0%   0.8004n ± 0%  -50.87% (p=0.000 n=10)
MulconstI64/3       1.6010n ± 0%   0.8004n ± 0%  -50.01% (p=0.000 n=10)
MulconstI64/5       1.6010n ± 0%   0.8004n ± 0%  -50.01% (p=0.000 n=10)
MulconstI64/12       1.601n ± 0%    1.201n ± 0%  -24.98% (p=0.000 n=10)
MulconstI64/120     1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstI64/-120    1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstI64/65537   1.6270n ± 0%   0.8005n ± 0%  -50.80% (p=0.000 n=10)
MulconstI64/65538   1.6290n ± 0%   0.8071n ± 1%  -50.45% (p=0.000 n=10)
MulconstU32/3       1.6010n ± 0%   0.8004n ± 0%  -50.01% (p=0.000 n=10)
MulconstU32/5       1.6010n ± 0%   0.8004n ± 0%  -50.01% (p=0.000 n=10)
MulconstU32/12       1.601n ± 0%    1.201n ± 0%  -24.98% (p=0.000 n=10)
MulconstU32/120     1.6010n ± 0%   0.8066n ± 0%  -49.62% (p=0.000 n=10)
MulconstU32/65537   1.6290n ± 0%   0.8005n ± 0%  -50.86% (p=0.000 n=10)
MulconstU32/65538   1.6280n ± 0%   0.8005n ± 0%  -50.83% (p=0.000 n=10)
MulconstU64/3       1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstU64/5       1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstU64/12       1.601n ± 0%    1.201n ± 0%  -24.98% (p=0.000 n=10)
MulconstU64/120     1.6010n ± 0%   0.8005n ± 0%  -50.00% (p=0.000 n=10)
MulconstU64/65537   1.6290n ± 0%   0.8005n ± 0%  -50.86% (p=0.000 n=10)
MulconstU64/65538   1.6300n ± 0%   0.8067n ± 0%  -50.51% (p=0.000 n=10)
geomean              1.609n        0.8537n       -46.95%

goos: linux
goarch: loong64
pkg: cmd/compile/internal/test
cpu: Loongson-3A5000 @ 2500.00MHz
                  |  bench.old   |              bench.new               |
                  |    sec/op    |    sec/op     vs base                |
MulconstI32/3       1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstI32/5       1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstI32/12       1.601n ± 0%    1.202n ± 0%  -24.92% (p=0.000 n=10)
MulconstI32/120     1.6020n ± 0%   0.8012n ± 0%  -49.99% (p=0.000 n=10)
MulconstI32/-120    1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstI32/65537   1.6020n ± 0%   0.8007n ± 0%  -50.02% (p=0.000 n=10)
MulconstI32/65538   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstI64/3       1.6015n ± 0%   0.8007n ± 0%  -50.00% (p=0.000 n=10)
MulconstI64/5       1.6020n ± 0%   0.8007n ± 0%  -50.02% (p=0.000 n=10)
MulconstI64/12       1.602n ± 0%    1.202n ± 0%  -25.00% (p=0.000 n=10)
MulconstI64/120     1.6030n ± 0%   0.8011n ± 0%  -50.02% (p=0.000 n=10)
MulconstI64/-120    1.6020n ± 0%   0.8007n ± 0%  -50.02% (p=0.000 n=10)
MulconstI64/65537   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstI64/65538   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstU32/3       1.6010n ± 0%   0.8006n ± 0%  -49.99% (p=0.000 n=10)
MulconstU32/5       1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstU32/12       1.601n ± 0%    1.202n ± 0%  -24.92% (p=0.000 n=10)
MulconstU32/120     1.6010n ± 0%   0.8006n ± 0%  -49.99% (p=0.000 n=10)
MulconstU32/65537   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstU32/65538   1.6020n ± 0%   0.8009n ± 0%  -50.01% (p=0.000 n=10)
MulconstU64/3       1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstU64/5       1.6010n ± 0%   0.8007n ± 0%  -49.98% (p=0.000 n=10)
MulconstU64/12       1.601n ± 0%    1.201n ± 0%  -24.98% (p=0.000 n=10)
MulconstU64/120     1.6020n ± 0%   0.8007n ± 0%  -50.02% (p=0.000 n=10)
MulconstU64/65537   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
MulconstU64/65538   1.6010n ± 0%   0.8007n ± 0%  -49.99% (p=0.000 n=10)
geomean              1.601n        0.8523n       -46.77%

Change-Id: I9fb0e47ca57875da171a347bf4828adfab41b875
Reviewed-on: https://go-review.googlesource.com/c/go/+/675455
Reviewed-by: Mark Freeman <mark@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Auto-Submit: Keith Randall <khr@golang.org>
2025-08-01 08:42:40 -07:00
Keith Randall
12110c3f7e cmd/compile: improve multiplication strength reduction
Use an automatic algorithm to generate strength reduction code.
You give it all the linear combination (a*x+b*y) instructions in your
architecture, it figures out the rest.
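
A rough sketch of the idea, not the compiler's actual algorithm: given a set of a*x+b*y instructions, enumerate which constant multipliers become reachable after a bounded number of rounds. The `combo` type, the `reachable` function and the LEA-like instruction set below are all hypothetical:

```go
package main

import "fmt"

// combo is one linear-combination instruction computing a*x + b*y.
type combo struct{ a, b int64 }

// reachable maps each multiplier c to the first round (1..depth) in which
// c*x can be formed by applying a combo to two already-available multiples
// of x. Round 0 holds only x itself (multiplier 1).
func reachable(combos []combo, depth int) map[int64]int {
	round := map[int64]int{1: 0}
	for d := 1; d <= depth; d++ {
		// Snapshot the multipliers known before this round.
		known := make([]int64, 0, len(round))
		for c := range round {
			known = append(known, c)
		}
		for _, p := range known {
			for _, q := range known {
				for _, k := range combos {
					c := k.a*p + k.b*q
					if _, seen := round[c]; !seen {
						round[c] = d
					}
				}
			}
		}
	}
	return round
}

func main() {
	// Assumed instruction set: x + 2y, x + 4y, x + 8y.
	combos := []combo{{1, 2}, {1, 4}, {1, 8}}
	r := reachable(combos, 2)
	for _, c := range []int64{3, 5, 9, 7, 11, 45} {
		d, ok := r[c]
		fmt.Println(c, d, ok)
	}
}
```

A real implementation would also account for plain shifts, negation and per-instruction costs; this sketch only shows how the instruction list alone determines the reachable constants.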

Just amd64 and arm64 for now.

Fixes #67575

Change-Id: I35c69382bebb1d2abf4bb4e7c43fd8548c6c59a1
Reviewed-on: https://go-review.googlesource.com/c/go/+/626998
Reviewed-by: Jakub Ciolek <jakub@ciolek.dev>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-05-01 09:33:31 -07:00
limeidan
09d76e59d2 cmd/compile: set unalignedOK to make memcombine work properly on loong64
goos: linux
goarch: loong64
pkg: unicode/utf8
cpu: Loongson-3A6000-HV @ 2500.00MHz
                            │     old     │                 new                 │
                            │   sec/op    │   sec/op     vs base                │
ValidTenASCIIChars            7.604n ± 0%   6.805n ± 0%  -10.51% (p=0.000 n=10)
Valid100KASCIIChars           37.41µ ± 0%   16.58µ ± 0%  -55.67% (p=0.000 n=10)
ValidTenJapaneseChars         60.84n ± 0%   58.62n ± 0%   -3.64% (p=0.000 n=10)
ValidLongMostlyASCII          113.5µ ± 0%   113.5µ ± 0%        ~ (p=0.303 n=10)
ValidLongJapanese             204.6µ ± 0%   206.8µ ± 0%   +1.07% (p=0.000 n=10)
ValidStringTenASCIIChars      7.604n ± 0%   6.803n ± 0%  -10.53% (p=0.000 n=10)
ValidString100KASCIIChars     38.05µ ± 0%   17.14µ ± 0%  -54.97% (p=0.000 n=10)
ValidStringTenJapaneseChars   60.58n ± 0%   59.48n ± 0%   -1.82% (p=0.000 n=10)
ValidStringLongMostlyASCII    113.5µ ± 0%   113.4µ ± 0%   -0.10% (p=0.000 n=10)
ValidStringLongJapanese       205.9µ ± 0%   207.3µ ± 0%   +0.67% (p=0.000 n=10)
geomean                       3.324µ        2.756µ       -17.08%

Change-Id: Id43b6e2e41907bd4b92f421dacde31f048db47d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/662495
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@google.com>
2025-04-09 09:18:20 -07:00
Russ Cox
26040b1dd7 cmd/compile: remove noDuffDevice
noDuffDevice was for Plan 9, but Plan 9 doesn't need it anymore.
It was also being set in s390x, mips, mipsle, and wasm, but
on those systems it had no effect since the SSA rules for those
architectures don't refer to it at all.

Change-Id: Ib85c0832674c714f3ad5091f0a022eb7cd3ebcdf
Reviewed-on: https://go-review.googlesource.com/c/go/+/655878
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
2025-03-12 05:40:38 -07:00
Russ Cox
c9b07e8871 cmd/compile: use FMA on plan9, and drop UseFMA
Every OS uses FMA now.

Change-Id: Ia7ffa77c52c45aefca611ddc54e9dfffb27a48da
Reviewed-on: https://go-review.googlesource.com/c/go/+/655877
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-03-12 05:40:34 -07:00
Russ Cox
35cb497d6e cmd/compile: remove useSSE
Every OS uses SSE now.

Change-Id: I4df7e2fbc8e5ccb1fc84a884d4c922b7a2a628e4
Reviewed-on: https://go-review.googlesource.com/c/go/+/655876
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2025-03-12 05:40:30 -07:00
Russ Cox
c18ff21cc8 cmd/compile, runtime: remove plan9 special case avoiding SSE
Change-Id: Id5258a72b0727bf7c66d558e30486eac2c6c8c36
Reviewed-on: https://go-review.googlesource.com/c/go/+/655875
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Russ Cox <rsc@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David du Colombier <0intro@gmail.com>
2025-03-11 14:11:35 -07:00
Cherry Mui
3b25b3c195 cmd/compile: remove residual register GC map code
We used to generate register GC maps as an experimental approach
for asynchronous preemption, which later we chose not to take.
Most of the register GC map code has already been removed. One
exception is that the ssa.Register type still contains a field
for the register map index. Remove it.

Change-Id: Ib177ebce9548aa5ffbcaedd4b507240ea7df8afe
Reviewed-on: https://go-review.googlesource.com/c/go/+/651076
Reviewed-by: David Chase <drchase@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2025-02-20 12:51:47 -08:00
guoguangwu
cb6d15a747 cmd/compile/internal/ssa: fix typos in comment and log
Change-Id: Ic872bac2989ea1c83f31456eb334e6d756ebd7d1
GitHub-Last-Rev: d409884979
GitHub-Pull-Request: golang/go#66612
Reviewed-on: https://go-review.googlesource.com/c/go/+/575296
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: qiulaidongfeng <2645477756@qq.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Keith Randall <khr@golang.org>
2024-04-02 16:11:47 +00:00
Guoqi Chen
e5615ad876 cmd/compile, internal/buildcfg: enable regABI on loong64, and add loong64 in test func hasRegisterABI
goos: linux
goarch: loong64
pkg: test/bench/go1
cpu: Loongson-3A5000 @ 2500.00MHz
                      │    bench.old   │    bench.new                         │
                      │    sec/op      │    sec/op      vs base               │
Template                  116.4m ± 1%     101.3m ± 0%   -12.94% (p=0.000 n=20)
Gzip                      417.2m ± 0%     419.4m ± 0%    +0.53% (p=0.000 n=20)
Gunzip                    87.41m ± 0%     84.61m ± 0%    -3.20% (p=0.000 n=20)
FmtFprintfEmpty           97.87n ± 0%     81.05n ± 0%   -17.19% (p=0.000 n=20)
FmtFprintfString          151.1n ± 0%     140.9n ± 0%    -6.75% (p=0.000 n=20)
FmtFprintfInt             155.6n ± 0%     143.0n ± 0%    -8.10% (p=0.000 n=20)
FmtFprintfIntInt          236.9n ± 0%     225.1n ± 0%    -5.00% (p=0.000 n=20)
FmtFprintfPrefixedInt     316.8n ± 0%     331.9n ± 0%    +4.77% (p=0.000 n=20)
FmtFprintfFloat           401.5n ± 0%     380.0n ± 0%    -5.35% (p=0.000 n=20)
FmtManyArgs               925.3n ± 0%     910.1n ± 0%    -1.64% (p=0.000 n=20)
BinaryTree17               14.04 ± 1%      12.84 ± 0%    -8.52% (p=0.000 n=20)
RegexpMatchEasy0_32       133.1n ± 0%     121.3n ± 0%    -8.87% (p=0.000 n=20)
RegexpMatchEasy0_1K       1.363µ ± 0%     1.337µ ± 0%    -1.91% (p=0.000 n=20)
RegexpMatchEasy1_32       162.7n ± 0%     152.6n ± 0%    -6.24% (p=0.000 n=20)
RegexpMatchEasy1_1K       1.505µ ± 0%     1.740µ ± 0%   +15.61% (p=0.000 n=20)
RegexpMatchMedium_32      1.429µ ± 0%     1.299µ ± 0%    -9.10% (p=0.000 n=20)
RegexpMatchMedium_1K      41.76µ ± 0%     38.16µ ± 0%    -8.61% (p=0.000 n=20)
RegexpMatchHard_32        2.094µ ± 0%     2.157µ ± 0%    +3.01% (p=0.000 n=20)
RegexpMatchHard_1K        63.25µ ± 0%     64.72µ ± 0%    +2.33% (p=0.000 n=20)
JSONEncode                18.00m ± 1%     17.46m ± 1%    -3.05% (p=0.000 n=20)
JSONDecode                79.49m ± 0%     72.42m ± 0%    -8.89% (p=0.000 n=20)
Revcomp                    1.147 ± 0%      1.255 ± 0%    +9.39% (p=0.000 n=20)
Fannkuch11                 3.623 ± 0%      3.410 ± 0%    -5.87% (p=0.000 n=20)
GobDecode                 14.26m ± 0%     12.92m ± 0%    -9.36% (p=0.000 n=20)
GobEncode                 16.86m ± 1%     14.96m ± 0%   -11.28% (p=0.000 n=20)
GoParse                   8.721m ± 0%     8.125m ± 1%    -6.84% (p=0.000 n=20)
Mandelbrot200             7.203m ± 0%     7.171m ± 0%    -0.44% (p=0.000 n=20)
HTTPClientServer          83.96µ ± 0%     80.83µ ± 0%    -3.72% (p=0.000 n=20)
TimeParse                 415.3n ± 0%     389.1n ± 0%    -6.31% (p=0.000 n=20)
TimeFormat                506.4n ± 0%     495.9n ± 0%    -2.06% (p=0.000 n=20)
geomean                   102.6µ          98.04µ         -4.40%

                      │   bench.old    │   bench.new                          │
                      │      B/s       │     B/s        vs base               │
Template                 15.90Mi ± 1%    18.26Mi ± 0%   +14.88% (p=0.000 n=20)
Gzip                     44.36Mi ± 0%    44.12Mi ± 0%    -0.53% (p=0.000 n=20)
Gunzip                   211.7Mi ± 0%    218.7Mi ± 0%    +3.31% (p=0.000 n=20)
RegexpMatchEasy0_32      229.3Mi ± 0%    251.6Mi ± 0%    +9.72% (p=0.000 n=20)
RegexpMatchEasy0_1K      716.4Mi ± 0%    730.3Mi ± 0%    +1.94% (p=0.000 n=20)
RegexpMatchEasy1_32      187.6Mi ± 0%    200.0Mi ± 0%    +6.64% (p=0.000 n=20)
RegexpMatchEasy1_1K      649.1Mi ± 0%    561.3Mi ± 0%   -13.52% (p=0.000 n=20)
RegexpMatchMedium_32     21.35Mi ± 0%    23.50Mi ± 0%   +10.05% (p=0.000 n=20)
RegexpMatchMedium_1K     23.38Mi ± 0%    25.59Mi ± 0%    +9.42% (p=0.000 n=20)
RegexpMatchHard_32       14.57Mi ± 0%    14.14Mi ± 0%    -2.95% (p=0.000 n=20)
RegexpMatchHard_1K       15.44Mi ± 0%    15.09Mi ± 0%    -2.29% (p=0.000 n=20)
JSONEncode               102.8Mi ± 1%    106.0Mi ± 1%    +3.15% (p=0.000 n=20)
JSONDecode               23.28Mi ± 0%    25.55Mi ± 0%    +9.75% (p=0.000 n=20)
Revcomp                  211.3Mi ± 0%    193.1Mi ± 0%    -8.58% (p=0.000 n=20)
GobDecode                51.34Mi ± 0%    56.64Mi ± 0%   +10.33% (p=0.000 n=20)
GobEncode                43.42Mi ± 1%    48.93Mi ± 0%   +12.71% (p=0.000 n=20)
GoParse                  6.337Mi ± 0%    6.800Mi ± 1%    +7.30% (p=0.000 n=20)
geomean                  61.24Mi         63.63Mi         +3.91%

Update #40724

Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn>
Change-Id: I5993460da8c5926c70cb6fbe551b8e4655dea9d0
Reviewed-on: https://go-review.googlesource.com/c/go/+/521790
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2023-11-21 20:24:52 +00:00
Guoqi Chen
ebca52eeb7 cmd/compile/internal: add register info for loong64 regABI
Update #40724

Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn>
Change-Id: Ifd7d94147b01e4fc83978b53dca2bcc0ad1ac4e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/521779
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
2023-11-21 19:04:14 +00:00
David Chase
a903639608 cmd/compile: adjust GOSSAFUNC html dumping to be more ABI-aware
Uses ,ABI instead of <ABI> because of problems with shell escaping
and Windows file names. However, if someone goes to all the trouble
of escaping the linker syntax and uses that instead, that works too.

Examples:
```
GOSSAFUNC=runtime.exitsyscall go build main.go
\# runtime
dumped SSA for exitsyscall,0 to ../../src/loopvar/ssa.html
dumped SSA for exitsyscall,1 to ../../src/loopvar/ssa.html

GOSSADIR=`pwd` GOSSAFUNC=runtime.exitsyscall go build main.go
\# runtime
dumped SSA for exitsyscall,0 to ../../src/loopvar/runtime.exitsyscall,0.html
dumped SSA for exitsyscall,1 to ../../src/loopvar/runtime.exitsyscall,1.html

GOSSAFUNC=runtime.exitsyscall,0 go build main.go
\# runtime
dumped SSA for exitsyscall,0 to ../../src/loopvar/ssa.html

GOSSAFUNC=runtime.exitsyscall\<1\> go build main.go
\# runtime
dumped SSA for exitsyscall,1 to ../../src/loopvar/ssa.html
```

Change-Id: Ia1138b61c797d0de49dbfae702dc306b9650a7f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/532475
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: David Chase <drchase@google.com>
2023-10-04 15:11:40 +00:00
Matthew Dempsky
5d9e0be159 cmd/compile/internal/ssa: replace Frontend.Auto with Func.NewLocal
Change-Id: I0858568d225daba1c318842dc0c9b5e652dff612
Reviewed-on: https://go-review.googlesource.com/c/go/+/526519
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-09-08 19:09:14 +00:00
Matthew Dempsky
5d6f835b3e cmd/compile/internal/ssagen: call AllocFrame after ssa.Compile
This indirection is no longer necessary.

Change-Id: Ibb5eb1753febdc17a93ea9c35130e3d2b26c360e
Reviewed-on: https://go-review.googlesource.com/c/go/+/526518
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-09-08 19:05:18 +00:00
Matthew Dempsky
45d3d10071 cmd/compile/internal/ssa: rename ssagen.TypeOK as CanSSA
No need to indirect through Frontend for this.

Change-Id: I5812eb4dadfda79267cabc9d13aeab126c1479e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/526517
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-09-08 19:03:54 +00:00
Matthew Dempsky
af8a2bde7b cmd/compile/internal/ssa: remove Frontend.MyImportPath
This method is only used to find the path of the function being
compiled for hash debugging, but it was instead returning the path of
the package being compiled. These are typically the same, but can be
different for certain functions compiled across package boundaries
(e.g., method value wrappers and generic functions).

It's redundant either with f.fe.Func().Sym().Pkg.Path (package path of
the function being compiled) or f.Config.ctxt.Pkgpath (package path of
the compilation unit), so just remove it instead.

Change-Id: I1daae09055043d0ecb1fcc874a0b0006a6f8bddf
Reviewed-on: https://go-review.googlesource.com/c/go/+/526516
Auto-Submit: Matthew Dempsky <mdempsky@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2023-09-08 19:01:07 +00:00
Alexander Yastrebov
8ffc931eae all: fix spelling errors
Fix spelling errors discovered using https://github.com/codespell-project/codespell. Errors in data files and vendored packages are ignored.

Change-Id: I83c7818222f2eea69afbd270c15b7897678131dc
GitHub-Last-Rev: 3491615b1b
GitHub-Pull-Request: golang/go#60758
Reviewed-on: https://go-review.googlesource.com/c/go/+/502576
Auto-Submit: Michael Pratt <mpratt@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-06-14 00:03:57 +00:00
Keith Randall
cedf5008a8 cmd/compile: introduce separate memory op combining pass
Memory op combining is currently done using arch-specific rewrite rules.
Instead, do it in an arch-independent rewrite pass. This ensures that
all architectures (with unaligned loads & stores) get equal treatment.

This removes a lot of rewrite rules.

The new pass is a bit more comprehensive. It handles things like out-of-order
writes and is careful not to apply partial optimizations that then block
further optimizations.
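
For illustration, this is the kind of source pattern a memory-combining pass targets: adjacent byte loads or stores that together form one wider access. The functions below are hypothetical examples, and whether a given compiler version merges exactly these is an assumption:

```go
package main

import "fmt"

// load32 assembles a little-endian uint32 from four adjacent bytes.
// On targets that allow unaligned loads this can become one 32-bit load.
func load32(b []byte) uint32 {
	_ = b[3] // one bounds check up front
	return uint32(b[0]) | uint32(b[1])<<8 | uint32(b[2])<<16 | uint32(b[3])<<24
}

// store32 writes v as four adjacent little-endian bytes.
// The statement order does not change the result, which is why a
// combining pass may also handle out-of-order writes.
func store32(b []byte, v uint32) {
	_ = b[3]
	b[2] = byte(v >> 16)
	b[0] = byte(v)
	b[3] = byte(v >> 24)
	b[1] = byte(v >> 8)
}

func main() {
	buf := make([]byte, 4)
	store32(buf, 0x12345678)
	fmt.Printf("%v %#x\n", buf, load32(buf))
}
```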

Change-Id: I780ff3bb052475cd725a923309616882d25b8d9e
Reviewed-on: https://go-review.googlesource.com/c/go/+/478475
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2023-04-21 21:05:46 +00:00
Michael Pratt
598cf5e6ac cmd/compile: expose ir.Func to ssa
ssagen.ssafn already holds the ir.Func, and ssa.Frontend.SetWBPos and
ssa.Frontend.Lsym are simple wrappers around parts of the ir.Func.

Expose the ir.Func through ssa.Frontend, allowing us to remove these
wrapper methods and allowing future access to additional features of the
ir.Func if needed.

While we're here, drop ssa.Frontend.Line, which is unused.

For #58298.

Change-Id: I30c4cbd2743e9ad991d8c6b388484a7d1e95f3ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/484215
Auto-Submit: Michael Pratt <mpratt@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Michael Pratt <mpratt@google.com>
2023-04-20 21:51:46 +00:00
Paul E. Murphy
bd6cd7db07 cmd/compile: fix PPC64 latelower enablement
The commit f841722853 needed an update for c0f27eb3d5. This
fixes the aforementioned commit.

Also, regenerate the lowering rules.

Change-Id: I2073d2e86af212dfe58bc832a1c04a8ef2a57621
Reviewed-on: https://go-review.googlesource.com/c/go/+/445155
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
2022-10-24 19:40:03 +00:00
Paul E. Murphy
f841722853 cmd/compile: enable lateLower pass on PPC64
This allows new rules to be added which would otherwise
greatly overcomplicate the generic rules, like CC opcode
conversion or zero register simplification.

Change-Id: I1533f0fa07815aff99ed8ab890077bd22a3bfbf5
Reviewed-on: https://go-review.googlesource.com/c/go/+/442595
Reviewed-by: Lynn Boger <laboger@linux.vnet.ibm.com>
Run-TryBot: Paul Murphy <murp@ibm.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
2022-10-24 18:59:50 +00:00
Joel Sing
c0f27eb3d5 cmd/compile/internal/ssa: wire up late lower block function
Currently, the lowerBlock function is reused with lateLowerValue, meaning
that any block rewriting rules in the late lower pass are silently ignored.
Change the late lower pass to actually use the lateLowerBlock function with
the lateLowerValue function.

Change-Id: Iaac1c2955bb27078378cac50cde3716e79a7d9f8
Reviewed-on: https://go-review.googlesource.com/c/go/+/444335
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-10-20 16:13:20 +00:00
Wayne Zuo
1c783f7c68 cmd/compile: split 3 operand LEA in late lower pass
On newer amd64 CPUs, 3-operand LEA instructions are slow. CL 114655 split
them into 2 LEA instructions in genssa.

This CL makes the late lower pass run after addressing modes, and splits
3-operand LEAs in the late lower pass so that we can do common-subexpression
elimination on the split LEAs.
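
A small Go model of why splitting earlier helps, assuming a 3-operand LEA computes base + index*scale + disp. The function names are illustrative and do not correspond to compiler ops:

```go
package main

import "fmt"

// lea3 models a 3-operand LEA: base + index*scale + disp in one step.
func lea3(base, index, scale, disp int64) int64 { return base + index*scale + disp }

// leaScaled models the scaled 2-operand form: base + index*scale.
func leaScaled(base, index, scale int64) int64 { return base + index*scale }

// leaDisp models the displacement form: base + disp.
func leaDisp(base, disp int64) int64 { return base + disp }

func main() {
	base, index := int64(1000), int64(7)

	// After splitting, two addresses that differ only in displacement
	// share the scaled part, so it is computed once.
	shared := leaScaled(base, index, 8)
	a := leaDisp(shared, 16)
	b := leaDisp(shared, 24)

	fmt.Println(a == lea3(base, index, 8, 16), b == lea3(base, index, 8, 24))
}
```

If the split happened only at code generation time, each address would carry its own copy of the scaled computation.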

Updates #21735

Change-Id: Ied49139c7abab655e1a14a6fd793bdf9f987d1f1
Reviewed-on: https://go-review.googlesource.com/c/go/+/440035
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
2022-10-17 15:11:16 +00:00
Wayne Zuo
af668c689c cmd/compile: fold constant shift with extension on riscv64
For example:

  movb a0, a0
  srai $1, a0, a0

The assembler expands this to:

  slli $56, a0, a0
  srai $56, a0, a0
  srai $1, a0, a0

With this CL, it is optimized to:

  slli $56, a0, a0
  srai $57, a0, a0

This removes 270+ instructions from the Go binary on linux/riscv64.
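
The fold relies on two consecutive arithmetic right shifts being equal to one shift by the sum of the amounts. A minimal Go check of that identity on 64-bit values (the function names are illustrative only):

```go
package main

import "fmt"

// unfolded mirrors the three-instruction form: sign-extend the low byte
// via shift left 56 and arithmetic shift right 56, then shift right by 1.
func unfolded(a int64) int64 {
	a = a << 56
	a = a >> 56
	return a >> 1
}

// folded mirrors the two-instruction form with the shifts merged.
func folded(a int64) int64 {
	a = a << 56
	return a >> 57
}

func main() {
	ok := true
	for v := int64(-300); v <= 300; v++ {
		// Both forms must agree with sign-extending the low byte and halving.
		if unfolded(v) != folded(v) || folded(v) != int64(int8(v))>>1 {
			ok = false
		}
	}
	fmt.Println(ok)
}
```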

Change-Id: I375e19f9d3bd54f2781791d8cbe5970191297dc8
Reviewed-on: https://go-review.googlesource.com/c/go/+/428496
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-10-06 05:21:04 +00:00
eric fang
ddc7d2a80c cmd/compile: add late lower pass for last rules to run
Optimization rules usually have implicit priorities: some need to run
first, some next, and some last in order to produce the best code. Our
optimization rules currently have no priorities, so this CL adds a late
lower pass that runs the rules that need to run last, such as splitting
unreasonable constant folding. This pass can be seen as a second round
of the lower pass.

For example:
func foo(a, b uint64) uint64 {
        d := a+0x1234568
        d1 := b+0x1234568
        return d&d1
}
The code generated by the master branch:
	0x0004 00004        ADD     $19088744, R0, R2 // movz+movk+add
	0x0010 00016        ADD     $19088744, R1, R1 // movz+movk+add
	0x001c 00028        AND     R1, R2, R0

This is because the current constant folding rules do not take the
range of constants into account, so the constant is loaded repeatedly.
This CL splits such unreasonable constant foldings in the late lower
pass. With this CL, the generated code is:
	0x0004 00004        MOVD    $19088744, R2 // movz+movk
	0x000c 00012        ADD     R0, R2, R3
	0x0010 00016        ADD     R1, R2, R1
	0x0014 00020        AND     R1, R3, R0
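
To see why this constant needs movz+movk, here is a simplified Go check of whether a value fits an arm64 ADD immediate (a 12-bit value, optionally shifted left by 12). The `fitsAddImm` helper is a hypothetical illustration, not the condition used by the compiler's rules:

```go
package main

import "fmt"

// fitsAddImm reports whether c is encodable as an arm64 ADD immediate:
// a 12-bit unsigned value, optionally shifted left by 12 bits.
func fitsAddImm(c uint64) bool {
	return c < 1<<12 || (c&0xfff == 0 && c < 1<<24)
}

// foo is the example from the commit message, with the constant named.
func foo(a, b uint64) uint64 {
	const k = 0x1234568 // 19088744, too wide for a single ADD immediate
	d := a + k
	d1 := b + k
	return d & d1
}

func main() {
	fmt.Println(fitsAddImm(0x123), fitsAddImm(0x123000), fitsAddImm(0x1234568))
	fmt.Println(foo(1, 2))
}
```

Because the constant does not fit, folding it into each ADD forces it to be materialized twice, while keeping it in one register lets both additions share it.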

This CL also adds constant folding optimization for ADDS instruction.

In addition, to avoid introducing a codegen regression, an
optimization rule is added that turns the addition of a negative number
into the subtraction of a positive number.

go1 benchmarks:
name                     old time/op    new time/op    delta
BinaryTree17-8              1.22s ± 1%     1.24s ± 0%  +1.56%  (p=0.008 n=5+5)
Fannkuch11-8                1.54s ± 0%     1.53s ± 0%  -0.69%  (p=0.016 n=4+5)
FmtFprintfEmpty-8          14.1ns ± 0%    14.1ns ± 0%    ~     (p=0.079 n=4+5)
FmtFprintfString-8         26.0ns ± 0%    26.1ns ± 0%  +0.23%  (p=0.008 n=5+5)
FmtFprintfInt-8            32.3ns ± 0%    32.9ns ± 1%  +1.72%  (p=0.008 n=5+5)
FmtFprintfIntInt-8         54.5ns ± 0%    55.5ns ± 0%  +1.83%  (p=0.008 n=5+5)
FmtFprintfPrefixedInt-8    61.5ns ± 0%    62.0ns ± 0%  +0.93%  (p=0.008 n=5+5)
FmtFprintfFloat-8          72.0ns ± 0%    73.6ns ± 0%  +2.24%  (p=0.008 n=5+5)
FmtManyArgs-8               221ns ± 0%     224ns ± 0%  +1.22%  (p=0.008 n=5+5)
GobDecode-8                1.91ms ± 0%    1.93ms ± 0%  +0.98%  (p=0.008 n=5+5)
GobEncode-8                1.40ms ± 1%    1.39ms ± 0%  -0.79%  (p=0.032 n=5+5)
Gzip-8                      115ms ± 0%     117ms ± 1%  +1.17%  (p=0.008 n=5+5)
Gunzip-8                   19.4ms ± 1%    19.3ms ± 0%  -0.71%  (p=0.016 n=5+4)
HTTPClientServer-8         27.0µs ± 0%    27.3µs ± 0%  +0.80%  (p=0.008 n=5+5)
JSONEncode-8               3.36ms ± 1%    3.33ms ± 0%    ~     (p=0.056 n=5+5)
JSONDecode-8               17.5ms ± 2%    17.8ms ± 0%  +1.71%  (p=0.016 n=5+4)
Mandelbrot200-8            2.29ms ± 0%    2.29ms ± 0%    ~     (p=0.151 n=5+5)
GoParse-8                  1.35ms ± 1%    1.36ms ± 1%    ~     (p=0.056 n=5+5)
RegexpMatchEasy0_32-8      24.5ns ± 0%    24.5ns ± 0%    ~     (p=0.444 n=4+5)
RegexpMatchEasy0_1K-8       131ns ±11%     118ns ± 6%    ~     (p=0.056 n=5+5)
RegexpMatchEasy1_32-8      22.9ns ± 0%    22.9ns ± 0%    ~     (p=0.905 n=4+5)
RegexpMatchEasy1_1K-8       126ns ± 0%     127ns ± 0%    ~     (p=0.063 n=4+5)
RegexpMatchMedium_32-8      486ns ± 5%     483ns ± 0%    ~     (p=0.381 n=5+4)
RegexpMatchMedium_1K-8     15.4µs ± 1%    15.5µs ± 0%    ~     (p=0.151 n=5+5)
RegexpMatchHard_32-8        687ns ± 0%     686ns ± 0%    ~     (p=0.103 n=5+5)
RegexpMatchHard_1K-8       20.7µs ± 0%    20.7µs ± 1%    ~     (p=0.151 n=5+5)
Revcomp-8                   175ms ± 2%     176ms ± 3%    ~     (p=1.000 n=5+5)
Template-8                 20.4ms ± 6%    20.1ms ± 2%    ~     (p=0.151 n=5+5)
TimeParse-8                 112ns ± 0%     113ns ± 0%  +0.97%  (p=0.016 n=5+4)
TimeFormat-8                156ns ± 0%     145ns ± 0%  -7.14%  (p=0.029 n=4+4)

Change-Id: I3ced26e89041f873ac989586514ccc5ee09f13da
Reviewed-on: https://go-review.googlesource.com/c/go/+/425134
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Eric Fang <eric.fang@arm.com>
2022-10-05 02:40:56 +00:00
Cherry Mui
4bcc138bc6 cmd/compile, cmd/link: enable Duff's device on darwin/arm64
Duff's device was disabled on darwin/arm64 because the darwin
linker couldn't handle a branch relocation with non-zero addend.
This is no longer the case now. The darwin linker can handle it
just fine. So enable it.

Fixes #54189.

Change-Id: Ida7ebafe6eb01db1af5bb8ae60a62491da5eabdf
Reviewed-on: https://go-review.googlesource.com/c/go/+/420894
Reviewed-by: Eric Fang <eric.fang@arm.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-08 17:54:10 +00:00
Xiaodong Liu
756fcd8fc2 cmd/compile/internal/ssa: config functions used in lower pass for loong64
Contributors to the loong64 port are:
  Weining Lu <luweining@loongson.cn>
  Lei Wang <wanglei@loongson.cn>
  Lingqin Gong <gonglingqin@loongson.cn>
  Xiaolin Zhao <zhaoxiaolin@loongson.cn>
  Meidan Li <limeidan@loongson.cn>
  Xiaojuan Zhai <zhaixiaojuan@loongson.cn>
  Qiyuan Pu <puqiyuan@loongson.cn>
  Guoqi Chen <chenguoqi@loongson.cn>

This port has been updated to Go 1.15.6:
  https://github.com/loongson/go

Updates #46229

Change-Id: I50d20eb22f2108d245513de8ac95ebe0b7e1a1dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/367037
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Ian Lance Taylor <iant@google.com>
Run-TryBot: Ian Lance Taylor <iant@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: Ian Lance Taylor <iant@google.com>
2022-05-12 01:08:28 +00:00
Austin Clements
5f625de4d0 cmd/compile,cmd/internal/obj: replace Ctxt.FixedFrameSize method with Arch field
And delete now-unused FixedFrameSize methods.

Change-Id: Id257e1647dbeb4eb4ab866c53744010c4efeb953
Reviewed-on: https://go-review.googlesource.com/c/go/+/400819
Run-TryBot: Austin Clements <austin@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-04-19 15:59:22 +00:00
Keith Randall
1ba96d8c09 cmd/compile: implement jump tables
Performance is kind of hard to exactly quantify.

One big difference between jump tables and the old binary search
scheme is that there's only 1 branch statement instead of O(n) of
them. That can be both a blessing and a curse, and can make evaluating
jump tables very hard to do.

The single branch can become a choke point for the hardware branch
predictor. A branch table jump must fit all of its state in a single
branch predictor entry (technically, a branch target predictor entry).
With binary search that predictor state can be spread among lots of
entries. In cases where the case selection is repetitive and thus
predictable, binary search can perform better.

The big win for a jump table is that it doesn't consume so much of the
branch predictor's resources. But that benefit is essentially never
observed in microbenchmarks, because the branch predictor can easily
keep state for all the binary search branches in a microbenchmark. So
that benefit is really hard to measure.

So predictable switch microbenchmarks are ~useless - they will almost
always favor the binary search scheme. Fully unpredictable switch
microbenchmarks are better, as they aren't lying to us quite so
much. In a perfectly unpredictable situation, a jump table is expected
to incur 1-1/N branch mispredicts, where a binary search would incur
lg(N)/2 of them. That makes the crossover point at about N=4. But of
course switches in real programs are seldom fully unpredictable, so
we'll use a higher crossover point.
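
As a rough illustration of the two estimates above, the following sketch
tabulates 1-1/N and lg(N)/2 for a few values of N. The function names are
hypothetical and chosen only for this example; the formulas are the ones
quoted in the message, applied to a fully unpredictable switch with N
equally likely cases.

```go
package main

import (
	"fmt"
	"math"
)

// jumpTableMispredicts returns the expected mispredicts per switch for a
// jump table with n equally likely, unpredictable cases: 1 - 1/n.
// (Hypothetical helper; the formula comes from the commit message.)
func jumpTableMispredicts(n int) float64 {
	return 1 - 1/float64(n)
}

// binarySearchMispredicts returns the expected mispredicts per switch for
// the binary search scheme under the same assumption: lg(n)/2.
func binarySearchMispredicts(n int) float64 {
	return math.Log2(float64(n)) / 2
}

func main() {
	for _, n := range []int{2, 4, 8, 16, 32} {
		fmt.Printf("N=%2d  jump table %.3f  binary search %.3f\n",
			n, jumpTableMispredicts(n), binarySearchMispredicts(n))
	}
}
```

The jump table estimate stays below 1 for every N, while the binary search
estimate keeps growing with lg(N), which is why the gap widens for larger
switches.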

Beyond the branch predictor, jump tables tend to execute more
instructions per switch but have no additional instructions per case,
which also argues for a larger crossover.

As far as code size goes, with this CL cmd/go has a slightly smaller
code segment and a slightly larger overall size (from the jump tables
themselves which live in the data segment).

This is a case where some FDO (feedback-directed optimization) would
be really nice to have. #28262

Some large-program benchmarks might help make the case for this
CL. Especially if we can turn on branch mispredict counters so we can
see how much using jump tables can free up branch prediction resources
that can be gainfully used elsewhere in the program.

name                         old time/op  new time/op  delta
Switch8Predictable         1.89ns ± 2%  1.27ns ± 3%  -32.58%  (p=0.000 n=9+10)
Switch8Unpredictable       9.33ns ± 1%  7.50ns ± 1%  -19.60%  (p=0.000 n=10+9)
Switch32Predictable        2.20ns ± 2%  1.64ns ± 1%  -25.39%  (p=0.000 n=10+9)
Switch32Unpredictable      10.0ns ± 2%   7.6ns ± 2%  -24.04%  (p=0.000 n=10+10)

Fixes #5496
Update #34381

Change-Id: I3ff56011d02be53f605ca5fd3fb96b905517c34f
Reviewed-on: https://go-review.googlesource.com/c/go/+/357330
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@google.com>
2022-04-14 19:30:00 +00:00
Meng Zhuo
d3362fc124 cmd/compile: enable reg args on riscv64
This CL updates config.go to enable register args.

Change-Id: I00697fc3db23293be0f5bd2fe33fb0055eeab43e
Reviewed-on: https://go-review.googlesource.com/c/go/+/360217
Trust: mzh <mzh@golangcn.org>
Run-TryBot: mzh <mzh@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-04-07 06:23:40 +00:00
Paul E. Murphy
a4a0f9b148 cmd/compile: make XER allocatable register on PPC64
This is the first step towards decomposing aggregate operations
which create or consume the CA bit of the XER.

This helps optimize the canned sequence of Add64Carry (and
Sub64Borrow if it were implemented similarly) by minimizing
extraneous operations related to loading the CA bit,
reloading CA in chained operations, or extracting it when
unused.

Likewise, mark the operations which clobber CA.

Change-Id: I33e6dd2654a8cc39fcdbb9690a495f03558cdc97
Reviewed-on: https://go-review.googlesource.com/c/go/+/346869
Trust: Paul Murphy <murp@ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-03-29 18:55:09 +00:00
Meng Zhuo
caf5cd9da8 cmd/compile/internal: add ABI register info for riscv64
This CL adds register information for riscv64

Updates #40724

Change-Id: If2275d9135596ff856d096881e4fe8bd1eeaacb2
Reviewed-on: https://go-review.googlesource.com/c/go/+/359337
Trust: mzh <mzh@golangcn.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: mzh <mzh@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-03-19 12:47:31 +00:00
Lynn Boger
24c2ee7b65 cmd/compile: enable reg args and add duffcopy support on ppc64x
This adds support for duffcopy on ppc64x and updates the
ssa/config.go file to enable register args and to recognize
that duffDevice is available on ppc64x.

Change-Id: Ifc472cc9cc19c9a80e468fb52078c75f7dd44d36
Reviewed-on: https://go-review.googlesource.com/c/go/+/351490
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-23 15:51:39 +00:00
Lynn Boger
cceadf8527 cmd/compile/internal: add ABI register information for ppc64
This adds the defines for ABI registers on PPC64. Other changes
will need to be in place before they are enabled.

Updates #40724

Change-Id: Ia6ead140719eda9aa99b99c48afafff684c33039
Reviewed-on: https://go-review.googlesource.com/c/go/+/351110
Run-TryBot: Lynn Boger <laboger@linux.vnet.ibm.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Trust: Lynn Boger <laboger@linux.vnet.ibm.com>
2021-09-21 12:47:07 +00:00
Cherry Mui
656f0888b7 [dev.typeparams] cmd/compile: make softfloat mode work with register ABI
Previously, softfloat mode did not work with register ABI, mainly
because the compiler didn't know how to pass floating point
arguments and results. According to the ABI they should be passed in
FP registers, but there aren't any in softfloat mode.

This CL makes it work. When softfloat is used, we define the ABI
as having 0 floating point registers (because there aren't any).
The integer registers are unchanged. So floating point arguments
and results are passed in memory.

Another option is to pass (the bit representation of) floating
point values in integer registers. But this complicates things
because it'd need to reorder integer argument registers.

Change-Id: Ibecbeccb658c10a868fa7f2dcf75138f719cc809
Reviewed-on: https://go-review.googlesource.com/c/go/+/327274
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2021-08-03 16:14:24 +00:00
Matthew Dempsky
dd95a4e3db [dev.typeparams] cmd/compile: simplify SSA devirtualization
This CL implements a few improvements to SSA devirtualization to make
it simpler and more general:

1. Change reflectdata.ITabAddr to now immediately generate the wrapper
functions and write out the itab symbol data. Previously, these were
each handled by separate phases later on.

2. Remove the hack in typecheck where we marked itabs that we
expected to need later. Instead, the calls to ITabAddr in walk now
handle generating the wrappers.

3. Change the SSA interface call devirtualization algorithm to just
use the itab symbol data (namely, its relocations) to figure out what
pointer is available in memory at the given offset. This decouples it
somewhat from reflectdata.

Change-Id: I8fe06922af8f8a1e7c93f5aff2b60ff59b8e7114
Reviewed-on: https://go-review.googlesource.com/c/go/+/327871
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-06-16 20:57:38 +00:00
Cherry Mui
c93d5d1a52 [dev.typeparams] all: always enable regabig on AMD64
Always enable regabig on AMD64, which enables the G register and
the X15 zero register. Remove the fallback path.

Also remove the regabig GOEXPERIMENT. On AMD64 it is always
enabled (this CL). Other architectures already have a G register,
except for 386, where there are too few registers and it is
unlikely that we will reserve one. (If we really do, we can just
add a new experiment).

Change-Id: I229cac0060f48fe58c9fdaabd38d6fa16b8a0855
Reviewed-on: https://go-review.googlesource.com/c/go/+/327272
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-06-11 20:52:41 +00:00
Cherry Mui
963f33b03b [dev.typeparams] cmd/compile: enable register args on ARM64
Now it will be used for functions marked go:registerparams.

test/abi tests are passing with it.

Change-Id: I5af37ae6b79a1064832a42c7ef5f2cc0b5b6a342
Reviewed-on: https://go-review.googlesource.com/c/go/+/322854
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-05-27 16:38:12 +00:00
Cherry Mui
4bb927f82e [dev.typeparams] cmd/compile: define ARM64 parameter registers
Define the registers.

They are not really enabled for now. Otherwise the compiler would
start using them for go:registerparams functions, which is not
fully working yet, and some tests would fail.

Now we can compile a simple Add function with registerparams
(with registers enabled).

Change-Id: Ifdfac931052c0196096a1dd8b0687b5fdedb14d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/322850
Trust: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-05-26 23:28:56 +00:00
David Chase
b38b1b2f9a cmd/compile: manage Slot array better
Steals an idea from CL 312093.

Further investigation revealed additional duplicate
slots (equivalent, but not equal), so delete those too.

Rearranged Func.Names to hold addresses of slots, and
created canonical addresses so that split slots (which use
those addresses to refer to their parent, and which can
themselves be split further) preserve the invariant
"equivalent slots are equal".

Removes duplicates, improves metrics for "args at entry".

Change-Id: I5bbdcb50bd33655abcab3d27ad8cdce25499faaf
Reviewed-on: https://go-review.googlesource.com/c/go/+/312292
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-05-08 17:03:18 +00:00
Russ Cox
95ed5c3800 internal/buildcfg: move build configuration out of cmd/internal/objabi
The go/build package needs access to this configuration,
so move it into a new package available to the standard library.

Change-Id: I868a94148b52350c76116451f4ad9191246adcff
Reviewed-on: https://go-review.googlesource.com/c/go/+/310731
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
2021-04-16 19:20:53 +00:00
Austin Clements
eaa1ddee84 all: explode GOEXPERIMENT=regabi into 5 sub-experiments
This separates GOEXPERIMENT=regabi into five sub-experiments:
regabiwrappers, regabig, regabireflect, regabidefer, and regabiargs.
Setting GOEXPERIMENT=regabi now implies the working subset of these
(currently, regabiwrappers, regabig, and regabireflect).

This simplifies testing, helps derisk the register ABI project,
and will also help with performance comparisons.

This replaces the -abiwrap flag to the compiler and linker with
the regabiwrappers experiment.

As part of this, regabiargs now enables registers for all calls
in the compiler. Previously, this was statically disabled in
regabiEnabledForAllCompilation, but now that we can control it
independently, this isn't necessary.

For #40724.

Change-Id: I5171e60cda6789031f2ef034cc2e7c5d62459122
Reviewed-on: https://go-review.googlesource.com/c/go/+/302070
Trust: Austin Clements <austin@google.com>
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
2021-03-18 16:51:27 +00:00
David Chase
97b32a6724 cmd/compile: better version of check frame offsets against abi
Improved to run on more architectures.

This is in preparation for turning off calculation of frame offsets
in types.CalcSize.

Replaces https://go-review.googlesource.com/c/go/+/293392 .
Updates #44675.
For #40724.

Change-Id: I40ba496172447cf09b86bc646148859363c11ad9
Reviewed-on: https://go-review.googlesource.com/c/go/+/297637
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-03-02 15:26:33 +00:00
David Chase
74cac8d479 cmd/compile: add AMD64 parameter register defs, Arg ops, plumb to ssa.Config
This is partial plumbing recycled from the original register abi test work;
these are the parts that translate easily.  Some other bits are deferred till
later when they are ready to be used.

For #40724.

Change-Id: Ica8c55a4526793446189725a2bc3839124feb38f
Reviewed-on: https://go-review.googlesource.com/c/go/+/260539
Trust: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2021-02-23 18:14:42 +00:00
Cuong Manh Le
2a18e37c4e cmd/compile: remove backend's "scratch mem" support
This CL rebases CL 273987 on top of master with @mdempsky's permission.

The last (only?) use for this feature was 387 support, which was
removed in golang.org/cl/258957.

Change-Id: I4f79fee8d0c336c9b6082bcd5eb6ece52c032dc0
Reviewed-on: https://go-review.googlesource.com/c/go/+/292893
Trust: Cuong Manh Le <cuong.manhle.vn@gmail.com>
Run-TryBot: Cuong Manh Le <cuong.manhle.vn@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2021-02-23 06:01:42 +00:00
Cherry Zhang
5d7dc53888 [dev.regabi] cmd/compile, runtime: reserve R14 as g register on AMD64
This is a proof-of-concept change for using the g register on
AMD64. getg is now lowered to R14 in the new ABI. The g register
is not yet used in all places where it can be used (e.g. stack
bounds check, runtime assembly code).

Change-Id: I10123ddf38e31782cf58bafcdff170aee0ff0d1b
Reviewed-on: https://go-review.googlesource.com/c/go/+/289196
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: David Chase <drchase@google.com>
2021-02-08 16:30:07 +00:00
Cherry Zhang
401d7e5a24 [dev.regabi] cmd/compile: reserve X15 as zero register on AMD64
In ABIInternal, reserve X15 as constant zero, and use it to zero
memory. (Maybe there can be more use of it?)

The register is zeroed when transitioning to ABIInternal from ABI0.

Caveat: using X15 generates longer instructions than using X0.
Maybe we want to use X0?

Change-Id: I12d5ee92a01fc0b59dad4e5ab023ac71bc2a8b7d
Reviewed-on: https://go-review.googlesource.com/c/go/+/288093
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2021-02-03 22:44:53 +00:00
David Chase
f7dad5eae4 [dev.regabi] cmd/compile: remove leftover code from late call lowering work
It's no longer conditional.

Change-Id: I697bb0e9ffe9644ec4d2766f7e8be8b82d3b0638
Reviewed-on: https://go-review.googlesource.com/c/go/+/286013
Trust: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2021-01-26 18:35:19 +00:00