Stowage/go - Remotebranch.eu

Stowage/go

mirror of https://github.com/golang/go.git synced 2026-04-19 18:30:36 +00:00

Author	SHA1	Message	Date
Keith Randall	80a2aae922	Revert "cmd/compile: improve stp merging for non-sequent cases" This reverts commit `4c63d798cb`. Reason for revert: Causes miscompilations. See issue 75365. Change-Id: Icd1fcfeb23d2ec524b16eb556030f43875e1c90d Reviewed-on: https://go-review.googlesource.com/c/go/+/702455 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Mark Freeman <markfreeman@google.com>	2025-09-10 11:11:11 -07:00
Youlin Feng	a5fa5ea51c	cmd/compile/internal/ssa: expand runtime.memequal for length {3,5,6,7} This CL slightly speeds up strings.HasPrefix when testing constant prefixes of length {3,5,6,7}. goos: linux goarch: amd64 cpu: Intel(R) Core(TM) i7-8650U CPU @ 1.90GHz │ old │ new │ │ sec/op │ sec/op vs base │ StringPrefix3-8 11.125n ± 2% 8.539n ± 1% -23.25% (p=0.000 n=20) StringPrefix5-8 11.170n ± 2% 8.700n ± 1% -22.11% (p=0.000 n=20) StringPrefix6-8 11.190n ± 2% 8.655n ± 1% -22.65% (p=0.000 n=20) StringPrefix7-8 11.095n ± 1% 8.878n ± 1% -19.98% (p=0.000 n=20) Change-Id: I510a80d59cf78680b57d68780d35d212d24030e2 Reviewed-on: https://go-review.googlesource.com/c/go/+/700816 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Freeman <markfreeman@google.com> Auto-Submit: Keith Randall <khr@golang.org>	2025-09-09 12:10:07 -07:00
Melnikov Denis	4c63d798cb	cmd/compile: improve stp merging for non-sequent cases Original algorithm merges stores with the first mergeable store in the chain, but it misses some cases. Additional reordering stores in increasing order of memory access in the chain allows merging in these cases. Fixes #71987 There are the results of sweet benchmarks and the difference between sizes of sections .text │ old.results │ new.results │ │ sec/op │ sec/op vs base │ BleveIndexBatch100-4 7.614 ± 2% 7.548 ± 1% ~ (p=0.190 n=10) ESBuildThreeJS-4 821.3m ± 0% 819.0m ± 1% ~ (p=0.165 n=10) ESBuildRomeTS-4 206.2m ± 1% 204.4m ± 1% -0.90% (p=0.023 n=10) EtcdPut-4 64.89m ± 1% 64.94m ± 2% ~ (p=0.684 n=10) EtcdSTM-4 318.4m ± 0% 319.2m ± 1% ~ (p=0.631 n=10) GoBuildKubelet-4 157.4 ± 0% 157.6 ± 0% ~ (p=0.105 n=10) GoBuildKubeletLink-4 12.42 ± 2% 12.41 ± 1% ~ (p=0.529 n=10) GoBuildIstioctl-4 124.4 ± 0% 124.4 ± 0% ~ (p=0.579 n=10) GoBuildIstioctlLink-4 8.700 ± 1% 8.693 ± 1% ~ (p=0.912 n=10) GoBuildFrontend-4 46.52 ± 0% 46.50 ± 0% ~ (p=0.971 n=10) GoBuildFrontendLink-4 2.282 ± 1% 2.272 ± 1% ~ (p=0.529 n=10) GoBuildTsgo-4 75.02 ± 1% 75.31 ± 1% ~ (p=0.436 n=10) GoBuildTsgoLink-4 1.229 ± 1% 1.219 ± 1% -0.82% (p=0.035 n=10) GopherLuaKNucleotide-4 34.77 ± 5% 34.31 ± 1% -1.33% (p=0.015 n=10) MarkdownRenderXHTML-4 286.6m ± 0% 285.7m ± 1% ~ (p=0.315 n=10) Tile38QueryLoad-4 657.2µ ± 1% 660.3µ ± 0% ~ (p=0.436 n=10) geomean 2.570 2.563 -0.24% Executable Old .text New .text Change ------------------------------------------------------- benchmark 6504820 6504020 -0.01% bleve-index-bench 3903860 3903636 -0.01% esbuild 4801012 4801172 +0.00% esbuild-bench 1256404 1256340 -0.01% etcd 9188148 9187076 -0.01% etcd-bench 6462228 6461524 -0.01% go 5924468 5923892 -0.01% go-build-bench 1282004 1281940 -0.00% gopher-lua-bench 1639540 1639348 -0.01% markdown-bench 1478452 1478356 -0.01% tile38-bench 2753524 2753300 -0.01% tile38-server 10241380 10240068 -0.01% Change-Id: Ieb4fdfd656aca458f65fc45938de70550632bd13 Reviewed-on: https://go-review.googlesource.com/c/go/+/698097 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Mark Freeman <markfreeman@google.com> Reviewed-by: Keith Randall <khr@google.com>	2025-09-09 12:10:01 -07:00
Xiaolin Zhao	f5b20689e9	cmd/compile: optimize loads from readonly globals into constants on loong64 Ref: CL 141118 Update #26498 Change-Id: I9c4ad2bedc4d50bd273bbe9119a898d4fca95e45 Reviewed-on: https://go-review.googlesource.com/c/go/+/700875 Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-09-05 08:42:28 -07:00
Xiaolin Zhao	3492e4262b	cmd/compile: simplify specific addition operations using the ADDV16 instruction On loong64, the addi.d instruction can only directly handle 12-bit immediate numbers. If a larger immediate number needs to be processed, it must first be placed in a register, and then the add.d instruction is used to complete the processing of the larger immediate number. If a larger immediate number c satisfies is32Bit(c) && c&0xffff == 0, then the ADDV16 instruction can be used to complete the addition operation. Removes 164 instructions from the go binary on loong64. Change-Id: I404de93cc4eaaa12fe424f5a0d61b03231215d1a Reviewed-on: https://go-review.googlesource.com/c/go/+/700536 Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Pratt <mpratt@google.com>	2025-09-05 08:18:04 -07:00
Youlin Feng	df29038486	cmd/compile/internal/ssa: load constant values from abi.PtrType.Elem This CL makes the generated code for reflect.TypeFor as simple as an intrinsic function. Fixes #75203 Change-Id: I7bb48787101f07e77ab5c583292e834c28a028d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/700336 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Keith Randall <khr@golang.org>	2025-09-04 07:25:26 -07:00
limeidan	bd71b94659	cmd/compile/internal: optimizing add+sll rule using ALSLV instruction on loong64 Reduce the number of go toolchain instructions on loong64 as follows: file before after Δ % go 1573148 1571708 -1,440 -0.0915% gofmt 320578 320090 -488 -0.1522% asm 555066 554406 -660 -0.1189% cgo 481566 480926 -640 -0.1329% compile 2475962 2473880 -2,082 -0.0841% cover 516536 515920 -616 -0.1193% link 702172 701404 -768 -0.1094% preprofile 238626 238274 -352 -0.1475% vet 792928 792100 -828 -0.1044% Change-Id: I61e462726835959c60e1b4e5256d4020202418ab Reviewed-on: https://go-review.googlesource.com/c/go/+/693877 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn>	2025-08-25 12:30:16 -07:00
Xiaolin Zhao	44c5956bf7	test/codegen: add Mul2 and DivPow2 test for loong64 Change-Id: I29ccd105c5418955146a3f4873162963da489a70 Reviewed-on: https://go-review.googlesource.com/c/go/+/697935 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org>	2025-08-24 18:14:28 -07:00
Xiaolin Zhao	0aa8019e94	test/codegen: add Mul* test for loong64 Change-Id: Ica285212e4884a96fe9738b53cdc789b223bf2e3 Reviewed-on: https://go-review.googlesource.com/c/go/+/697895 Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: abner chenc <chenguoqi@loongson.cn>	2025-08-24 18:14:22 -07:00
Xiaolin Zhao	83420974b7	test/codegen: add sqrt* abs and copysign test for loong64 Change-Id: I645396fc4b00242f36a06f01550906805c0c1f73 Reviewed-on: https://go-review.googlesource.com/c/go/+/697955 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Carlos Amedee <carlos@golang.org>	2025-08-24 18:14:13 -07:00
limeidan	1843f1e9c0	cmd/compile: use zero register instead of specialized *zero instructions on loong64 Refer to CL 633075, loong64 has a zero(R0) register that can be used to do this. Change-Id: I846c6bdfcfd6dbfa18338afc13e34e350580ead4 Reviewed-on: https://go-review.googlesource.com/c/go/+/693876 Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org>	2025-08-21 11:23:05 -07:00
Xiaolin Zhao	9632ba8160	cmd/compile: optimize some patterns into revb2h/revb4h instruction on loong64 Pattern1: (the type of c is uint16) c>>8 \| c<<8 To: revb2h c Pattern2: (the type of c is uint32) (c & 0xff00ff00)>>8 \| (c & 0x00ff00ff)<<8 To: revb2h c Pattern3: (the type of c is uint64) (c & 0xff00ff00ff00ff00)>>8 \| (c & 0x00ff00ff00ff00ff)<<8 To: revb4h c Change-Id: Ic6231a3f476cbacbea4bd00e31193d107cb86cda Reviewed-on: https://go-review.googlesource.com/c/go/+/696335 Reviewed-by: Meidan Li <limeidan@loongson.cn> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-08-21 11:19:34 -07:00
Xiaolin Zhao	fa706ea50f	cmd/compile: optimize rule (x + x) << c to x << c+1 on loong64 Change-Id: I782f93510bba92ba60b298c1c1cde456c8bcec38 Reviewed-on: https://go-review.googlesource.com/c/go/+/697956 Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Carlos Amedee <carlos@golang.org>	2025-08-21 11:16:49 -07:00
Michael Munday	320df537cc	cmd/compile: emit classify instructions for infinity tests on riscv64 The 'classify' instruction on RISC-V sets a bit in a mask to indicate the class a floating point value belongs to (e.g. whether the value is an infinity, a normal number, a subnormal number and so on). There are other places this instruction is useful but for now I've just used it for infinity tests. The gains are relatively small (~1-2 instructions per IsInf call) but using FCLASSD does potentially unlock further optimizations. It also reduces the number of loads from memory and the number of moves between general purpose and floating point register files. goos: linux goarch: riscv64 pkg: math cpu: Spacemit(R) X60 │ sec/op │ sec/op vs base │ Acos 159.9n ± 0% 173.7n ± 0% +8.66% (p=0.000 n=10) Acosh 249.8n ± 0% 254.4n ± 0% +1.86% (p=0.000 n=10) Asin 159.9n ± 0% 173.7n ± 0% +8.66% (p=0.000 n=10) Asinh 292.2n ± 0% 283.0n ± 0% -3.15% (p=0.000 n=10) Atan 119.1n ± 0% 119.0n ± 0% -0.08% (p=0.036 n=10) Atanh 265.1n ± 0% 271.6n ± 0% +2.43% (p=0.000 n=10) Atan2 194.9n ± 0% 186.7n ± 0% -4.23% (p=0.000 n=10) Cbrt 216.3n ± 0% 203.1n ± 0% -6.10% (p=0.000 n=10) Ceil 31.82n ± 0% 31.81n ± 0% ~ (p=0.063 n=10) Copysign 4.897n ± 0% 4.893n ± 3% -0.08% (p=0.038 n=10) Cos 123.9n ± 0% 107.7n ± 1% -13.03% (p=0.000 n=10) Cosh 293.0n ± 0% 264.6n ± 0% -9.68% (p=0.000 n=10) Erf 150.0n ± 0% 133.8n ± 0% -10.80% (p=0.000 n=10) Erfc 151.8n ± 0% 137.9n ± 0% -9.16% (p=0.000 n=10) Erfinv 173.8n ± 0% 173.8n ± 0% ~ (p=0.820 n=10) Erfcinv 173.8n ± 0% 173.8n ± 0% ~ (p=1.000 n=10) Exp 247.7n ± 0% 220.4n ± 0% -11.04% (p=0.000 n=10) ExpGo 261.4n ± 0% 232.5n ± 0% -11.04% (p=0.000 n=10) Expm1 176.2n ± 0% 164.9n ± 0% -6.41% (p=0.000 n=10) Exp2 220.4n ± 0% 190.2n ± 0% -13.70% (p=0.000 n=10) Exp2Go 232.5n ± 0% 204.0n ± 0% -12.22% (p=0.000 n=10) Abs 4.897n ± 0% 4.897n ± 0% ~ (p=0.726 n=10) Dim 16.32n ± 0% 16.31n ± 0% ~ (p=0.770 n=10) Floor 31.84n ± 0% 31.83n ± 0% ~ (p=0.677 n=10) Max 26.11n ± 0% 26.13n ± 0% ~ (p=0.290 n=10) Min 26.10n ± 0% 26.11n ± 0% ~ (p=0.424 n=10) Mod 416.2n ± 0% 337.8n ± 0% -18.83% (p=0.000 n=10) Frexp 63.65n ± 0% 50.60n ± 0% -20.50% (p=0.000 n=10) Gamma 218.8n ± 0% 206.4n ± 0% -5.62% (p=0.000 n=10) Hypot 92.20n ± 0% 94.69n ± 0% +2.70% (p=0.000 n=10) HypotGo 107.7n ± 0% 109.3n ± 0% +1.49% (p=0.000 n=10) Ilogb 59.54n ± 0% 44.04n ± 0% -26.04% (p=0.000 n=10) J0 708.9n ± 0% 674.5n ± 0% -4.86% (p=0.000 n=10) J1 707.6n ± 0% 676.1n ± 0% -4.44% (p=0.000 n=10) Jn 1.513µ ± 0% 1.427µ ± 0% -5.68% (p=0.000 n=10) Ldexp 70.20n ± 0% 57.09n ± 0% -18.68% (p=0.000 n=10) Lgamma 201.5n ± 0% 185.3n ± 1% -8.01% (p=0.000 n=10) Log 201.5n ± 0% 182.7n ± 0% -9.35% (p=0.000 n=10) Logb 59.54n ± 0% 46.53n ± 0% -21.86% (p=0.000 n=10) Log1p 178.8n ± 0% 173.9n ± 6% -2.74% (p=0.021 n=10) Log10 201.4n ± 0% 184.3n ± 0% -8.49% (p=0.000 n=10) Log2 79.17n ± 0% 66.07n ± 0% -16.54% (p=0.000 n=10) Modf 34.27n ± 0% 34.25n ± 0% ~ (p=0.559 n=10) Nextafter32 49.34n ± 0% 49.37n ± 0% +0.05% (p=0.040 n=10) Nextafter64 43.66n ± 0% 43.66n ± 0% ~ (p=0.869 n=10) PowInt 309.1n ± 0% 267.4n ± 0% -13.49% (p=0.000 n=10) PowFrac 769.6n ± 0% 677.3n ± 0% -11.98% (p=0.000 n=10) Pow10Pos 13.88n ± 0% 13.88n ± 0% ~ (p=0.811 n=10) Pow10Neg 19.58n ± 0% 19.57n ± 0% ~ (p=0.993 n=10) Round 23.65n ± 0% 23.66n ± 0% ~ (p=0.354 n=10) RoundToEven 27.75n ± 0% 27.75n ± 0% ~ (p=0.971 n=10) Remainder 380.0n ± 0% 309.9n ± 0% -18.45% (p=0.000 n=10) Signbit 13.06n ± 0% 13.06n ± 0% ~ (p=1.000 n=10) Sin 133.8n ± 0% 120.8n ± 0% -9.75% (p=0.000 n=10) Sincos 160.7n ± 0% 147.7n ± 0% -8.12% (p=0.000 n=10) Sinh 305.9n ± 0% 277.9n ± 0% -9.17% (p=0.000 n=10) SqrtIndirect 3.265n ± 0% 3.264n ± 0% ~ (p=0.546 n=10) SqrtLatency 19.58n ± 0% 19.58n ± 0% ~ (p=0.973 n=10) SqrtIndirectLatency 19.59n ± 0% 19.58n ± 0% ~ (p=0.370 n=10) SqrtGoLatency 205.7n ± 0% 202.7n ± 0% -1.46% (p=0.000 n=10) SqrtPrime 4.953µ ± 0% 4.954µ ± 0% ~ (p=0.477 n=10) Tan 163.2n ± 0% 150.2n ± 0% -7.99% (p=0.000 n=10) Tanh 312.4n ± 0% 284.2n ± 0% -9.01% (p=0.000 n=10) Trunc 31.83n ± 0% 31.83n ± 0% ~ (p=0.663 n=10) Y0 701.0n ± 0% 669.2n ± 0% -4.54% (p=0.000 n=10) Y1 704.5n ± 0% 672.4n ± 0% -4.55% (p=0.000 n=10) Yn 1.490µ ± 0% 1.422µ ± 0% -4.60% (p=0.000 n=10) Float64bits 5.713n ± 0% 5.710n ± 0% ~ (p=0.926 n=10) Float64frombits 4.896n ± 0% 4.896n ± 0% ~ (p=0.663 n=10) Float32bits 12.25n ± 0% 12.25n ± 0% ~ (p=0.571 n=10) Float32frombits 4.898n ± 0% 4.896n ± 0% ~ (p=0.754 n=10) FMA 4.895n ± 0% 4.895n ± 0% ~ (p=0.745 n=10) geomean 94.40n 89.43n -5.27% Change-Id: I4fe0f2e9f609e38d79463f9ba2519a3f9427432e Reviewed-on: https://go-review.googlesource.com/c/go/+/348389 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>	2025-08-13 20:33:56 -07:00
limeidan	90b7d7aaa2	cmd/compile/internal: optimize multiplication use new operation 'ADDshiftLLV' on loong64 goos: linux goarch: loong64 pkg: cmd/compile/internal/test cpu: Loongson-3A6000-HV @ 2500.00MHz │ old │ new │ │ sec/op │ sec/op vs base │ MulconstI32/3 0.8004n ± 0% 0.4247n ± 2% -46.94% (p=0.000 n=10) MulconstI32/5 0.8005n ± 0% 0.4256n ± 1% -46.83% (p=0.000 n=10) MulconstI32/12 1.2010n ± 0% 0.8005n ± 0% -33.35% (p=0.000 n=10) MulconstI32/120 0.8090n ± 0% 0.8067n ± 0% -0.28% (p=0.007 n=10) MulconstI32/-120 0.8109n ± 0% 0.8072n ± 0% -0.47% (p=0.000 n=10) MulconstI32/65537 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10) MulconstI32/65538 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.265 n=10) MulconstI64/3 0.8005n ± 0% 0.4241n ± 1% -47.02% (p=0.000 n=10) MulconstI64/5 0.8004n ± 0% 0.4249n ± 1% -46.91% (p=0.000 n=10) MulconstI64/12 1.2010n ± 0% 0.8004n ± 0% -33.36% (p=0.000 n=10) MulconstI64/120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.635 n=10) MulconstI64/-120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.837 n=10) MulconstI64/65537 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.837 n=10) MulconstI64/65538 0.8096n ± 0% 0.8004n ± 0% -1.14% (p=0.000 n=10) MulconstU32/3 0.8004n ± 0% 0.4263n ± 1% -46.75% (p=0.000 n=10) MulconstU32/5 0.8005n ± 0% 0.4262n ± 1% -46.76% (p=0.000 n=10) MulconstU32/12 1.2010n ± 0% 0.8005n ± 0% -33.35% (p=0.000 n=10) MulconstU32/120 0.8105n ± 0% 0.8096n ± 0% ~ (p=0.183 n=10) MulconstU32/65537 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10) MulconstU32/65538 0.8005n ± 0% 0.8005n ± 0% ~ (p=1.000 n=10) MulconstU64/3 0.8004n ± 0% 0.4265n ± 4% -46.71% (p=0.000 n=10) MulconstU64/5 0.8004n ± 0% 0.4256n ± 0% -46.82% (p=0.000 n=10) MulconstU64/12 1.2010n ± 0% 0.8004n ± 0% -33.36% (p=0.000 n=10) MulconstU64/120 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.387 n=10) MulconstU64/65537 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.265 n=10) MulconstU64/65538 0.8080n ± 0% 0.8004n ± 0% -0.93% (p=0.000 n=10) geomean 0.8539n 0.6597n -22.74% Change-Id: Ie33e88985d7639f481bbba540bc917b9f185c357 Reviewed-on: https://go-review.googlesource.com/c/go/+/693855 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn> Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-08-12 23:01:49 -07:00
Keith Randall	f04421ea9a	cmd/compile: soften test for 74788 We now (as of CL 678620) use float registers other than X0 for copying. Change-Id: Ifdecd5df7519663742eed0f292c98453754d4b25 Reviewed-on: https://go-review.googlesource.com/c/go/+/695275 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Dmitri Shuralyov <dmitshur@golang.org> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com>	2025-08-12 10:05:55 -07:00
Michael Munday	084c0f8494	cmd/compile: allow InlMark operations to be speculatively executed Although InlMark takes a memory argument it ultimately becomes a NOP and therefore is safe to speculatively execute. Fixes #74915 Change-Id: I64317dd433e300ac28de2bcf201845083ec2ac82 Reviewed-on: https://go-review.googlesource.com/c/go/+/693795 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>	2025-08-11 00:52:23 -07:00
Xiaolin Zhao	a552737418	cmd/compile: fold negation into multiplication on loong64 This change also add corresponding benchmark tests and codegen tests. The performance improvement on CPU Loongson-3A6000-HV is as follows: goos: linux goarch: loong64 pkg: cmd/compile/internal/test cpu: Loongson-3A6000-HV @ 2500.00MHz \| bench.old \| bench.new \| \| sec/op \| sec/op vs base \| MulNeg 828.4n ± 0% 655.9n ± 0% -20.82% (p=0.000 n=10) Mul2Neg 1062.0n ± 0% 826.8n ± 0% -22.15% (p=0.000 n=10) geomean 938.0n 736.4n -21.49% Change-Id: Ia999732880ec65be0c66cddc757a4868847e5b15 Reviewed-on: https://go-review.googlesource.com/c/go/+/682535 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Freeman <markfreeman@google.com>	2025-08-05 18:02:06 -07:00
Michael Munday	fcc036f03b	cmd/compile: optimise float <-> int register moves on riscv64 Use the FMV* instructions to move values between the floating point and integer register files. Note: I'm unsure why there is a slowdown in the Float32bits benchmark, I've checked and an FMVXS instruction is being used as expected. There are multiple loads and other instructions in the main loop. goos: linux goarch: riscv64 pkg: math cpu: Spacemit(R) X60 │ fmv-before.txt │ fmv-after.txt │ │ sec/op │ sec/op vs base │ Acos 122.7n ± 0% 122.7n ± 0% ~ (p=1.000 n=10) Acosh 197.2n ± 0% 191.5n ± 0% -2.89% (p=0.000 n=10) Asin 122.7n ± 0% 122.7n ± 0% ~ (p=0.474 n=10) Asinh 231.0n ± 0% 224.1n ± 0% -2.99% (p=0.000 n=10) Atan 91.39n ± 0% 91.41n ± 0% ~ (p=0.465 n=10) Atanh 210.3n ± 0% 203.4n ± 0% -3.26% (p=0.000 n=10) Atan2 149.6n ± 0% 149.6n ± 0% ~ (p=0.721 n=10) Cbrt 176.5n ± 0% 165.9n ± 0% -6.01% (p=0.000 n=10) Ceil 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10) Copysign 3.756n ± 0% 3.756n ± 0% ~ (p=0.149 n=10) Cos 95.15n ± 0% 95.15n ± 0% ~ (p=0.374 n=10) Cosh 228.6n ± 0% 224.7n ± 0% -1.71% (p=0.000 n=10) Erf 115.2n ± 0% 115.2n ± 0% ~ (p=0.474 n=10) Erfc 116.4n ± 0% 116.4n ± 0% ~ (p=0.628 n=10) Erfinv 133.3n ± 0% 133.3n ± 0% ~ (p=1.000 n=10) Erfcinv 133.3n ± 0% 133.3n ± 0% ~ (p=1.000 n=10) Exp 194.1n ± 0% 190.3n ± 0% -1.93% (p=0.000 n=10) ExpGo 204.7n ± 0% 200.3n ± 0% -2.15% (p=0.000 n=10) Expm1 137.7n ± 0% 135.2n ± 0% -1.82% (p=0.000 n=10) Exp2 173.4n ± 0% 169.0n ± 0% -2.54% (p=0.000 n=10) Exp2Go 182.8n ± 0% 178.4n ± 0% -2.41% (p=0.000 n=10) Abs 3.756n ± 0% 3.756n ± 0% ~ (p=0.157 n=10) Dim 12.52n ± 0% 12.52n ± 0% ~ (p=0.737 n=10) Floor 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10) Max 21.29n ± 0% 20.03n ± 0% -5.92% (p=0.000 n=10) Min 21.28n ± 0% 20.04n ± 0% -5.85% (p=0.000 n=10) Mod 344.9n ± 0% 319.2n ± 0% -7.45% (p=0.000 n=10) Frexp 55.71n ± 0% 48.85n ± 0% -12.30% (p=0.000 n=10) Gamma 165.9n ± 0% 167.8n ± 0% +1.15% (p=0.000 n=10) Hypot 73.24n ± 0% 70.74n ± 0% -3.41% (p=0.000 n=10) HypotGo 84.50n ± 0% 82.63n ± 0% -2.21% (p=0.000 n=10) Ilogb 49.45n ± 0% 45.70n ± 0% -7.59% (p=0.000 n=10) J0 556.5n ± 0% 544.0n ± 0% -2.25% (p=0.000 n=10) J1 555.3n ± 0% 542.8n ± 0% -2.24% (p=0.000 n=10) Jn 1.181µ ± 0% 1.156µ ± 0% -2.12% (p=0.000 n=10) Ldexp 59.47n ± 0% 53.84n ± 0% -9.47% (p=0.000 n=10) Lgamma 167.2n ± 0% 154.6n ± 0% -7.51% (p=0.000 n=10) Log 160.9n ± 0% 154.6n ± 0% -3.92% (p=0.000 n=10) Logb 49.45n ± 0% 45.70n ± 0% -7.58% (p=0.000 n=10) Log1p 147.1n ± 0% 137.1n ± 0% -6.80% (p=0.000 n=10) Log10 162.1n ± 1% 154.6n ± 0% -4.63% (p=0.000 n=10) Log2 66.99n ± 0% 60.72n ± 0% -9.36% (p=0.000 n=10) Modf 29.42n ± 0% 26.29n ± 0% -10.64% (p=0.000 n=10) Nextafter32 41.95n ± 0% 37.88n ± 0% -9.70% (p=0.000 n=10) Nextafter64 38.82n ± 0% 33.49n ± 0% -13.73% (p=0.000 n=10) PowInt 252.3n ± 0% 237.3n ± 0% -5.95% (p=0.000 n=10) PowFrac 615.5n ± 0% 589.7n ± 0% -4.19% (p=0.000 n=10) Pow10Pos 10.64n ± 0% 10.64n ± 0% ~ (p=1.000 n=10) Pow10Neg 24.42n ± 0% 15.02n ± 0% -38.49% (p=0.000 n=10) Round 21.91n ± 0% 18.16n ± 0% -17.12% (p=0.000 n=10) RoundToEven 24.42n ± 0% 21.29n ± 0% -12.84% (p=0.000 n=10) Remainder 308.0n ± 0% 291.2n ± 0% -5.44% (p=0.000 n=10) Signbit 10.02n ± 0% 10.02n ± 0% ~ (p=1.000 n=10) Sin 102.7n ± 0% 102.7n ± 0% ~ (p=0.211 n=10) Sincos 124.0n ± 1% 123.3n ± 0% -0.56% (p=0.002 n=10) Sinh 239.1n ± 0% 234.7n ± 0% -1.84% (p=0.000 n=10) SqrtIndirect 2.504n ± 0% 2.504n ± 0% ~ (p=0.303 n=10) SqrtLatency 15.03n ± 0% 15.02n ± 0% ~ (p=0.598 n=10) SqrtIndirectLatency 15.02n ± 0% 15.02n ± 0% ~ (p=0.907 n=10) SqrtGoLatency 165.3n ± 0% 157.2n ± 0% -4.90% (p=0.000 n=10) SqrtPrime 3.801µ ± 0% 3.802µ ± 0% ~ (p=1.000 n=10) Tan 125.2n ± 0% 125.2n ± 0% ~ (p=0.458 n=10) Tanh 244.2n ± 0% 239.9n ± 0% -1.76% (p=0.000 n=10) Trunc 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10) Y0 550.2n ± 0% 538.1n ± 0% -2.21% (p=0.000 n=10) Y1 552.8n ± 0% 540.6n ± 0% -2.21% (p=0.000 n=10) Yn 1.168µ ± 0% 1.143µ ± 0% -2.14% (p=0.000 n=10) Float64bits 8.139n ± 0% 4.385n ± 0% -46.13% (p=0.000 n=10) Float64frombits 7.512n ± 0% 3.759n ± 0% -49.96% (p=0.000 n=10) Float32bits 8.138n ± 0% 9.393n ± 0% +15.42% (p=0.000 n=10) Float32frombits 7.513n ± 0% 3.757n ± 0% -49.98% (p=0.000 n=10) FMA 3.756n ± 0% 3.756n ± 0% ~ (p=0.246 n=10) geomean 77.43n 72.42n -6.47% Change-Id: I8dac69b1d17cb3d2af78d1c844d2b5d80000d667 Reviewed-on: https://go-review.googlesource.com/c/go/+/599235 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Munday <mikemndy@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>	2025-08-05 08:27:15 -07:00
Xiaolin Zhao	e071617222	cmd/compile: optimize multiplication rules on loong64 Improve multiplication strength reduction, refer to CL 626998, add additional 3 linear combination instructions for loong64. goos: linux goarch: loong64 pkg: cmd/compile/internal/test cpu: Loongson-3A6000-HV @ 2500.00MHz \| bench.old \| bench.new \| \| sec/op \| sec/op vs base \| MulconstI32/3 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10) MulconstI32/5 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10) MulconstI32/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10) MulconstI32/120 1.6010n ± 0% 0.8130n ± 0% -49.22% (p=0.000 n=10) MulconstI32/-120 1.6010n ± 0% 0.8109n ± 0% -49.35% (p=0.000 n=10) MulconstI32/65537 1.6275n ± 0% 0.8005n ± 0% -50.81% (p=0.000 n=10) MulconstI32/65538 1.6290n ± 0% 0.8004n ± 0% -50.87% (p=0.000 n=10) MulconstI64/3 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10) MulconstI64/5 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10) MulconstI64/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10) MulconstI64/120 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10) MulconstI64/-120 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10) MulconstI64/65537 1.6270n ± 0% 0.8005n ± 0% -50.80% (p=0.000 n=10) MulconstI64/65538 1.6290n ± 0% 0.8071n ± 1% -50.45% (p=0.000 n=10) MulconstU32/3 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10) MulconstU32/5 1.6010n ± 0% 0.8004n ± 0% -50.01% (p=0.000 n=10) MulconstU32/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10) MulconstU32/120 1.6010n ± 0% 0.8066n ± 0% -49.62% (p=0.000 n=10) MulconstU32/65537 1.6290n ± 0% 0.8005n ± 0% -50.86% (p=0.000 n=10) MulconstU32/65538 1.6280n ± 0% 0.8005n ± 0% -50.83% (p=0.000 n=10) MulconstU64/3 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10) MulconstU64/5 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10) MulconstU64/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10) MulconstU64/120 1.6010n ± 0% 0.8005n ± 0% -50.00% (p=0.000 n=10) MulconstU64/65537 1.6290n ± 0% 0.8005n ± 0% -50.86% (p=0.000 n=10) MulconstU64/65538 1.6300n ± 0% 0.8067n ± 0% -50.51% (p=0.000 n=10) geomean 1.609n 0.8537n -46.95% goos: linux goarch: loong64 pkg: cmd/compile/internal/test cpu: Loongson-3A5000 @ 2500.00MHz \| bench.old \| bench.new \| \| sec/op \| sec/op vs base \| MulconstI32/3 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstI32/5 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstI32/12 1.601n ± 0% 1.202n ± 0% -24.92% (p=0.000 n=10) MulconstI32/120 1.6020n ± 0% 0.8012n ± 0% -49.99% (p=0.000 n=10) MulconstI32/-120 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstI32/65537 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10) MulconstI32/65538 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstI64/3 1.6015n ± 0% 0.8007n ± 0% -50.00% (p=0.000 n=10) MulconstI64/5 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10) MulconstI64/12 1.602n ± 0% 1.202n ± 0% -25.00% (p=0.000 n=10) MulconstI64/120 1.6030n ± 0% 0.8011n ± 0% -50.02% (p=0.000 n=10) MulconstI64/-120 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10) MulconstI64/65537 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstI64/65538 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstU32/3 1.6010n ± 0% 0.8006n ± 0% -49.99% (p=0.000 n=10) MulconstU32/5 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstU32/12 1.601n ± 0% 1.202n ± 0% -24.92% (p=0.000 n=10) MulconstU32/120 1.6010n ± 0% 0.8006n ± 0% -49.99% (p=0.000 n=10) MulconstU32/65537 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstU32/65538 1.6020n ± 0% 0.8009n ± 0% -50.01% (p=0.000 n=10) MulconstU64/3 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstU64/5 1.6010n ± 0% 0.8007n ± 0% -49.98% (p=0.000 n=10) MulconstU64/12 1.601n ± 0% 1.201n ± 0% -24.98% (p=0.000 n=10) MulconstU64/120 1.6020n ± 0% 0.8007n ± 0% -50.02% (p=0.000 n=10) MulconstU64/65537 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) MulconstU64/65538 1.6010n ± 0% 0.8007n ± 0% -49.99% (p=0.000 n=10) geomean 1.601n 0.8523n -46.77% Change-Id: I9fb0e47ca57875da171a347bf4828adfab41b875 Reviewed-on: https://go-review.googlesource.com/c/go/+/675455 Reviewed-by: Mark Freeman <mark@golang.org> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: Keith Randall <khr@golang.org>	2025-08-01 08:42:40 -07:00
Keith Randall	eb7f515c4d	cmd/compile: use generated loops instead of DUFFZERO on amd64 goarch: amd64 cpu: 12th Gen Intel(R) Core(TM) i7-12700 │ base │ exp │ │ sec/op │ sec/op vs base │ MemclrKnownSize112-20 1.270n ± 14% 1.006n ± 0% -20.72% (p=0.000 n=10) MemclrKnownSize128-20 1.266n ± 0% 1.005n ± 0% -20.58% (p=0.000 n=10) MemclrKnownSize192-20 1.771n ± 0% 1.579n ± 1% -10.84% (p=0.000 n=10) MemclrKnownSize248-20 4.034n ± 0% 3.520n ± 0% -12.75% (p=0.000 n=10) MemclrKnownSize256-20 2.269n ± 0% 2.014n ± 0% -11.26% (p=0.000 n=10) MemclrKnownSize512-20 4.280n ± 0% 4.030n ± 0% -5.84% (p=0.000 n=10) MemclrKnownSize1024-20 8.309n ± 1% 8.057n ± 0% -3.03% (p=0.000 n=10) Change-Id: I8f1627e2a1e981ff351dc7178932b32a2627f765 Reviewed-on: https://go-review.googlesource.com/c/go/+/678937 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-07-31 17:12:39 -07:00
Michael Munday	cedf63616a	cmd/compile: add floating point min/max intrinsics on s390x Add the VECTOR FP (MINIMUM\|MAXIMUM) instructions to the assembler and use them in the compiler to implement min and max. Note: I've allowed floating point registers to be used with the single element instructions (those with the W instead of V prefix) to allow easier integration into the compiler. Change-Id: I5f80a510bd248cf483cce95f1979bf63fbae7de6 Reviewed-on: https://go-review.googlesource.com/c/go/+/684715 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Freeman <mark@golang.org> Reviewed-by: Keith Randall <khr@google.com>	2025-07-30 12:29:15 -07:00
Youlin Feng	cc571dab91	cmd/compile: deduplicate instructions when rewrite func results After CL 628075, do not rely on the memory arg of an OpLocalAddr. Fixes #74788 Change-Id: I4e893241e3949bb8f2d93c8b88cc102e155b725d Reviewed-on: https://go-review.googlesource.com/c/go/+/691275 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: Mark Freeman <mark@golang.org>	2025-07-30 09:38:10 -07:00
Cuong Manh Le	bd94ae8903	cmd/compile: use unsigned power-of-two detector for unsigned mod Same as CL 689815, but for modulus instead of division. Updates #74485 Change-Id: I73000231c886a987a1093669ff207fd9117a8160 Reviewed-on: https://go-review.googlesource.com/c/go/+/689895 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2025-07-29 16:22:40 -07:00
Cuong Manh Le	f3582fc80e	cmd/compile: add unsigned power-of-two detector Fixes #74485 Change-Id: Ia22a58ac43bdc36c8414d555672a3a3eafc749ca Reviewed-on: https://go-review.googlesource.com/c/go/+/689815 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Cuong Manh Le <cuong.manhle.vn@gmail.com>	2025-07-29 16:22:37 -07:00
Michael Munday	46b5839231	test/codegen: fix failing condmove wasm tests These recently added tests failed when using the -all_codgen flag. Fixes #74770 Change-Id: Idea1ea02af2bd9f45c7d0a28d633c7442328e6df Reviewed-on: https://go-review.googlesource.com/c/go/+/690715 Reviewed-by: Jorropo <jorropo.pgm@gmail.com> Run-TryBot: Michael Munday <mikemndy@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Mark Freeman <mark@golang.org> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> TryBot-Bypass: Michael Knyszek <mknyszek@google.com>	2025-07-28 11:01:53 -07:00
Jorropo	ce05ad448f	cmd/compile: rewrite condselects into doublings and halvings For performance see CL 685676. This allows something like: if y { x *= 2 } To be compiled to: SHLXQ BX, AX, AX Instead of: MOVQ AX, CX SHLQ $1, CX MOVBLZX BL, DX TESTQ DX, DX CMOVQNE CX, AX While ./make.bash uniqued per LOC, there is 2 doublings and 4 halvings. Change-Id: Ic0727cbf429528a2dbf17cbfc3b0121db8387444 Reviewed-on: https://go-review.googlesource.com/c/go/+/685695 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Keith Randall <khr@golang.org>	2025-07-24 14:42:15 -07:00
Jorropo	fcd28070fe	cmd/compile: add opt branchelim to rewrite some CondSelect into math This allows something like: if y { x++ } To be compiled to: MOVBLZX BX, CX ADDQ CX, AX Instead of: LEAQ 1(AX), CX MOVBLZX BL, DX TESTQ DX, DX CMOVQNE CX, AX While ./make.bash uniqued per LOC, there is 100 additions and 75 substractions. See benchmark here: https://go.dev/play/p/DJf5COjwhd_s Either it's a performance no-op or it is faster: goos: linux goarch: amd64 cpu: AMD Ryzen 5 3600 6-Core Processor │ /tmp/old.logs │ /tmp/new.logs │ │ sec/op │ sec/op vs base │ CmovInlineConditionAddLatency-12 0.5443n ± 5% 0.5339n ± 3% -1.90% (p=0.004 n=10) CmovInlineConditionAddThroughputBy6-12 1.492n ± 1% 1.494n ± 1% ~ (p=0.955 n=10) CmovInlineConditionSubLatency-12 0.5419n ± 3% 0.5282n ± 3% -2.52% (p=0.019 n=10) CmovInlineConditionSubThroughputBy6-12 1.587n ± 1% 1.584n ± 2% ~ (p=0.492 n=10) CmovOutlineConditionAddLatency-12 0.5223n ± 1% 0.2639n ± 4% -49.47% (p=0.000 n=10) CmovOutlineConditionAddThroughputBy6-12 1.159n ± 1% 1.097n ± 2% -5.35% (p=0.000 n=10) CmovOutlineConditionSubLatency-12 0.5271n ± 3% 0.2654n ± 2% -49.66% (p=0.000 n=10) CmovOutlineConditionSubThroughputBy6-12 1.053n ± 1% 1.050n ± 1% ~ (p=1.000 n=10) geomean There are other benefits not tested by this benchmark: - the math form is usually a couple bytes shorter (ICACHE) - the math form is usually 0~2 uops shorter (UCACHE) - the math form has usually less register pressure* - the math form can sometimes be optimized further *regalloc rarely find how it can use less registers As far as pass ordering goes there are many possible options, I've decided to reorder branchelim before late opt since: - unlike running exclusively the CondSelect rules after branchelim, some extra optimizations might trigger on the adds or subs. - I don't want to maintain a second generic.rules file of only the stuff, that can trigger after branchelim. - rerunning all of opt a third time increase compilation time for little gains. By elimination moving branchelim seems fine. Change-Id: I869adf57e4d109948ee157cfc47144445146bafd Reviewed-on: https://go-review.googlesource.com/c/go/+/685676 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>	2025-07-24 14:42:10 -07:00
Alexander Musman	bd80f74bc1	cmd/compile: fold shift through AND for slice operations Fold a shift through AND when the AND gets a zero-or-one operand (e.g. from arithmetic shift by 63 of a 64-bit value) for a common case with slice operations: ASR $63, R2, R2 AND R3<<3, R2, R2 ADD R2, R0, R2 As the operands are 64-bit, we can transform it to: AND R2->63, R3, R2 ADD R2<<3, R0, R2 Code size improvement: compile: .text: 9088004 -> 9086292 (-0.02%) etcd: .text: 10500276 -> 10498964 (-0.01%) Change-Id: Ibcd5e67173da39b77ceff77ca67812fb8be5a7b5 Reviewed-on: https://go-review.googlesource.com/c/go/+/679895 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Mark Freeman <mark@golang.org> Auto-Submit: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>	2025-07-24 13:47:20 -07:00
Alexander Musman	dcb479c2f9	cmd/compile: optimize slice bounds checking with SUB/SUBconst comparisons Optimize ARM64 code generation for slice bounds checking by recognizing patterns where comparisons to zero involve SUB or SUBconst operations. This change adds SSA opt rules to simplify: (CMPconst [0] (SUB x y)) => (CMP x y) The optimizations apply to EQ, NE, ULE, and UGT comparisons, enabling more efficient bounds checking for slice operations. Code size improvement: compile: .text: 9088004 -> 9065988 (-0.24%) etcd: .text: 10500276 -> 10497092 (-0.03%) Change-Id: I467cb27674351652bcacc52b87e1f19677bd46a8 Reviewed-on: https://go-review.googlesource.com/c/go/+/679915 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> Auto-Submit: Keith Randall <khr@golang.org>	2025-07-24 12:39:53 -07:00
Paul Murphy	ee7bfbdbcc	cmd/compile/internal/ssa: fix PPC64 merging of (AND (S[RL]Dconst ...) CL 622236 forgot to check the mask was also a 32 bit rotate mask. Add a modified version of isPPC64WordRotateMask which valids the mask is contiguous and fits inside a uint32. I don't this is possible when merging SRDconst, the first check should always reject such combines. But, be extra careful and do it there too. Fixes #73153 Change-Id: Ie95f74ec5e7d89dc761511126db814f886a7a435 Reviewed-on: https://go-review.googlesource.com/c/go/+/679775 Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Jayanth Krishnamurthy <jayanth.krishnamurthy@ibm.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>	2025-06-09 20:33:27 -07:00
Jake Bailey	27ff0f249c	cmd/compile/internal/ssa: eliminate string copies for calls to unique.Make unique.Make always copies strings passed into it, so it's safe to not copy byte slices converted to strings either. Handle this just like map accesses with string(b) as keys. This CL only handles unique.Make(string(b)), not nested cases like unique.Make([2]string{string(b1), string(b2)}); this could be done in a followup CL but the map lookup code in walk is sufficiently different than the call handling code that I didn't attempt it. (SSA is much easier). Fixes #71926 Change-Id: Ic2f82f2f91963d563b4ddb1282bd49fc40da8b85 Reviewed-on: https://go-review.googlesource.com/c/go/+/672135 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-21 20:20:31 -07:00
thepudds	f4de2ecffb	cmd/compile/internal/walk: convert composite literals to interfaces without allocating Today, this interface conversion causes the struct literal to be heap allocated: var sink any func example1() { sink = S{1, 1} } For basic literals like integers that are directly used in an interface conversion that would otherwise allocate, the compiler is able to use read-only global storage (see #18704). This CL extends that to struct and array literals as well by creating read-only global storage that is able to represent for example S{1, 1}, and then using a pointer to that storage in the interface when the interface conversion happens. A more challenging example is: func example2() { v := S{1, 1} sink = v } In this case, the struct literal is not directly part of the interface conversion, but is instead assigned to a local variable. To still avoid heap allocation in cases like this, in walk we construct a cache that maps from expressions used in interface conversions to earlier expressions that can be used to represent the same value (via ir.ReassignOracle.StaticValue). This is somewhat analogous to how we avoided heap allocation for basic literals in CL 649077 earlier in our stack, though here we also need to do a little more work to create the read-only global. CL 649076 (also earlier in our stack) added most of the tests along with debug diagnostics in convert.go to make it easier to test this change. See the writeup in #71359 for details. Fixes #71359 Fixes #71323 Updates #62653 Updates #53465 Updates #8618 Change-Id: I8924f0c69ff738ea33439bd6af7b4066af493b90 Reviewed-on: https://go-review.googlesource.com/c/go/+/649555 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>	2025-05-21 12:23:26 -07:00
Junyang Shao	d6c29c7156	cmd/compile: fix offset calculation error in memcombine Fixes #73812 Change-Id: If7a6e103ae9e1442a2cf4a3c6b1270b6a1887196 Reviewed-on: https://go-review.googlesource.com/c/go/+/675175 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Junyang Shao <shaojunyang@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-21 12:17:08 -07:00
Xiaolin Zhao	4ce1c8e9e1	cmd/compile: add rules about ORN and ANDN Reduce the number of go toolchain instructions on loong64 as follows. file before after Δ % addr2line 279880 279776 -104 -0.0372% asm 556638 556410 -228 -0.0410% buildid 272272 272072 -200 -0.0735% cgo 481522 481318 -204 -0.0424% compile 2457788 2457580 -208 -0.0085% covdata 323384 323280 -104 -0.0322% cover 518450 518234 -216 -0.0417% dist 340790 340686 -104 -0.0305% distpack 282456 282252 -204 -0.0722% doc 789932 789688 -244 -0.0309% fix 324332 324228 -104 -0.0321% link 704622 704390 -232 -0.0329% nm 277132 277028 -104 -0.0375% objdump 507862 507758 -104 -0.0205% pack 221774 221674 -100 -0.0451% pprof 1469816 1469552 -264 -0.0180% test2json 254836 254732 -104 -0.0408% trace 1100002 1099738 -264 -0.0240% vet 781078 780874 -204 -0.0261% go 1529116 1528848 -268 -0.0175% gofmt 318556 318448 -108 -0.0339% total 13792238 13788566 -3672 -0.0266% Change-Id: I23fb3ebd41309252c7075e57ea7094e79f8c4fef Reviewed-on: https://go-review.googlesource.com/c/go/+/674335 Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Auto-Submit: abner chenc <chenguoqi@loongson.cn> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Meidan Li <limeidan@loongson.cn>	2025-05-21 08:28:37 -07:00
Xiaolin Zhao	d37a1bdd48	cmd/compile: fix the implementation of NORconst on loong64 In the loong64 instruction set, there is no NORI instruction, so the immediate value in NORconst need to be stored in register and then use the three-register NOR instruction. Change-Id: I5ef697450619317218cb3ef47fc07e238bdc2139 Reviewed-on: https://go-review.googlesource.com/c/go/+/673836 Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-20 20:24:09 -07:00
Junyang Shao	113b25774e	cmd/compile: memcombine different size stores This CL implements the TODO in combineStores to allow combining stores of different sizes, as long as the total size aligns to 2, 4, 8. Fixes #72832. Change-Id: I6d1d471335da90d851ad8f3b5a0cf10bdcfa17c4 Reviewed-on: https://go-review.googlesource.com/c/go/+/661855 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Junyang Shao <shaojunyang@google.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-20 13:00:16 -07:00
Julian Zhu	dfebef1c04	cmd/compile: fold negation into addition/subtraction on arm64 Fold negation into addition/subtraction and avoid double negation. platform: linux/arm64 file before after Δ % addr2line 3628108 3628116 +8 +0.000% asm 6208353 6207857 -496 -0.008% buildid 3460682 3460418 -264 -0.008% cgo 5572988 5572492 -496 -0.009% compile 26042159 26041039 -1120 -0.004% cover 6304328 6303472 -856 -0.014% dist 4139330 4139098 -232 -0.006% doc 9429305 9428065 -1240 -0.013% fix 3997189 3996733 -456 -0.011% link 8212128 8210280 -1848 -0.023% nm 3620056 3619696 -360 -0.010% objdump 5920289 5919233 -1056 -0.018% pack 2892250 2891778 -472 -0.016% pprof 17094569 17092745 -1824 -0.011% test2json 3335825 3335529 -296 -0.009% trace 15842080 15841456 -624 -0.004% vet 9472194 9471106 -1088 -0.011% go 19081541 19081509 -32 -0.000% total 154253374 154240622 -12752 -0.008% platform: darwin/arm64 file before after Δ % compile 27152002 27135490 -16512 -0.061% link 8372914 8356402 -16512 -0.197% go 19154802 19154778 -24 -0.000% total 157734180 157701132 -33048 -0.021% Change-Id: I15a349bfbaf7333ec3e4a62ae4d06f3f371dfb1d Reviewed-on: https://go-review.googlesource.com/c/go/+/673715 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-20 11:08:28 -07:00
Keith Randall	3baf53aec6	cmd/compile: derive bounds on signed %N for N a power of 2 -N+1 <= x % N <= N-1 This is useful for cases like: func setBit(b []byte, i int) { b[i/8] \|= 1<<(i%8) } The shift does not need protection against larger-than-7 cases. (It does still need protection against <0 cases.) Change-Id: Idf83101386af538548bfeb6e2928cea855610ce2 Reviewed-on: https://go-review.googlesource.com/c/go/+/672995 Reviewed-by: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>	2025-05-19 15:21:54 -07:00
Julian Zhu	d52679006c	cmd/compile: fold negation into addition/subtraction on mipsx Fold negation into addition/subtraction and avoid double negation. file before after Δ % addr2line 3742022 3741986 -36 -0.001% asm 6668616 6668628 +12 +0.000% buildid 3583786 3583630 -156 -0.004% cgo 6020370 6019634 -736 -0.012% compile 29416016 29417336 +1320 +0.004% cover 6801903 6801675 -228 -0.003% dist 4485916 4485816 -100 -0.002% doc 10652787 10652251 -536 -0.005% fix 4115988 4115560 -428 -0.010% link 9002328 9001616 -712 -0.008% nm 3733148 3732780 -368 -0.010% objdump 6163292 6163068 -224 -0.004% pack 2944768 2944604 -164 -0.006% pprof 18909973 18908773 -1200 -0.006% test2json 3394662 3394778 +116 +0.003% trace 17350911 17349751 -1160 -0.007% vet 10077727 10077527 -200 -0.002% go 19118769 19118609 -160 -0.001% total 166182982 166178022 -4960 -0.003% Change-Id: Id55698800fd70f3cb2ff48393584456b87208921 Reviewed-on: https://go-review.googlesource.com/c/go/+/673556 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2025-05-19 11:27:35 -07:00
Julian Zhu	8097cf14d2	cmd/compile: fold negation into addition/subtraction on mips64x Fold negation into addition/subtraction and avoid double negation. file before after Δ % addr2line 4007310 4007470 +160 +0.004% asm 7007636 7007436 -200 -0.003% buildid 3839268 3838972 -296 -0.008% cgo 6353466 6352738 -728 -0.011% compile 30426920 30426896 -24 -0.000% cover 7005408 7004744 -664 -0.009% dist 4651192 4650872 -320 -0.007% doc 10606050 10606034 -16 -0.000% fix 4446414 4446390 -24 -0.001% link 9237736 9237024 -712 -0.008% nm 3999107 3999323 +216 +0.005% objdump 6762424 6762144 -280 -0.004% pack 3270757 3270493 -264 -0.008% pprof 19428299 19361939 -66360 -0.342% test2json 3717345 3717217 -128 -0.003% trace 17382273 17381657 -616 -0.004% vet 10689481 10688985 -496 -0.005% go 19118769 19118609 -160 -0.001% total 171949855 171878943 -70912 -0.041% Change-Id: I35c1f264d216c214ea3f56252a9ddab8ea850fa6 Reviewed-on: https://go-review.googlesource.com/c/go/+/673555 Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Michael Knyszek <mknyszek@google.com>	2025-05-16 11:06:06 -07:00
Keith Randall	d681270714	cmd/compile: allow load-op merging in additional situations x += p We want to do this with a single load+add operation on amd64. The tricky part is that we don't want to combine if there are other uses of x after this instruction. Implement a simple detector that seems to capture a common situation - x += p is in a loop, and the other use of x is after loop exit. In that case, it does not hurt to do the load+add combo. Change-Id: I466174cce212e78bde83f908cc1f2752b560c49c Reviewed-on: https://go-review.googlesource.com/c/go/+/672957 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-15 15:21:36 -07:00
Keith Randall	19f05770b0	cmd/compile: schedule induction variable increments late for ..; ..; i++ { ... } We want to schedule the i++ late in the block, so that all other uses of i in the block are scheduled first. That way, i++ can happen in place in a register instead of requiring a temporary register. Change-Id: Id777407c7e67a5ddbd8e58251099b0488138c0df Reviewed-on: https://go-review.googlesource.com/c/go/+/672998 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>	2025-05-15 14:06:41 -07:00
Xiaolin Zhao	c31a5c571f	cmd/compile: fold negation into addition/subtraction on loong64 This change also avoid double negation, and add loong64 codegen for arithmetic tests. Reduce the number of go toolchain instructions on loong64 as follows. file before after Δ % addr2line 279972 279896 -76 -0.0271% asm 556390 556310 -80 -0.0144% buildid 272376 272300 -76 -0.0279% cgo 481534 481550 +16 +0.0033% compile 2457992 2457396 -596 -0.0242% covdata 323488 323404 -84 -0.0260% cover 518630 518490 -140 -0.0270% dist 340894 340814 -80 -0.0235% distpack 282568 282484 -84 -0.0297% doc 790224 789984 -240 -0.0304% fix 324408 324348 -60 -0.0185% link 704910 704666 -244 -0.0346% nm 277220 277144 -76 -0.0274% objdump 508026 507878 -148 -0.0291% pack 221810 221786 -24 -0.0108% pprof 1470284 1469880 -404 -0.0275% test2json 254896 254852 -44 -0.0173% trace 1100390 1100074 -316 -0.0287% vet 781398 781142 -256 -0.0328% go 1529668 1529128 -540 -0.0353% gofmt 318668 318568 -100 -0.0314% total 13795746 13792094 -3652 -0.0265% Change-Id: I88d1f12cfc4be0e92687c48e06a57213aa484aca Reviewed-on: https://go-review.googlesource.com/c/go/+/672555 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: abner chenc <chenguoqi@loongson.cn> Reviewed-by: Michael Knyszek <mknyszek@google.com>	2025-05-14 17:46:58 -07:00
Jakub Ciolek	c9d0fad5cb	cmd/compile: add 2 phiopt cases Add 2 more cases: if a { x = value } else { x = a } => x = a && value if a { x = a } else { x = value } => x = a \|\| value AND case goes from: 00006 (8) TESTB AX, AX 00007 (8) JNE 9 00008 (13) MOVL AX, BX 00009 (13) MOVL BX, AX 00010 (13) RET to: 00006 (13) ANDL BX, AX 00007 (13) RET OR goes from: 00006 (19) TESTB AX, AX 00007 (19) JNE 9 00008 (24) MOVL BX, AX 00009 (24) RET to: 00006 (24) ORL BX, AX 00007 (24) RET compilecmp linux/amd64: runtime runtime.lock2 847 -> 869 (+2.60%) runtime.addspecial 542 -> 517 (-4.61%) runtime.tracebackPCs changed runtime.scanstack changed runtime.mallocinit changed runtime.traceback2 2238 -> 2206 (-1.43%) runtime [cmd/compile] runtime.lock2 860 -> 882 (+2.56%) runtime.scanstack changed runtime.addspecial 542 -> 517 (-4.61%) runtime.traceback2 2238 -> 2206 (-1.43%) runtime.lockWithRank 870 -> 890 (+2.30%) runtime.tracebackPCs changed runtime.mallocinit changed strconv strconv.ryuFtoaFixed32 changed strconv.ryuFtoaFixed64 639 -> 638 (-0.16%) strconv.readFloat changed strconv.ryuFtoaShortest changed strings strings.(Replacer).build changed strconv [cmd/compile] strconv.readFloat changed strconv.ryuFtoaFixed64 639 -> 638 (-0.16%) strconv.ryuFtoaFixed32 changed strconv.ryuFtoaShortest changed strings [cmd/compile] strings.(Replacer).build changed regexp regexp.makeOnePass.func1 changed regexp [cmd/compile] regexp.makeOnePass.func1 changed encoding/json encoding/json.indirect changed database/sql database/sql.driverArgsConnLocked changed vendor/golang.org/x/text/unicode/norm vendor/golang.org/x/text/unicode/norm.Form.transform changed go/doc/comment go/doc/comment.parseSpans changed internal/diff internal/diff.tgs changed log/slog log/slog.(handleState).appendNonBuiltIns 1898 -> 1877 (-1.11%) testing/fstest testing/fstest.(fsTester).checkGlob changed runtime/pprof runtime/pprof.(profileBuilder).build changed cmd/internal/dwarf cmd/internal/dwarf.isEmptyInlinedCall 254 -> 244 (-3.94%) go/printer go/printer.keepTypeColumn 302 -> 270 (-10.60%) go/printer.(printer).binaryExpr changed cmd/compile/internal/syntax cmd/compile/internal/syntax.(scanner).rune changed cmd/compile/internal/syntax.(scanner).number 2137 -> 2153 (+0.75%) Change-Id: I7f95f54b03a35d0b616c40f38b415a7feb71be73 Reviewed-on: https://go-review.googlesource.com/c/go/+/666835 Reviewed-by: Keith Randall <khr@golang.org> Auto-Submit: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Jakub Ciolek <jakub@ciolek.dev> TryBot-Bypass: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-08 10:18:37 -07:00
Keith Randall	12110c3f7e	cmd/compile: improve multiplication strength reduction Use an automatic algorithm to generate strength reduction code. You give it all the linear combination (ax+by) instructions in your architecture, it figures out the rest. Just amd64 and arm64 for now. Fixes #67575 Change-Id: I35c69382bebb1d2abf4bb4e7c43fd8548c6c59a1 Reviewed-on: https://go-review.googlesource.com/c/go/+/626998 Reviewed-by: Jakub Ciolek <jakub@ciolek.dev> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-01 09:33:31 -07:00
Joel Sing	4d10d4ad84	cmd/compile,internal/cpu,runtime: intrinsify math/bits.OnesCount on riscv64 For riscv64/rva22u64 and above, we can intrinsify math/bits.OnesCount using the CPOP/CPOPW machine instructions. Since the native Go implementation of OnesCount is relatively expensive, it is also worth emitting a check for Zbb support when compiled for rva20u64. On a Banana Pi F3, with GORISCV64=rva22u64: │ oc.1 │ oc.2 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.930n ± 0% 4.389n ± 0% -74.08% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.016n ± 0% -11.10% (p=0.000 n=10) OnesCount16-8 9.404n ± 0% 5.015n ± 0% -46.67% (p=0.000 n=10) OnesCount32-8 13.165n ± 0% 4.388n ± 0% -66.67% (p=0.000 n=10) OnesCount64-8 16.300n ± 0% 4.388n ± 0% -73.08% (p=0.000 n=10) geomean 11.40n 4.629n -59.40% On a Banana Pi F3, compiled with GORISCV64=rva20u64 and with Zbb detection enabled: │ oc.3 │ oc.4 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.930n ± 0% 5.643n ± 0% -66.67% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.642n ± 0% ~ (p=0.447 n=10) OnesCount16-8 10.030n ± 0% 6.896n ± 0% -31.25% (p=0.000 n=10) OnesCount32-8 13.170n ± 0% 5.642n ± 0% -57.16% (p=0.000 n=10) OnesCount64-8 16.300n ± 0% 5.642n ± 0% -65.39% (p=0.000 n=10) geomean 11.55n 5.873n -49.16% On a Banana Pi F3, compiled with GORISCV64=rva20u64 but with Zbb detection disabled: │ oc.3 │ oc.5 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.93n ± 0% 29.47n ± 0% +74.07% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.643n ± 0% ~ (p=0.191 n=10) OnesCount16-8 10.03n ± 0% 15.05n ± 0% +50.05% (p=0.000 n=10) OnesCount32-8 13.17n ± 0% 18.18n ± 0% +38.04% (p=0.000 n=10) OnesCount64-8 16.30n ± 0% 21.94n ± 0% +34.60% (p=0.000 n=10) geomean 11.55n 15.84n +37.16% For hardware without Zbb, this adds ~5ns overhead, while for hardware with Zbb we achieve a performance gain up of up to 11ns. It is worth noting that OnesCount8 is cheap enough that it is preferable to stick with the generic version in this case. Change-Id: Id657e40e0dd1b1ab8cc0fe0f8a68df4c9f2d7da5 Reviewed-on: https://go-review.googlesource.com/c/go/+/660856 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-01 05:57:41 -07:00
Joel Sing	90e8b8cdae	cmd/compile: intrinsify math/bits.Bswap on riscv64 For riscv64/rva22u64 and above, we can intrinsify math/bits.Bswap using the REV8 machine instruction. On a StarFive VisionFive 2 with GORISCV64=rva22u64: │ rb.1 │ rb.2 │ │ sec/op │ sec/op vs base │ ReverseBytes-4 18.790n ± 0% 4.026n ± 0% -78.57% (p=0.000 n=10) ReverseBytes16-4 6.710n ± 0% 5.368n ± 0% -20.00% (p=0.000 n=10) ReverseBytes32-4 13.420n ± 0% 5.368n ± 0% -60.00% (p=0.000 n=10) ReverseBytes64-4 17.450n ± 0% 4.026n ± 0% -76.93% (p=0.000 n=10) geomean 13.11n 4.649n -64.54% Change-Id: I26eee34270b1721f7304bb1cddb0fda129b20ece Reviewed-on: https://go-review.googlesource.com/c/go/+/660855 Reviewed-by: Mark Ryan <markdryan@rivosinc.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Junyang Shao <shaojunyang@google.com>	2025-05-01 05:57:13 -07:00
Keith Randall	7d0cb2a2ad	cmd/compile: constant fold 128-bit multiplies The full 64x64->128 multiply comes up when using bits.Mul64. The 64x64->64+overflow multiply comes up in unsafe.Slice when using a constant length. Change-Id: I298515162ca07d804b2d699d03bc957ca30a4ebc Reviewed-on: https://go-review.googlesource.com/c/go/+/667175 Reviewed-by: Junyang Shao <shaojunyang@google.com> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-04-22 10:24:18 -07:00
Keith Randall	8af32240c6	cmd/compile: don't evaluate side effects of range over array If the thing we're ranging over is an array or ptr to array, and it doesn't have a function call or channel receive in it, then we shouldn't evaluate it. Typecheck the ranged-over value as a constant in that case. That makes the unified exporter replace the range expression with a constant int. Change-Id: I0d4ea081de70d20cf6d1fa8d25ef6cb021975554 Reviewed-on: https://go-review.googlesource.com/c/go/+/659317 Reviewed-by: Junyang Shao <shaojunyang@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Robert Griesemer <gri@google.com>	2025-04-21 15:50:43 -07:00

1 2 3 4 5 ...

592 commits