Stowage/go - Remotebranch.eu

Stowage/go

mirror of https://github.com/golang/go.git synced 2025-12-08 06:10:04 +00:00

Author	SHA1	Message	Date
Meng Zhuo	d7a52f9369	cmd/compile: use MOV(D\|F) with const for Const(64\|32)F on riscv64 The original Const64F using: AUIPC + LD + FMVDX to load float64 const, we can use AUIPC + FLD instead, same as Const32F. Change-Id: I8ca0a0e90d820a26e69b74cd25df3cc662132bf7 Reviewed-on: https://go-review.googlesource.com/c/go/+/703215 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>	2025-10-26 18:35:09 -07:00
Jes Cok	46cc532900	cmd/compile/internal/ssa: fix typo in comment Change-Id: Ic48a756b71a62be1c6c4cfe781c02b89010e2338 GitHub-Last-Rev: `8c0d89b475` GitHub-Pull-Request: golang/go#75985 Reviewed-on: https://go-review.googlesource.com/c/go/+/713041 Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Jorropo <jorropo.pgm@gmail.com> Reviewed-by: David Chase <drchase@google.com> Auto-Submit: Keith Randall <khr@golang.org> Auto-Submit: Jorropo <jorropo.pgm@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-10-21 08:43:34 -07:00
Meng Zhuo	b9a4a09b0f	runtime: remove duff support for riscv64 Change-Id: I987d9f49fbd2650eef4224f72271bf752c54d39c Reviewed-on: https://go-review.googlesource.com/c/go/+/700538 Reviewed-by: Mark Freeman <markfreeman@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com>	2025-09-09 19:42:25 -07:00
Meng Zhuo	4dac9e093f	cmd/compile: use generated loops instead of DUFFCOPY on riscv64 MemmoveKnownSize112-4 632.1Mi ± 1% 1288.5Mi ± 0% +103.85% (p=0.000 n=10) MemmoveKnownSize128-4 636.1Mi ± 0% 1280.9Mi ± 1% +101.36% (p=0.000 n=10) MemmoveKnownSize192-4 645.3Mi ± 0% 1306.9Mi ± 1% +102.53% (p=0.000 n=10) MemmoveKnownSize248-4 650.2Mi ± 2% 1312.5Mi ± 1% +101.87% (p=0.000 n=10) MemmoveKnownSize256-4 650.7Mi ± 0% 1303.6Mi ± 1% +100.33% (p=0.000 n=10) MemmoveKnownSize512-4 658.2Mi ± 1% 1293.9Mi ± 0% +96.60% (p=0.000 n=10) MemmoveKnownSize1024-4 662.1Mi ± 0% 1312.6Mi ± 0% +98.26% (p=0.000 n=10) Change-Id: I43681ca029880025558b33ddc4295da3947c9b28 Reviewed-on: https://go-review.googlesource.com/c/go/+/700537 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Mark Freeman <markfreeman@google.com>	2025-09-09 19:42:09 -07:00
Meng Zhuo	879ff736d3	cmd/compile: use generated loops instead of DUFFZERO on riscv64 MemclrKnownSize112-4 5.602Gi ± 0% 5.601Gi ± 0% ~ (p=0.363 n=10) MemclrKnownSize128-4 6.933Gi ± 1% 6.545Gi ± 1% -5.59% (p=0.000 n=10) MemclrKnownSize192-4 8.055Gi ± 1% 7.804Gi ± 0% -3.12% (p=0.000 n=10) MemclrKnownSize248-4 8.489Gi ± 0% 8.718Gi ± 0% +2.69% (p=0.000 n=10) MemclrKnownSize256-4 8.762Gi ± 0% 8.763Gi ± 0% ~ (p=0.494 n=10) MemclrKnownSize512-4 9.514Gi ± 1% 9.514Gi ± 0% ~ (p=0.529 n=10) MemclrKnownSize1024-4 9.940Gi ± 0% 9.939Gi ± 1% ~ (p=0.989 n=10) ClearFat3-4 1.300Gi ± 0% 1.301Gi ± 0% ~ (p=0.447 n=10) ClearFat4-4 3.902Gi ± 0% 3.902Gi ± 0% ~ (p=0.971 n=10) ClearFat5-4 665.8Mi ± 0% 1331.5Mi ± 0% +100.01% (p=0.000 n=10) ClearFat6-4 665.8Mi ± 0% 1330.5Mi ± 0% +99.82% (p=0.000 n=10) ClearFat7-4 490.7Mi ± 0% 1331.9Mi ± 0% +171.45% (p=0.000 n=10) ClearFat8-4 5.201Gi ± 0% 5.202Gi ± 0% ~ (p=0.123 n=10) ClearFat9-4 856.1Mi ± 0% 1331.6Mi ± 0% +55.54% (p=0.000 n=10) ClearFat10-4 887.8Mi ± 0% 1331.9Mi ± 0% +50.03% (p=0.000 n=10) ClearFat11-4 915.3Mi ± 0% 1331.1Mi ± 0% +45.42% (p=0.000 n=10) ClearFat12-4 5.202Gi ± 0% 5.202Gi ± 0% ~ (p=0.481 n=10) ClearFat13-4 961.5Mi ± 0% 1331.8Mi ± 0% +38.50% (p=0.000 n=10) ClearFat14-4 981.0Mi ± 0% 1331.8Mi ± 0% +35.76% (p=0.000 n=10) ClearFat15-4 951.3Mi ± 0% 1331.4Mi ± 0% +39.96% (p=0.000 n=10) ClearFat16-4 1.600Gi ± 0% 5.202Gi ± 0% +225.10% (p=0.000 n=10) ClearFat18-4 1.018Gi ± 0% 1.300Gi ± 0% +27.77% (p=0.000 n=10) ClearFat20-4 2.601Gi ± 0% 4.938Gi ± 12% +89.87% (p=0.000 n=10) ClearFat24-4 2.601Gi ± 0% 5.201Gi ± 0% +99.96% (p=0.000 n=10) ClearFat32-4 1.982Gi ± 0% 5.203Gi ± 0% +162.55% (p=0.000 n=10) ClearFat40-4 3.467Gi ± 0% 4.338Gi ± 0% +25.11% (p=0.000 n=10) ClearFat48-4 3.671Gi ± 0% 5.201Gi ± 0% +41.69% (p=0.000 n=10) ClearFat56-4 3.640Gi ± 0% 5.201Gi ± 0% +42.88% (p=0.000 n=10) ClearFat64-4 2.250Gi ± 0% 5.202Gi ± 0% +131.25% (p=0.000 n=10) ClearFat72-4 4.064Gi ± 0% 5.201Gi ± 0% +27.97% (p=0.000 n=10) ClearFat128-4 4.496Gi ± 0% 5.203Gi ± 0% +15.71% (p=0.000 n=10) ClearFat256-4 4.756Gi ± 0% 5.201Gi ± 0% +9.36% (p=0.000 n=10) ClearFat512-4 2.512Gi ± 0% 5.201Gi ± 0% +107.03% (p=0.000 n=10) ClearFat1024-4 4.255Gi ± 0% 5.202Gi ± 0% +22.26% (p=0.000 n=10) ClearFat1032-4 4.260Gi ± 0% 5.201Gi ± 0% +22.09% (p=0.000 n=10) ClearFat1040-4 4.285Gi ± 1% 5.203Gi ± 0% +21.41% (p=0.000 n=10) geomean 2.005Gi 3.020Gi +50.58% Change-Id: Iea1da734ff8eaf1b5a2822ae2bdb7f4fd9b65651 Reviewed-on: https://go-review.googlesource.com/c/go/+/699635 Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Freeman <markfreeman@google.com>	2025-09-09 19:41:39 -07:00
Michael Munday	320df537cc	cmd/compile: emit classify instructions for infinity tests on riscv64 The 'classify' instruction on RISC-V sets a bit in a mask to indicate the class a floating point value belongs to (e.g. whether the value is an infinity, a normal number, a subnormal number and so on). There are other places this instruction is useful but for now I've just used it for infinity tests. The gains are relatively small (~1-2 instructions per IsInf call) but using FCLASSD does potentially unlock further optimizations. It also reduces the number of loads from memory and the number of moves between general purpose and floating point register files. goos: linux goarch: riscv64 pkg: math cpu: Spacemit(R) X60 │ sec/op │ sec/op vs base │ Acos 159.9n ± 0% 173.7n ± 0% +8.66% (p=0.000 n=10) Acosh 249.8n ± 0% 254.4n ± 0% +1.86% (p=0.000 n=10) Asin 159.9n ± 0% 173.7n ± 0% +8.66% (p=0.000 n=10) Asinh 292.2n ± 0% 283.0n ± 0% -3.15% (p=0.000 n=10) Atan 119.1n ± 0% 119.0n ± 0% -0.08% (p=0.036 n=10) Atanh 265.1n ± 0% 271.6n ± 0% +2.43% (p=0.000 n=10) Atan2 194.9n ± 0% 186.7n ± 0% -4.23% (p=0.000 n=10) Cbrt 216.3n ± 0% 203.1n ± 0% -6.10% (p=0.000 n=10) Ceil 31.82n ± 0% 31.81n ± 0% ~ (p=0.063 n=10) Copysign 4.897n ± 0% 4.893n ± 3% -0.08% (p=0.038 n=10) Cos 123.9n ± 0% 107.7n ± 1% -13.03% (p=0.000 n=10) Cosh 293.0n ± 0% 264.6n ± 0% -9.68% (p=0.000 n=10) Erf 150.0n ± 0% 133.8n ± 0% -10.80% (p=0.000 n=10) Erfc 151.8n ± 0% 137.9n ± 0% -9.16% (p=0.000 n=10) Erfinv 173.8n ± 0% 173.8n ± 0% ~ (p=0.820 n=10) Erfcinv 173.8n ± 0% 173.8n ± 0% ~ (p=1.000 n=10) Exp 247.7n ± 0% 220.4n ± 0% -11.04% (p=0.000 n=10) ExpGo 261.4n ± 0% 232.5n ± 0% -11.04% (p=0.000 n=10) Expm1 176.2n ± 0% 164.9n ± 0% -6.41% (p=0.000 n=10) Exp2 220.4n ± 0% 190.2n ± 0% -13.70% (p=0.000 n=10) Exp2Go 232.5n ± 0% 204.0n ± 0% -12.22% (p=0.000 n=10) Abs 4.897n ± 0% 4.897n ± 0% ~ (p=0.726 n=10) Dim 16.32n ± 0% 16.31n ± 0% ~ (p=0.770 n=10) Floor 31.84n ± 0% 31.83n ± 0% ~ (p=0.677 n=10) Max 26.11n ± 0% 26.13n ± 0% ~ (p=0.290 n=10) Min 26.10n ± 0% 26.11n ± 0% ~ (p=0.424 n=10) Mod 416.2n ± 0% 337.8n ± 0% -18.83% (p=0.000 n=10) Frexp 63.65n ± 0% 50.60n ± 0% -20.50% (p=0.000 n=10) Gamma 218.8n ± 0% 206.4n ± 0% -5.62% (p=0.000 n=10) Hypot 92.20n ± 0% 94.69n ± 0% +2.70% (p=0.000 n=10) HypotGo 107.7n ± 0% 109.3n ± 0% +1.49% (p=0.000 n=10) Ilogb 59.54n ± 0% 44.04n ± 0% -26.04% (p=0.000 n=10) J0 708.9n ± 0% 674.5n ± 0% -4.86% (p=0.000 n=10) J1 707.6n ± 0% 676.1n ± 0% -4.44% (p=0.000 n=10) Jn 1.513µ ± 0% 1.427µ ± 0% -5.68% (p=0.000 n=10) Ldexp 70.20n ± 0% 57.09n ± 0% -18.68% (p=0.000 n=10) Lgamma 201.5n ± 0% 185.3n ± 1% -8.01% (p=0.000 n=10) Log 201.5n ± 0% 182.7n ± 0% -9.35% (p=0.000 n=10) Logb 59.54n ± 0% 46.53n ± 0% -21.86% (p=0.000 n=10) Log1p 178.8n ± 0% 173.9n ± 6% -2.74% (p=0.021 n=10) Log10 201.4n ± 0% 184.3n ± 0% -8.49% (p=0.000 n=10) Log2 79.17n ± 0% 66.07n ± 0% -16.54% (p=0.000 n=10) Modf 34.27n ± 0% 34.25n ± 0% ~ (p=0.559 n=10) Nextafter32 49.34n ± 0% 49.37n ± 0% +0.05% (p=0.040 n=10) Nextafter64 43.66n ± 0% 43.66n ± 0% ~ (p=0.869 n=10) PowInt 309.1n ± 0% 267.4n ± 0% -13.49% (p=0.000 n=10) PowFrac 769.6n ± 0% 677.3n ± 0% -11.98% (p=0.000 n=10) Pow10Pos 13.88n ± 0% 13.88n ± 0% ~ (p=0.811 n=10) Pow10Neg 19.58n ± 0% 19.57n ± 0% ~ (p=0.993 n=10) Round 23.65n ± 0% 23.66n ± 0% ~ (p=0.354 n=10) RoundToEven 27.75n ± 0% 27.75n ± 0% ~ (p=0.971 n=10) Remainder 380.0n ± 0% 309.9n ± 0% -18.45% (p=0.000 n=10) Signbit 13.06n ± 0% 13.06n ± 0% ~ (p=1.000 n=10) Sin 133.8n ± 0% 120.8n ± 0% -9.75% (p=0.000 n=10) Sincos 160.7n ± 0% 147.7n ± 0% -8.12% (p=0.000 n=10) Sinh 305.9n ± 0% 277.9n ± 0% -9.17% (p=0.000 n=10) SqrtIndirect 3.265n ± 0% 3.264n ± 0% ~ (p=0.546 n=10) SqrtLatency 19.58n ± 0% 19.58n ± 0% ~ (p=0.973 n=10) SqrtIndirectLatency 19.59n ± 0% 19.58n ± 0% ~ (p=0.370 n=10) SqrtGoLatency 205.7n ± 0% 202.7n ± 0% -1.46% (p=0.000 n=10) SqrtPrime 4.953µ ± 0% 4.954µ ± 0% ~ (p=0.477 n=10) Tan 163.2n ± 0% 150.2n ± 0% -7.99% (p=0.000 n=10) Tanh 312.4n ± 0% 284.2n ± 0% -9.01% (p=0.000 n=10) Trunc 31.83n ± 0% 31.83n ± 0% ~ (p=0.663 n=10) Y0 701.0n ± 0% 669.2n ± 0% -4.54% (p=0.000 n=10) Y1 704.5n ± 0% 672.4n ± 0% -4.55% (p=0.000 n=10) Yn 1.490µ ± 0% 1.422µ ± 0% -4.60% (p=0.000 n=10) Float64bits 5.713n ± 0% 5.710n ± 0% ~ (p=0.926 n=10) Float64frombits 4.896n ± 0% 4.896n ± 0% ~ (p=0.663 n=10) Float32bits 12.25n ± 0% 12.25n ± 0% ~ (p=0.571 n=10) Float32frombits 4.898n ± 0% 4.896n ± 0% ~ (p=0.754 n=10) FMA 4.895n ± 0% 4.895n ± 0% ~ (p=0.745 n=10) geomean 94.40n 89.43n -5.27% Change-Id: I4fe0f2e9f609e38d79463f9ba2519a3f9427432e Reviewed-on: https://go-review.googlesource.com/c/go/+/348389 Reviewed-by: Keith Randall <khr@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@google.com>	2025-08-13 20:33:56 -07:00
Michael Munday	fcc036f03b	cmd/compile: optimise float <-> int register moves on riscv64 Use the FMV* instructions to move values between the floating point and integer register files. Note: I'm unsure why there is a slowdown in the Float32bits benchmark, I've checked and an FMVXS instruction is being used as expected. There are multiple loads and other instructions in the main loop. goos: linux goarch: riscv64 pkg: math cpu: Spacemit(R) X60 │ fmv-before.txt │ fmv-after.txt │ │ sec/op │ sec/op vs base │ Acos 122.7n ± 0% 122.7n ± 0% ~ (p=1.000 n=10) Acosh 197.2n ± 0% 191.5n ± 0% -2.89% (p=0.000 n=10) Asin 122.7n ± 0% 122.7n ± 0% ~ (p=0.474 n=10) Asinh 231.0n ± 0% 224.1n ± 0% -2.99% (p=0.000 n=10) Atan 91.39n ± 0% 91.41n ± 0% ~ (p=0.465 n=10) Atanh 210.3n ± 0% 203.4n ± 0% -3.26% (p=0.000 n=10) Atan2 149.6n ± 0% 149.6n ± 0% ~ (p=0.721 n=10) Cbrt 176.5n ± 0% 165.9n ± 0% -6.01% (p=0.000 n=10) Ceil 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10) Copysign 3.756n ± 0% 3.756n ± 0% ~ (p=0.149 n=10) Cos 95.15n ± 0% 95.15n ± 0% ~ (p=0.374 n=10) Cosh 228.6n ± 0% 224.7n ± 0% -1.71% (p=0.000 n=10) Erf 115.2n ± 0% 115.2n ± 0% ~ (p=0.474 n=10) Erfc 116.4n ± 0% 116.4n ± 0% ~ (p=0.628 n=10) Erfinv 133.3n ± 0% 133.3n ± 0% ~ (p=1.000 n=10) Erfcinv 133.3n ± 0% 133.3n ± 0% ~ (p=1.000 n=10) Exp 194.1n ± 0% 190.3n ± 0% -1.93% (p=0.000 n=10) ExpGo 204.7n ± 0% 200.3n ± 0% -2.15% (p=0.000 n=10) Expm1 137.7n ± 0% 135.2n ± 0% -1.82% (p=0.000 n=10) Exp2 173.4n ± 0% 169.0n ± 0% -2.54% (p=0.000 n=10) Exp2Go 182.8n ± 0% 178.4n ± 0% -2.41% (p=0.000 n=10) Abs 3.756n ± 0% 3.756n ± 0% ~ (p=0.157 n=10) Dim 12.52n ± 0% 12.52n ± 0% ~ (p=0.737 n=10) Floor 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10) Max 21.29n ± 0% 20.03n ± 0% -5.92% (p=0.000 n=10) Min 21.28n ± 0% 20.04n ± 0% -5.85% (p=0.000 n=10) Mod 344.9n ± 0% 319.2n ± 0% -7.45% (p=0.000 n=10) Frexp 55.71n ± 0% 48.85n ± 0% -12.30% (p=0.000 n=10) Gamma 165.9n ± 0% 167.8n ± 0% +1.15% (p=0.000 n=10) Hypot 73.24n ± 0% 70.74n ± 0% -3.41% (p=0.000 n=10) HypotGo 84.50n ± 0% 82.63n ± 0% -2.21% (p=0.000 n=10) Ilogb 49.45n ± 0% 45.70n ± 0% -7.59% (p=0.000 n=10) J0 556.5n ± 0% 544.0n ± 0% -2.25% (p=0.000 n=10) J1 555.3n ± 0% 542.8n ± 0% -2.24% (p=0.000 n=10) Jn 1.181µ ± 0% 1.156µ ± 0% -2.12% (p=0.000 n=10) Ldexp 59.47n ± 0% 53.84n ± 0% -9.47% (p=0.000 n=10) Lgamma 167.2n ± 0% 154.6n ± 0% -7.51% (p=0.000 n=10) Log 160.9n ± 0% 154.6n ± 0% -3.92% (p=0.000 n=10) Logb 49.45n ± 0% 45.70n ± 0% -7.58% (p=0.000 n=10) Log1p 147.1n ± 0% 137.1n ± 0% -6.80% (p=0.000 n=10) Log10 162.1n ± 1% 154.6n ± 0% -4.63% (p=0.000 n=10) Log2 66.99n ± 0% 60.72n ± 0% -9.36% (p=0.000 n=10) Modf 29.42n ± 0% 26.29n ± 0% -10.64% (p=0.000 n=10) Nextafter32 41.95n ± 0% 37.88n ± 0% -9.70% (p=0.000 n=10) Nextafter64 38.82n ± 0% 33.49n ± 0% -13.73% (p=0.000 n=10) PowInt 252.3n ± 0% 237.3n ± 0% -5.95% (p=0.000 n=10) PowFrac 615.5n ± 0% 589.7n ± 0% -4.19% (p=0.000 n=10) Pow10Pos 10.64n ± 0% 10.64n ± 0% ~ (p=1.000 n=10) Pow10Neg 24.42n ± 0% 15.02n ± 0% -38.49% (p=0.000 n=10) Round 21.91n ± 0% 18.16n ± 0% -17.12% (p=0.000 n=10) RoundToEven 24.42n ± 0% 21.29n ± 0% -12.84% (p=0.000 n=10) Remainder 308.0n ± 0% 291.2n ± 0% -5.44% (p=0.000 n=10) Signbit 10.02n ± 0% 10.02n ± 0% ~ (p=1.000 n=10) Sin 102.7n ± 0% 102.7n ± 0% ~ (p=0.211 n=10) Sincos 124.0n ± 1% 123.3n ± 0% -0.56% (p=0.002 n=10) Sinh 239.1n ± 0% 234.7n ± 0% -1.84% (p=0.000 n=10) SqrtIndirect 2.504n ± 0% 2.504n ± 0% ~ (p=0.303 n=10) SqrtLatency 15.03n ± 0% 15.02n ± 0% ~ (p=0.598 n=10) SqrtIndirectLatency 15.02n ± 0% 15.02n ± 0% ~ (p=0.907 n=10) SqrtGoLatency 165.3n ± 0% 157.2n ± 0% -4.90% (p=0.000 n=10) SqrtPrime 3.801µ ± 0% 3.802µ ± 0% ~ (p=1.000 n=10) Tan 125.2n ± 0% 125.2n ± 0% ~ (p=0.458 n=10) Tanh 244.2n ± 0% 239.9n ± 0% -1.76% (p=0.000 n=10) Trunc 25.67n ± 0% 24.42n ± 0% -4.87% (p=0.000 n=10) Y0 550.2n ± 0% 538.1n ± 0% -2.21% (p=0.000 n=10) Y1 552.8n ± 0% 540.6n ± 0% -2.21% (p=0.000 n=10) Yn 1.168µ ± 0% 1.143µ ± 0% -2.14% (p=0.000 n=10) Float64bits 8.139n ± 0% 4.385n ± 0% -46.13% (p=0.000 n=10) Float64frombits 7.512n ± 0% 3.759n ± 0% -49.96% (p=0.000 n=10) Float32bits 8.138n ± 0% 9.393n ± 0% +15.42% (p=0.000 n=10) Float32frombits 7.513n ± 0% 3.757n ± 0% -49.98% (p=0.000 n=10) FMA 3.756n ± 0% 3.756n ± 0% ~ (p=0.246 n=10) geomean 77.43n 72.42n -6.47% Change-Id: I8dac69b1d17cb3d2af78d1c844d2b5d80000d667 Reviewed-on: https://go-review.googlesource.com/c/go/+/599235 Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Michael Munday <mikemndy@gmail.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Keith Randall <khr@golang.org>	2025-08-05 08:27:15 -07:00
Keith Randall	95693816a5	cmd/compile: move riscv64 over to new bounds check strategy Change-Id: Idd9eaf051aa57f7fef7049c12085926030c35d70 Reviewed-on: https://go-review.googlesource.com/c/go/+/682401 Reviewed-by: Mark Freeman <mark@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-08-04 10:08:13 -07:00
Joel Sing	4d10d4ad84	cmd/compile,internal/cpu,runtime: intrinsify math/bits.OnesCount on riscv64 For riscv64/rva22u64 and above, we can intrinsify math/bits.OnesCount using the CPOP/CPOPW machine instructions. Since the native Go implementation of OnesCount is relatively expensive, it is also worth emitting a check for Zbb support when compiled for rva20u64. On a Banana Pi F3, with GORISCV64=rva22u64: │ oc.1 │ oc.2 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.930n ± 0% 4.389n ± 0% -74.08% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.016n ± 0% -11.10% (p=0.000 n=10) OnesCount16-8 9.404n ± 0% 5.015n ± 0% -46.67% (p=0.000 n=10) OnesCount32-8 13.165n ± 0% 4.388n ± 0% -66.67% (p=0.000 n=10) OnesCount64-8 16.300n ± 0% 4.388n ± 0% -73.08% (p=0.000 n=10) geomean 11.40n 4.629n -59.40% On a Banana Pi F3, compiled with GORISCV64=rva20u64 and with Zbb detection enabled: │ oc.3 │ oc.4 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.930n ± 0% 5.643n ± 0% -66.67% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.642n ± 0% ~ (p=0.447 n=10) OnesCount16-8 10.030n ± 0% 6.896n ± 0% -31.25% (p=0.000 n=10) OnesCount32-8 13.170n ± 0% 5.642n ± 0% -57.16% (p=0.000 n=10) OnesCount64-8 16.300n ± 0% 5.642n ± 0% -65.39% (p=0.000 n=10) geomean 11.55n 5.873n -49.16% On a Banana Pi F3, compiled with GORISCV64=rva20u64 but with Zbb detection disabled: │ oc.3 │ oc.5 │ │ sec/op │ sec/op vs base │ OnesCount-8 16.93n ± 0% 29.47n ± 0% +74.07% (p=0.000 n=10) OnesCount8-8 5.642n ± 0% 5.643n ± 0% ~ (p=0.191 n=10) OnesCount16-8 10.03n ± 0% 15.05n ± 0% +50.05% (p=0.000 n=10) OnesCount32-8 13.17n ± 0% 18.18n ± 0% +38.04% (p=0.000 n=10) OnesCount64-8 16.30n ± 0% 21.94n ± 0% +34.60% (p=0.000 n=10) geomean 11.55n 15.84n +37.16% For hardware without Zbb, this adds ~5ns overhead, while for hardware with Zbb we achieve a performance gain up of up to 11ns. It is worth noting that OnesCount8 is cheap enough that it is preferable to stick with the generic version in this case. Change-Id: Id657e40e0dd1b1ab8cc0fe0f8a68df4c9f2d7da5 Reviewed-on: https://go-review.googlesource.com/c/go/+/660856 Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2025-05-01 05:57:41 -07:00
Joel Sing	90e8b8cdae	cmd/compile: intrinsify math/bits.Bswap on riscv64 For riscv64/rva22u64 and above, we can intrinsify math/bits.Bswap using the REV8 machine instruction. On a StarFive VisionFive 2 with GORISCV64=rva22u64: │ rb.1 │ rb.2 │ │ sec/op │ sec/op vs base │ ReverseBytes-4 18.790n ± 0% 4.026n ± 0% -78.57% (p=0.000 n=10) ReverseBytes16-4 6.710n ± 0% 5.368n ± 0% -20.00% (p=0.000 n=10) ReverseBytes32-4 13.420n ± 0% 5.368n ± 0% -60.00% (p=0.000 n=10) ReverseBytes64-4 17.450n ± 0% 4.026n ± 0% -76.93% (p=0.000 n=10) geomean 13.11n 4.649n -64.54% Change-Id: I26eee34270b1721f7304bb1cddb0fda129b20ece Reviewed-on: https://go-review.googlesource.com/c/go/+/660855 Reviewed-by: Mark Ryan <markdryan@rivosinc.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Carlos Amedee <carlos@golang.org> Reviewed-by: Junyang Shao <shaojunyang@google.com>	2025-05-01 05:57:13 -07:00
Joel Sing	b70244ff7a	cmd/compile: intrinsify math/bits.Len on riscv64 For riscv64/rva22u64 and above, we can intrinsify math/bits.Len using the CLZ/CLZW machine instructions. On a StarFive VisionFive 2 with GORISCV64=rva22u64: │ clz.b.1 │ clz.b.2 │ │ sec/op │ sec/op vs base │ LeadingZeros-4 28.89n ± 0% 12.08n ± 0% -58.19% (p=0.000 n=10) LeadingZeros8-4 18.79n ± 0% 14.76n ± 0% -21.45% (p=0.000 n=10) LeadingZeros16-4 25.27n ± 0% 14.76n ± 0% -41.59% (p=0.000 n=10) LeadingZeros32-4 25.12n ± 0% 12.08n ± 0% -51.92% (p=0.000 n=10) LeadingZeros64-4 25.89n ± 0% 12.08n ± 0% -53.35% (p=0.000 n=10) geomean 24.55n 13.09n -46.70% Change-Id: I0dda684713dbdf5336af393f5ccbdae861c4f694 Reviewed-on: https://go-review.googlesource.com/c/go/+/652321 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2025-03-21 18:21:44 -07:00
Joel Sing	6fb7bdc96d	cmd/compile: intrinsify math/bits.TrailingZeros on riscv64 For riscv64/rva22u64 and above, we can intrinsify math/bits.TrailingZeros using the CTZ/CTZW machine instructions. On a StarFive VisionFive 2 with GORISCV64=rva22u64: │ ctz.b.1 │ ctz.b.2 │ │ sec/op │ sec/op vs base │ TrailingZeros-4 25.500n ± 0% 8.052n ± 0% -68.42% (p=0.000 n=10) TrailingZeros8-4 14.76n ± 0% 10.74n ± 0% -27.24% (p=0.000 n=10) TrailingZeros16-4 26.84n ± 0% 10.74n ± 0% -59.99% (p=0.000 n=10) TrailingZeros32-4 25.500n ± 0% 8.052n ± 0% -68.42% (p=0.000 n=10) TrailingZeros64-4 25.500n ± 0% 8.052n ± 0% -68.42% (p=0.000 n=10) geomean 23.09n 9.035n -60.88% Change-Id: I71edf2b988acb7a68e797afda4ee66d7a57d587e Reviewed-on: https://go-review.googlesource.com/c/go/+/652320 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: David Chase <drchase@google.com> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>	2025-03-15 19:07:53 -07:00
Michael Pratt	81c92352a7	runtime: move getcallerpc to internal/runtime/sys Moving these intrinsics to a base package enables other internal/runtime packages to use them. For #54766. Change-Id: I0b3eded3bb45af53e3eb5bab93e3792e6a8beb46 Reviewed-on: https://go-review.googlesource.com/c/go/+/613260 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2024-09-17 15:14:14 +00:00
Joel Sing	0ee5d20b1f	cmd/compile,cmd/internal/obj/riscv: always provide ANDN, ORN and XNOR for riscv64 The ANDN, ORN and XNOR RISC-V Zbb extension instructions are easily synthesised. Make them always available by adding support to the riscv64 assembler so that we either emit two instruction sequences, or a single instruction, when permitted by the GORISCV64 profile. This means that these instructions can be used unconditionally, simplifying compiler rewrite rules, codegen tests and manually written assembly. Around 180 instructions are removed from the Go binary on riscv64 when built with rva22u64. Change-Id: Ib2d90f2593a306530dc0ed08a981acde4d01be20 Reviewed-on: https://go-review.googlesource.com/c/go/+/611895 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Tim King <taking@google.com> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>	2024-09-12 15:03:44 +00:00
Joel Sing	e126129d76	cmd/compile/internal/ssa: combine shift and addition for riscv64 rva22u64 When GORISCV64 enables rva22u64, combined shift and addition using the SH1ADD, SH2ADD and SH3ADD instructions that are available via the Zba extension. This results in more than 2000 instructions being removed from the Go binary on riscv64. Change-Id: Ia62ae7dda3d8083cff315113421bee73f518eea8 Reviewed-on: https://go-review.googlesource.com/c/go/+/606636 LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>	2024-08-28 13:46:24 +00:00
Joel Sing	3d9a89b057	cmd/compile: use integer min/max instructions on riscv64 When GORISCV64 enables rva22u64, make use of integer MIN/MINU/MAX/MAXU instructions in compiler rewrite rules. Change-Id: I4e7c514516acad03f2869d4c8936f06582cf7ea9 Reviewed-on: https://go-review.googlesource.com/c/go/+/559660 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Reviewed-by: Meng Zhuo <mengzhuo1203@gmail.com>	2024-08-19 13:46:02 +00:00
Joel Sing	997636760e	cmd/compile,cmd/internal/obj: provide rotation pseudo-instructions for riscv64 Provide and use rotation pseudo-instructions for riscv64. The RISC-V bitmanip extension adds support for hardware rotation instructions in the form of ROL, ROLW, ROR, RORI, RORIW and RORW. These are easily implemented in the assembler as pseudo-instructions for CPUs that do not support the bitmanip extension. This approach provides a number of advantages, including reducing the rewrite rules needed in the compiler, simplifying codegen tests and most importantly, allowing these instructions to be used in assembly (for example, riscv64 optimised versions of SHA-256 and SHA-512). When bitmanip support is added, these instruction sequences can simply be replaced with a single instruction if permitted by the GORISCV64 profile. Change-Id: Ia23402e1a82f211ac760690deb063386056ae1fa Reviewed-on: https://go-review.googlesource.com/c/go/+/565015 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: M Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Carlos Amedee <carlos@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com> Run-TryBot: Joel Sing <joel@sing.id.au>	2024-03-07 14:57:07 +00:00
Joel Sing	daa58db486	cmd/compile: improve rotations for riscv64 Enable canRotate for riscv64, enable rotation intrinsics and provide better rewrite implementations for rotations. By avoiding Lshx64 and RshUx64 we can produce better code, especially for 32 and 64 bit rotations. By enabling canRotate we also benefit from the generic rotation rewrite rules. Benchmark on a StarFive VisionFive 2: │ rotate.1 │ rotate.2 │ │ sec/op │ sec/op vs base │ RotateLeft-4 14.700n ± 0% 8.016n ± 0% -45.47% (p=0.000 n=10) RotateLeft8-4 14.70n ± 0% 10.69n ± 0% -27.28% (p=0.000 n=10) RotateLeft16-4 14.70n ± 0% 12.02n ± 0% -18.23% (p=0.000 n=10) RotateLeft32-4 13.360n ± 0% 8.016n ± 0% -40.00% (p=0.000 n=10) RotateLeft64-4 13.360n ± 0% 8.016n ± 0% -40.00% (p=0.000 n=10) geomean 14.15n 9.208n -34.92% Change-Id: I1a2036fdc57cf88ebb6617eb8d92e1d187e183b2 Reviewed-on: https://go-review.googlesource.com/c/go/+/560315 Reviewed-by: M Zhuo <mengzhuo1203@gmail.com> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: David Chase <drchase@google.com>	2024-02-16 11:59:07 +00:00
Meng Zhuo	09ed9a6585	cmd/compile: implement float min/max in hardware for riscv64 CL 514596 adds float min/max for amd64, this CL adds it for riscv64. The behavior of the RISC-V FMIN/FMAX instructions almost match Go's requirements. However according to RISCV spec 8.3 "NaN Generation and Propagation" >> if at least one input is a signaling NaN, or if both inputs are quiet >> NaNs, the result is the canonical NaN. If one operand is a quiet NaN >> and the other is not a NaN, the result is the non-NaN operand. Go using quiet NaN as NaN and according to Go spec >> if any argument is a NaN, the result is a NaN This requires the float min/max implementation to check whether one of operand is qNaN before float mix/max actually execute. This CL also fix a typo in minmax test. Benchmark on Visionfive2 goos: linux goarch: riscv64 pkg: runtime │ float_minmax.old.bench │ float_minmax.new.bench │ │ sec/op │ sec/op vs base │ MinFloat 158.20n ± 0% 28.13n ± 0% -82.22% (p=0.000 n=10) MaxFloat 158.10n ± 0% 28.12n ± 0% -82.21% (p=0.000 n=10) geomean 158.1n 28.12n -82.22% Update #59488 Change-Id: Iab48be6d32b8882044fb8c821438ca8840e5493d Reviewed-on: https://go-review.googlesource.com/c/go/+/514775 Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Run-TryBot: M Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com>	2024-01-26 01:41:50 +00:00
Ubuntu	8fc043ccfa	cmd/compile: optimize right shifts of int32 on riscv64 The compiler is currently sign extending 32 bit signed integers to 64 bits before right shifting them using a 64 bit shift instruction. There's no need to do this as RISC-V has instructions for right shifting 32 bit signed values (sraw and sraiw) which sign extend the result of the shift to 64 bits. Change the compiler so that it uses sraw and sraiw for shifts of signed 32 bit integers reducing in most cases the number of instructions needed to perform the shift. Here are some examples of code sequences that are changed by this patch: int32(a) >> 2 before: sll x5,x10,0x20 sra x10,x5,0x22 after: sraw x10,x10,0x2 int32(v) >> int(s) before: sext.w x5,x10 sltiu x6,x11,64 add x6,x6,-1 or x6,x11,x6 sra x10,x5,x6 after: sltiu x5,x11,32 add x5,x5,-1 or x5,x11,x5 sraw x10,x10,x5 int32(v) >> (int(s) & 31) before: sext.w x5,x10 and x6,x11,63 sra x10,x5,x6 after: and x5,x11,31 sraw x10,x10,x5 int32(100) >> int(a) before: bltz x10,<target address calls runtime.panicshift> sltiu x5,x10,64 add x5,x5,-1 or x5,x10,x5 li x6,100 sra x10,x6,x5 after: bltz x10,<target address calls runtime.panicshift> sltiu x5,x10,32 add x5,x5,-1 or x5,x10,x5 li x6,100 sraw x10,x6,x5 int32(v) >> (int(s) & 63) before: sext.w x5,x10 and x6,x11,63 sra x10,x5,x6 after: and x5,x11,63 sltiu x6,x5,32 add x6,x6,-1 or x5,x5,x6 sraw x10,x10,x5 In most cases we eliminate one instruction. In the case where we shift a int32 constant by a variable the number of instructions generated is identical. A sra is simply replaced by a sraw. In the unusual case where we shift right by a variable anded with a constant > 31 but < 64, we generate two additional instructions. As this is an unusual case we do not try to optimize for it. Some improvements can be seen in some of the existing benchmarks, notably in the utf8 package which performs right shifts of runes which are signed 32 bit integers. \| utf8-old \| utf8-new \| \| sec/op \| sec/op vs base \| EncodeASCIIRune-4 17.68n ± 0% 17.67n ± 0% ~ (p=0.312 n=10) EncodeJapaneseRune-4 35.34n ± 0% 34.53n ± 1% -2.31% (p=0.000 n=10) AppendASCIIRune-4 3.213n ± 0% 3.213n ± 0% ~ (p=0.318 n=10) AppendJapaneseRune-4 36.14n ± 0% 35.35n ± 0% -2.19% (p=0.000 n=10) DecodeASCIIRune-4 28.11n ± 0% 27.36n ± 0% -2.69% (p=0.000 n=10) DecodeJapaneseRune-4 38.55n ± 0% 38.58n ± 0% ~ (p=0.612 n=10) Change-Id: I60a91cbede9ce65597571c7b7dd9943eeb8d3cc2 Reviewed-on: https://go-review.googlesource.com/c/go/+/535115 Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: M Zhuo <mzh@golangcn.org> Reviewed-by: David Chase <drchase@google.com>	2023-10-30 14:47:06 +00:00
Joel Sing	f711892a8a	cmd/compile/internal: stop lowering OpConvert on riscv64 Lowering for OpConvert was removed for all architectures in CL#108496, prior to the riscv64 port being upstreamed. Remove lowering of OpConvert on riscv64, which brings it inline with all other architectures. This results in 1,600+ instructions being removed from the riscv64 go binary. Change-Id: Iaaf1f8b397875926604048b66ad8ac91a98c871e Reviewed-on: https://go-review.googlesource.com/c/go/+/533335 Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>	2023-10-07 12:31:59 +00:00
Mark Ryan	561bf0457f	cmd/compile: optimize right shifts of uint32 on riscv The compiler is currently zero extending 32 bit unsigned integers to 64 bits before right shifting them using a 64 bit shift instruction. There's no need to do this as RISC-V has instructions for right shifting 32 bit unsigned values (srlw and srliw) which zero extend the result of the shift to 64 bits. Change the compiler so that it uses srlw and srliw for 32 bit unsigned shifts reducing in most cases the number of instructions needed to perform the shift. Here are some examples of code sequences that are changed by this patch: uint32(a) >> 2 before: sll x5,x10,0x20 srl x10,x5,0x22 after: srlw x10,x10,0x2 uint32(a) >> int(b) before: sll x5,x10,0x20 srl x5,x5,0x20 srl x5,x5,x11 sltiu x6,x11,64 neg x6,x6 and x10,x5,x6 after: srlw x5,x10,x11 sltiu x6,x11,32 neg x6,x6 and x10,x5,x6 bits.RotateLeft32(uint32(a), 1) before: sll x5,x10,0x1 sll x6,x10,0x20 srl x7,x6,0x3f or x5,x5,x7 after: sll x5,x10,0x1 srlw x6,x10,0x1f or x10,x5,x6 bits.RotateLeft32(uint32(a), int(b)) before: and x6,x11,31 sll x7,x10,x6 sll x8,x10,0x20 srl x8,x8,0x20 add x6,x6,-32 neg x6,x6 srl x9,x8,x6 sltiu x6,x6,64 neg x6,x6 and x6,x9,x6 or x6,x6,x7 after: and x5,x11,31 sll x6,x10,x5 add x5,x5,-32 neg x5,x5 srlw x7,x10,x5 sltiu x5,x5,32 neg x5,x5 and x5,x7,x5 or x10,x6,x5 The one regression observed is the following case, an unbounded right shift of a uint32 where the value we're shifting by is known to be < 64 but > 31. As this is an unusual case this commit does not optimize for it, although the existing code does. uint32(a) >> (b & 63) before: sll x5,x10,0x20 srl x5,x5,0x20 and x6,x11,63 srl x10,x5,x6 after and x5,x11,63 srlw x6,x10,x5 sltiu x5,x5,32 neg x5,x5 and x10,x6,x5 Here we have one extra instruction. Some benchmark highlights, generated on a VisionFive2 8GB running Ubuntu 23.04. pkg: math/bits LeadingZeros32-4 18.64n ± 0% 17.32n ± 0% -7.11% (p=0.000 n=10) LeadingZeros64-4 15.47n ± 0% 15.51n ± 0% +0.26% (p=0.027 n=10) TrailingZeros16-4 18.48n ± 0% 17.68n ± 0% -4.33% (p=0.000 n=10) TrailingZeros32-4 16.87n ± 0% 16.07n ± 0% -4.74% (p=0.000 n=10) TrailingZeros64-4 15.26n ± 0% 15.27n ± 0% +0.07% (p=0.043 n=10) OnesCount32-4 20.08n ± 0% 19.29n ± 0% -3.96% (p=0.000 n=10) RotateLeft-4 8.864n ± 0% 8.838n ± 0% -0.30% (p=0.006 n=10) RotateLeft32-4 8.837n ± 0% 8.032n ± 0% -9.11% (p=0.000 n=10) Reverse32-4 29.77n ± 0% 26.52n ± 0% -10.93% (p=0.000 n=10) ReverseBytes32-4 9.640n ± 0% 8.838n ± 0% -8.32% (p=0.000 n=10) Sub32-4 8.835n ± 0% 8.035n ± 0% -9.06% (p=0.000 n=10) geomean 11.50n 11.33n -1.45% pkg: crypto/md5 Hash8Bytes-4 1.486µ ± 0% 1.426µ ± 0% -4.04% (p=0.000 n=10) Hash64-4 2.079µ ± 0% 1.968µ ± 0% -5.36% (p=0.000 n=10) Hash128-4 2.720µ ± 0% 2.557µ ± 0% -5.99% (p=0.000 n=10) Hash256-4 3.996µ ± 0% 3.733µ ± 0% -6.58% (p=0.000 n=10) Hash512-4 6.541µ ± 0% 6.072µ ± 0% -7.18% (p=0.000 n=10) Hash1K-4 11.64µ ± 0% 10.75µ ± 0% -7.58% (p=0.000 n=10) Hash8K-4 82.95µ ± 0% 76.32µ ± 0% -7.99% (p=0.000 n=10) Hash1M-4 10.436m ± 0% 9.591m ± 0% -8.10% (p=0.000 n=10) Hash8M-4 83.50m ± 0% 76.73m ± 0% -8.10% (p=0.000 n=10) Hash8BytesUnaligned-4 1.494µ ± 0% 1.434µ ± 0% -4.02% (p=0.000 n=10) Hash1KUnaligned-4 11.64µ ± 0% 10.76µ ± 0% -7.52% (p=0.000 n=10) Hash8KUnaligned-4 83.01µ ± 0% 76.32µ ± 0% -8.07% (p=0.000 n=10) geomean 28.32µ 26.42µ -6.72% Change-Id: I20483a6668cca1b53fe83944bee3706aadcf8693 Reviewed-on: https://go-review.googlesource.com/c/go/+/528975 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-10-07 12:31:38 +00:00
Xianmiao Qu	d98f74b31e	cmd/compile/internal: intrinsify publicationBarrier on riscv64 This enables publicationBarrier to be used as an intrinsic on riscv64, optimizing the required function call and return instructions for invoking the "runtime.publicationBarrier" function. This function is called by mallocgc. The benchmark results for malloc tested on Lichee-Pi-4A(TH1520, RISC-V 2.0G C910 x4) are as follows. goos: linux goarch: riscv64 pkg: runtime │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ Malloc8-4 92.78n ± 1% 90.77n ± 1% -2.17% (p=0.001 n=10) Malloc16-4 156.5n ± 1% 151.7n ± 2% -3.10% (p=0.000 n=10) MallocTypeInfo8-4 131.7n ± 1% 130.6n ± 2% ~ (p=0.165 n=10) MallocTypeInfo16-4 186.5n ± 2% 186.2n ± 1% ~ (p=0.956 n=10) MallocLargeStruct-4 1.345µ ± 1% 1.355µ ± 1% ~ (p=0.093 n=10) geomean 216.9n 214.5n -1.10% Change-Id: Ieab6c02309614bac5c1b12b5ee3311f988ff644d Reviewed-on: https://go-review.googlesource.com/c/go/+/531719 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: M Zhuo <mzh@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au>	2023-10-03 19:29:38 +00:00
Meng Zhuo	63ab68ddc5	cmd/compile: add single-precision FMA code generation for riscv64 This CL adds FMADDS,FMSUBS,FNMADDS,FNMSUBS SSA support for riscv Change-Id: I1e7dd322b46b9e0f4923dbba256303d69ed12066 Reviewed-on: https://go-review.googlesource.com/c/go/+/506616 Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: M Zhuo <mzh@golangcn.org>	2023-08-22 12:05:36 +00:00
Meng Zhuo	05f9511582	cmd/compile: improve FP FMA performance on riscv64 FMADD/FMSUB/FNSUB are an efficient FP FMA instructions, which can be used by the compiler to improve FP performance. Erf 188.0n ± 2% 139.5n ± 2% -25.82% (p=0.000 n=10) Erfc 193.6n ± 1% 143.2n ± 1% -26.01% (p=0.000 n=10) Erfinv 244.4n ± 2% 172.6n ± 0% -29.40% (p=0.000 n=10) Erfcinv 244.7n ± 2% 173.0n ± 1% -29.31% (p=0.000 n=10) geomean 216.0n 156.3n -27.65% Ref: The RISC-V Instruction Set Manual Volume I: Unprivileged ISA 11.6 Single-Precision Floating-Point Computational Instructions Change-Id: I89aa3a4df7576fdd47f4a6ee608ac16feafd093c Reviewed-on: https://go-review.googlesource.com/c/go/+/506036 Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: M Zhuo <mzh@golangcn.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-08-22 08:38:08 +00:00
Keith Randall	21d82e6ac8	cmd/compile: batch write barrier calls Have the write barrier call return a pointer to a buffer into which the generated code records pointers that need write barrier treatment. Change-Id: I7871764298e0aa1513de417010c8d46b296b199e Reviewed-on: https://go-review.googlesource.com/c/go/+/447781 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Bypass: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2023-02-24 00:21:13 +00:00
Keith Randall	12befc3ce3	cmd/compile: improve scheduling pass Convert the scheduling pass from scheduling backwards to scheduling forwards. Forward scheduling makes it easier to prioritize scheduling values as soon as they are ready, which is important for things like nil checks, select ops, etc. Forward scheduling is also quite a bit clearer. It was originally backwards because computing uses is tricky, but I found a way to do it simply and with n lg n complexity. The new scheme also makes it easy to add new scheduling edges if needed. Fixes #42673 Update #56568 Change-Id: Ibbb38c52d191f50ce7a94f8c1cbd3cd9b614ea8b Reviewed-on: https://go-review.googlesource.com/c/go/+/270940 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com>	2023-01-20 04:54:01 +00:00
Keith Randall	45dc81d856	cmd/compile: add memory argument to GetCallerSP We need to make sure that when we get the stack pointer, we get it at the right time. V = GetCallerSP Call() W = GetCallerSP If Call causes a stack growth, then we will be in a situation where V != W. So it matters when GetCallerSP operations get scheduled. Add a memory argument to GetCallerSP so it can't be reordered with things like calls. Change-Id: I6cc801134c38e358c5a1ec0c09d38379a16a4184 Reviewed-on: https://go-review.googlesource.com/c/go/+/453515 Reviewed-by: Martin Möhrmann <moehrmann@google.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Martin Möhrmann <martin@golang.org> Reviewed-by: Robert Griesemer <gri@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-01-19 22:43:22 +00:00
Russ Cox	164406ad93	cmd/compile: rename gen and builtin to _gen and _builtin These two directories are full of //go:build ignore files. We can ignore them more easily by putting an underscore at the start of the name. That also works around a bug in Go 1.17 that was not fixed until Go 1.17.3. Change-Id: Ia5389b65c79b1e6d08e4fef374d335d776d44ead Reviewed-on: https://go-review.googlesource.com/c/go/+/435472 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-10-04 19:35:46 +00:00

29 commits