Xiaolin Zhao
d98c51809d
cmd/compile: wire up math/bits.Len intrinsics for loong64
...
For the SubFromLen64 codegen test case to work as intended, we need
to fold c-(-(x-d)) into x+(c-d).
Still, some instances of LeadingZeros are not optimized into single
CLZ instructions right now (actually, the LeadingZeros micro-benchmarks
are currently still compiled with redundant adds/subs of 64, due to
interference of loop optimizations before lowering), but perf numbers
indicate it's not that bad after all.
Micro-benchmark results on Loongson 3A5000 and 3A6000:
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A5000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 3.660n ± 0% 1.348n ± 0% -63.17% (p=0.000 n=20)
LeadingZeros8 1.777n ± 0% 1.767n ± 0% -0.56% (p=0.000 n=20)
LeadingZeros16 2.816n ± 0% 1.770n ± 0% -37.14% (p=0.000 n=20)
LeadingZeros32 5.293n ± 1% 1.683n ± 0% -68.21% (p=0.000 n=20)
LeadingZeros64 3.622n ± 0% 1.349n ± 0% -62.76% (p=0.000 n=20)
geomean 3.229n 1.571n -51.35%
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 2.410n ± 0% 1.103n ± 1% -54.23% (p=0.000 n=20)
LeadingZeros8 1.236n ± 0% 1.501n ± 0% +21.44% (p=0.000 n=20)
LeadingZeros16 2.106n ± 0% 1.501n ± 0% -28.73% (p=0.000 n=20)
LeadingZeros32 2.860n ± 0% 1.324n ± 0% -53.72% (p=0.000 n=20)
LeadingZeros64 2.6135n ± 0% 0.9509n ± 0% -63.62% (p=0.000 n=20)
geomean 2.159n 1.256n -41.81%
Updates #59120
This patch is a copy of CL 483356.
Co-authored-by: WANG Xuerui <git@xen0n.name>
Change-Id: Iee81a17f7da06d77a427e73dfcc016f2b15ae556
Reviewed-on: https://go-review.googlesource.com/c/go/+/624575
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2024-11-06 00:40:40 +00:00
limeidan
0f58a7be8a
cmd/compile/internal: optimize condition branch implementation
...
goos: linux
goarch: loong64
pkg: test/bench/go1
cpu: Loongson-3A6000 @ 2500.00MHz
│ old │ new │
│ sec/op │ sec/op vs base │
BinaryTree17 7.521 ± 1% 7.551 ± 2% ~ (p=0.190 n=10)
Fannkuch11 2.736 ± 0% 2.667 ± 0% -2.51% (p=0.000 n=10)
FmtFprintfEmpty 34.42n ± 0% 35.22n ± 0% +2.32% (p=0.000 n=10)
FmtFprintfString 61.24n ± 0% 56.84n ± 0% -7.18% (p=0.000 n=10)
FmtFprintfInt 68.04n ± 0% 65.65n ± 0% -3.51% (p=0.000 n=10)
FmtFprintfIntInt 111.9n ± 0% 106.0n ± 0% -5.32% (p=0.000 n=10)
FmtFprintfPrefixedInt 131.4n ± 0% 122.5n ± 0% -6.77% (p=0.000 n=10)
FmtFprintfFloat 241.1n ± 0% 235.1n ± 0% -2.51% (p=0.000 n=10)
FmtManyArgs 553.7n ± 0% 518.9n ± 0% -6.28% (p=0.000 n=10)
GobDecode 7.223m ± 1% 7.291m ± 1% +0.94% (p=0.004 n=10)
GobEncode 6.741m ± 1% 6.622m ± 2% -1.77% (p=0.011 n=10)
Gzip 288.9m ± 0% 280.3m ± 0% -3.00% (p=0.000 n=10)
Gunzip 34.07m ± 0% 33.33m ± 0% -2.18% (p=0.000 n=10)
HTTPClientServer 60.15µ ± 0% 60.63µ ± 0% +0.80% (p=0.000 n=10)
JSONEncode 10.052m ± 1% 9.840m ± 0% -2.12% (p=0.000 n=10)
JSONDecode 50.96m ± 0% 51.32m ± 0% +0.70% (p=0.002 n=10)
Mandelbrot200 4.525m ± 0% 4.602m ± 0% +1.69% (p=0.000 n=10)
GoParse 5.018m ± 0% 4.996m ± 0% -0.44% (p=0.000 n=10)
RegexpMatchEasy0_32 58.74n ± 0% 59.95n ± 0% +2.06% (p=0.000 n=10)
RegexpMatchEasy0_1K 464.9n ± 0% 466.1n ± 0% +0.26% (p=0.000 n=10)
RegexpMatchEasy1_32 64.88n ± 0% 59.64n ± 0% -8.08% (p=0.000 n=10)
RegexpMatchEasy1_1K 557.2n ± 0% 564.4n ± 0% +1.29% (p=0.000 n=10)
RegexpMatchMedium_32 879.3n ± 0% 912.8n ± 1% +3.81% (p=0.000 n=10)
RegexpMatchMedium_1K 28.08µ ± 0% 28.70µ ± 0% +2.20% (p=0.000 n=10)
RegexpMatchHard_32 1.456µ ± 0% 1.414µ ± 0% -2.88% (p=0.000 n=10)
RegexpMatchHard_1K 43.81µ ± 0% 42.23µ ± 0% -3.61% (p=0.000 n=10)
Revcomp 472.4m ± 0% 474.5m ± 1% +0.45% (p=0.000 n=10)
Template 83.45m ± 0% 83.39m ± 0% ~ (p=0.481 n=10)
TimeParse 291.3n ± 0% 283.8n ± 0% -2.57% (p=0.000 n=10)
TimeFormat 322.8n ± 0% 313.1n ± 0% -3.02% (p=0.000 n=10)
geomean 54.32µ 53.45µ -1.61%
Change-Id: If68fdd952ec6137c77e25ce8932358cac28da324
Reviewed-on: https://go-review.googlesource.com/c/go/+/620977
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: sophie zhao <zhaoxiaolin@loongson.cn>
2024-10-24 08:23:34 +00:00
Xiaolin Zhao
e45c125a3c
cmd/compile: add patterns for bitfield opcodes on loong64
...
goos: linux
goarch: loong64
pkg: math/bits
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
LeadingZeros 1.0095n ± 0% 0.8011n ± 0% -20.64% (p=0.000 n=10)
LeadingZeros8 1.201n ± 0% 1.167n ± 0% -2.83% (p=0.000 n=10)
LeadingZeros16 1.201n ± 0% 1.167n ± 0% -2.83% (p=0.000 n=10)
LeadingZeros32 1.201n ± 0% 1.134n ± 0% -5.58% (p=0.000 n=10)
LeadingZeros64 0.8007n ± 0% 1.0115n ± 0% +26.32% (p=0.000 n=10)
TrailingZeros 0.8054n ± 0% 0.8106n ± 1% +0.65% (p=0.000 n=10)
TrailingZeros8 1.067n ± 0% 1.002n ± 1% -6.09% (p=0.000 n=10)
TrailingZeros16 1.0540n ± 0% 0.8389n ± 0% -20.40% (p=0.000 n=10)
TrailingZeros32 0.8014n ± 0% 0.8117n ± 0% +1.29% (p=0.000 n=10)
TrailingZeros64 0.8015n ± 0% 0.8124n ± 1% +1.36% (p=0.000 n=10)
OnesCount 3.418n ± 0% 3.417n ± 0% ~ (p=0.911 n=10)
OnesCount8 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10)
OnesCount16 1.440n ± 0% 1.299n ± 0% -9.79% (p=0.000 n=10)
OnesCount32 2.969n ± 0% 2.940n ± 0% -0.94% (p=0.000 n=10)
OnesCount64 3.563n ± 0% 3.558n ± 0% -0.14% (p=0.000 n=10)
RotateLeft 0.6677n ± 0% 0.6670n ± 0% ~ (p=0.055 n=10)
RotateLeft8 1.318n ± 1% 1.321n ± 0% ~ (p=0.117 n=10)
RotateLeft16 0.8457n ± 1% 0.8442n ± 0% ~ (p=0.325 n=10)
RotateLeft32 0.8004n ± 0% 0.8004n ± 0% ~ (p=0.837 n=10)
RotateLeft64 0.6678n ± 0% 0.6670n ± 0% -0.13% (p=0.000 n=10)
Reverse 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10)
Reverse8 0.6989n ± 0% 0.6969n ± 1% ~ (p=0.138 n=10)
Reverse16 0.6998n ± 1% 0.7004n ± 1% ~ (p=0.985 n=10)
Reverse32 0.4158n ± 1% 0.4159n ± 1% ~ (p=0.870 n=10)
Reverse64 0.4165n ± 1% 0.4194n ± 2% ~ (p=0.093 n=10)
ReverseBytes 0.8004n ± 0% 0.8004n ± 0% ~ (p=1.000 n=10)
ReverseBytes16 0.4183n ± 2% 0.4148n ± 1% ~ (p=0.055 n=10)
ReverseBytes32 0.4143n ± 2% 0.4153n ± 1% ~ (p=0.869 n=10)
ReverseBytes64 0.4168n ± 1% 0.4177n ± 1% ~ (p=0.184 n=10)
Add 1.201n ± 0% 1.201n ± 0% ~ (p=0.087 n=10)
Add32 1.603n ± 0% 1.601n ± 0% -0.12% (p=0.000 n=10)
Add64 1.201n ± 0% 1.201n ± 0% ~ (p=0.211 n=10)
Add64multiple 1.839n ± 0% 1.835n ± 0% -0.24% (p=0.001 n=10)
Sub 1.202n ± 0% 1.201n ± 0% -0.04% (p=0.033 n=10)
Sub32 2.401n ± 0% 1.601n ± 0% -33.32% (p=0.000 n=10)
Sub64 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=10)
Sub64multiple 2.105n ± 0% 2.096n ± 0% -0.40% (p=0.000 n=10)
Mul 0.8008n ± 0% 0.8004n ± 0% -0.05% (p=0.000 n=10)
Mul32 0.8041n ± 0% 0.8014n ± 0% -0.34% (p=0.000 n=10)
Mul64 0.8008n ± 0% 0.8004n ± 0% -0.05% (p=0.000 n=10)
Div 8.977n ± 0% 8.945n ± 0% -0.36% (p=0.000 n=10)
Div32 4.084n ± 0% 4.086n ± 0% ~ (p=0.445 n=10)
Div64 9.316n ± 0% 9.301n ± 0% -0.17% (p=0.000 n=10)
geomean 1.141n 1.117n -2.09%
Change-Id: I4dc1eaab6728f771bc722ed331fe5c6429bd1037
Reviewed-on: https://go-review.googlesource.com/c/go/+/618475
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2024-10-18 01:09:11 +00:00
Xiaolin Zhao
ef3e1dae2f
cmd/compile: optimize loong64 with register indexed load/store
...
goos: linux
goarch: loong64
pkg: test/bench/go1
cpu: Loongson-3A6000 @ 2500.00MHz
| bench.old | bench.new |
| sec/op | sec/op vs base |
BinaryTree17 7.766 ± 1% 7.640 ± 2% -1.62% (p=0.000 n=20)
Fannkuch11 2.649 ± 0% 2.358 ± 0% -10.96% (p=0.000 n=20)
FmtFprintfEmpty 35.89n ± 0% 35.87n ± 0% -0.06% (p=0.000 n=20)
FmtFprintfString 59.44n ± 0% 57.25n ± 2% -3.68% (p=0.000 n=20)
FmtFprintfInt 62.07n ± 0% 60.04n ± 0% -3.27% (p=0.000 n=20)
FmtFprintfIntInt 97.90n ± 0% 97.26n ± 0% -0.65% (p=0.000 n=20)
FmtFprintfPrefixedInt 116.7n ± 0% 119.2n ± 0% +2.14% (p=0.000 n=20)
FmtFprintfFloat 204.5n ± 0% 201.9n ± 0% -1.30% (p=0.000 n=20)
FmtManyArgs 455.9n ± 0% 466.8n ± 0% +2.39% (p=0.000 n=20)
GobDecode 7.458m ± 1% 7.138m ± 1% -4.28% (p=0.000 n=20)
GobEncode 8.573m ± 1% 8.473m ± 1% ~ (p=0.091 n=20)
Gzip 280.2m ± 0% 284.9m ± 0% +1.67% (p=0.000 n=20)
Gunzip 32.68m ± 0% 32.67m ± 0% ~ (p=0.211 n=20)
HTTPClientServer 54.22µ ± 0% 53.24µ ± 0% -1.80% (p=0.000 n=20)
JSONEncode 9.427m ± 1% 9.152m ± 0% -2.92% (p=0.000 n=20)
JSONDecode 47.08m ± 1% 46.85m ± 1% -0.49% (p=0.007 n=20)
Mandelbrot200 4.601m ± 0% 4.605m ± 0% +0.08% (p=0.000 n=20)
GoParse 4.776m ± 0% 4.655m ± 1% -2.52% (p=0.000 n=20)
RegexpMatchEasy0_32 59.77n ± 0% 57.59n ± 0% -3.66% (p=0.000 n=20)
RegexpMatchEasy0_1K 458.1n ± 0% 458.8n ± 0% +0.15% (p=0.000 n=20)
RegexpMatchEasy1_32 59.36n ± 0% 59.24n ± 0% -0.20% (p=0.000 n=20)
RegexpMatchEasy1_1K 557.7n ± 0% 560.2n ± 0% +0.46% (p=0.000 n=20)
RegexpMatchMedium_32 803.1n ± 0% 772.8n ± 0% -3.77% (p=0.000 n=20)
RegexpMatchMedium_1K 27.29µ ± 0% 25.88µ ± 0% -5.18% (p=0.000 n=20)
RegexpMatchHard_32 1.385µ ± 0% 1.304µ ± 0% -5.85% (p=0.000 n=20)
RegexpMatchHard_1K 40.92µ ± 0% 39.58µ ± 0% -3.27% (p=0.000 n=20)
Revcomp 474.3m ± 0% 410.0m ± 0% -13.56% (p=0.000 n=20)
Template 78.16m ± 0% 76.32m ± 1% -2.36% (p=0.000 n=20)
TimeParse 271.8n ± 0% 272.1n ± 0% +0.11% (p=0.000 n=20)
TimeFormat 292.3n ± 0% 294.8n ± 0% +0.86% (p=0.000 n=20)
geomean 51.98µ 50.82µ -2.22%
Change-Id: Ia78f1ddee8f1d9ec7192a4b8d2a4ec6058679956
Reviewed-on: https://go-review.googlesource.com/c/go/+/615918
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2024-10-17 07:32:25 +00:00
Michael Pratt
81c92352a7
runtime: move getcallerpc to internal/runtime/sys
...
Moving these intrinsics to a base package enables other internal/runtime
packages to use them.
For #54766.
Change-Id: I0b3eded3bb45af53e3eb5bab93e3792e6a8beb46
Reviewed-on: https://go-review.googlesource.com/c/go/+/613260
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2024-09-17 15:14:14 +00:00
Xiaolin Zhao
f243cf6016
cmd/compile: optimize math.Float64(32)bits and math.Float64(32)frombits on loong64
...
Use direct float <-> int register moves (no conversion) instead of a
store followed by a load to move float <-> int values, as arm64 and mips64 do.
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
│ bench.old │ bench.new │
│ sec/op │ sec/op vs base │
Acos 15.98n ± 0% 15.94n ± 0% -0.25% (p=0.000 n=20)
Acosh 27.75n ± 0% 25.56n ± 0% -7.89% (p=0.000 n=20)
Asin 15.85n ± 0% 15.76n ± 0% -0.57% (p=0.000 n=20)
Asinh 39.79n ± 0% 37.69n ± 0% -5.28% (p=0.000 n=20)
Atan 7.261n ± 0% 7.242n ± 0% -0.27% (p=0.000 n=20)
Atanh 28.30n ± 0% 27.62n ± 0% -2.40% (p=0.000 n=20)
Atan2 15.85n ± 0% 15.75n ± 0% -0.63% (p=0.000 n=20)
Cbrt 27.02n ± 0% 21.08n ± 0% -21.98% (p=0.000 n=20)
Ceil 2.830n ± 1% 2.896n ± 1% +2.31% (p=0.000 n=20)
Copysign 0.8022n ± 0% 0.8004n ± 0% -0.22% (p=0.000 n=20)
Cos 11.64n ± 0% 11.61n ± 0% -0.26% (p=0.000 n=20)
Cosh 35.98n ± 0% 33.44n ± 0% -7.05% (p=0.000 n=20)
Erf 10.09n ± 0% 10.08n ± 0% -0.10% (p=0.000 n=20)
Erfc 11.40n ± 0% 11.35n ± 0% -0.44% (p=0.000 n=20)
Erfinv 12.31n ± 0% 12.29n ± 0% -0.16% (p=0.000 n=20)
Erfcinv 12.16n ± 0% 12.17n ± 0% +0.08% (p=0.000 n=20)
Exp 28.41n ± 0% 26.44n ± 0% -6.95% (p=0.000 n=20)
ExpGo 28.68n ± 0% 27.07n ± 0% -5.60% (p=0.000 n=20)
Expm1 17.21n ± 0% 16.75n ± 0% -2.67% (p=0.000 n=20)
Exp2 24.71n ± 0% 23.01n ± 0% -6.88% (p=0.000 n=20)
Exp2Go 25.17n ± 0% 23.91n ± 0% -4.99% (p=0.000 n=20)
Abs 0.8004n ± 0% 0.8004n ± 0% ~ (p=0.224 n=20)
Dim 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=20) ¹
Floor 2.848n ± 0% 2.859n ± 0% +0.39% (p=0.000 n=20)
Max 3.074n ± 0% 3.071n ± 0% ~ (p=0.481 n=20)
Min 3.179n ± 0% 3.176n ± 0% -0.09% (p=0.003 n=20)
Mod 49.62n ± 0% 44.82n ± 0% -9.67% (p=0.000 n=20)
Frexp 7.604n ± 0% 6.803n ± 0% -10.53% (p=0.000 n=20)
Gamma 18.01n ± 0% 17.61n ± 0% -2.22% (p=0.000 n=20)
Hypot 7.204n ± 0% 7.604n ± 0% +5.55% (p=0.000 n=20)
HypotGo 7.204n ± 0% 7.604n ± 0% +5.56% (p=0.000 n=20)
Ilogb 6.003n ± 0% 6.003n ± 0% ~ (p=0.407 n=20)
J0 76.43n ± 0% 76.24n ± 0% -0.25% (p=0.000 n=20)
J1 76.44n ± 0% 76.44n ± 0% ~ (p=1.000 n=20)
Jn 168.2n ± 0% 168.5n ± 0% +0.18% (p=0.000 n=20)
Ldexp 8.804n ± 0% 7.604n ± 0% -13.63% (p=0.000 n=20)
Lgamma 19.01n ± 0% 19.01n ± 0% ~ (p=0.695 n=20)
Log 19.38n ± 0% 19.12n ± 0% -1.34% (p=0.000 n=20)
Logb 6.003n ± 0% 6.003n ± 0% ~ (p=1.000 n=20)
Log1p 18.57n ± 0% 16.72n ± 0% -9.96% (p=0.000 n=20)
Log10 20.67n ± 0% 20.45n ± 0% -1.06% (p=0.000 n=20)
Log2 9.605n ± 0% 8.804n ± 0% -8.34% (p=0.000 n=20)
Modf 4.402n ± 0% 4.402n ± 0% ~ (p=1.000 n=20)
Nextafter32 7.204n ± 0% 5.603n ± 0% -22.22% (p=0.000 n=20)
Nextafter64 6.803n ± 0% 6.003n ± 0% -11.76% (p=0.000 n=20)
PowInt 39.62n ± 0% 37.22n ± 0% -6.06% (p=0.000 n=20)
PowFrac 120.9n ± 0% 108.9n ± 0% -9.93% (p=0.000 n=20)
Pow10Pos 1.601n ± 0% 1.601n ± 0% ~ (p=0.487 n=20)
Pow10Neg 2.675n ± 0% 2.675n ± 0% ~ (p=1.000 n=20)
Round 3.018n ± 0% 2.401n ± 0% -20.46% (p=0.000 n=20)
RoundToEven 3.822n ± 0% 3.001n ± 0% -21.48% (p=0.000 n=20)
Remainder 45.62n ± 0% 42.42n ± 0% -7.01% (p=0.000 n=20)
Signbit 0.9075n ± 0% 0.8004n ± 0% -11.81% (p=0.000 n=20)
Sin 12.65n ± 0% 12.65n ± 0% ~ (p=0.503 n=20)
Sincos 14.81n ± 0% 14.60n ± 0% -1.42% (p=0.000 n=20)
Sinh 36.75n ± 0% 35.11n ± 0% -4.46% (p=0.000 n=20)
SqrtIndirect 1.201n ± 0% 1.201n ± 0% ~ (p=1.000 n=20) ¹
SqrtLatency 4.002n ± 0% 4.002n ± 0% ~ (p=1.000 n=20)
SqrtIndirectLatency 4.002n ± 0% 4.002n ± 0% ~ (p=1.000 n=20)
SqrtGoLatency 52.85n ± 0% 40.82n ± 0% -22.76% (p=0.000 n=20)
SqrtPrime 887.4n ± 0% 887.4n ± 0% ~ (p=0.751 n=20)
Tan 13.95n ± 0% 13.97n ± 0% +0.18% (p=0.000 n=20)
Tanh 36.79n ± 0% 34.89n ± 0% -5.16% (p=0.000 n=20)
Trunc 2.849n ± 0% 2.861n ± 0% +0.42% (p=0.000 n=20)
Y0 77.44n ± 0% 77.64n ± 0% +0.26% (p=0.000 n=20)
Y1 74.41n ± 0% 74.33n ± 0% -0.11% (p=0.000 n=20)
Yn 158.7n ± 0% 159.0n ± 0% +0.19% (p=0.000 n=20)
Float64bits 0.8774n ± 0% 0.4002n ± 0% -54.39% (p=0.000 n=20)
Float64frombits 0.8042n ± 0% 0.4002n ± 0% -50.24% (p=0.000 n=20)
Float32bits 1.1230n ± 0% 0.5336n ± 0% -52.48% (p=0.000 n=20)
Float32frombits 1.0670n ± 0% 0.8004n ± 0% -24.99% (p=0.000 n=20)
FMA 2.001n ± 0% 2.001n ± 0% ~ (p=0.605 n=20)
geomean 10.87n 10.10n -7.15%
¹ all samples are equal
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
│ bench.old │ bench.new │
│ sec/op │ sec/op vs base │
Acos 33.10n ± 0% 31.95n ± 2% -3.46% (p=0.000 n=20)
Acosh 58.38n ± 0% 50.44n ± 0% -13.60% (p=0.000 n=20)
Asin 32.70n ± 0% 31.94n ± 0% -2.32% (p=0.000 n=20)
Asinh 57.65n ± 0% 50.83n ± 0% -11.82% (p=0.000 n=20)
Atan 14.21n ± 0% 14.21n ± 0% ~ (p=0.501 n=20)
Atanh 60.86n ± 0% 54.44n ± 0% -10.56% (p=0.000 n=20)
Atan2 32.02n ± 0% 34.02n ± 0% +6.25% (p=0.000 n=20)
Cbrt 55.58n ± 0% 40.64n ± 0% -26.88% (p=0.000 n=20)
Ceil 9.566n ± 0% 9.566n ± 0% ~ (p=0.463 n=20)
Copysign 0.8005n ± 0% 0.8005n ± 0% ~ (p=0.806 n=20)
Cos 18.02n ± 0% 18.02n ± 0% ~ (p=0.191 n=20)
Cosh 64.44n ± 0% 65.64n ± 0% +1.86% (p=0.000 n=20)
Erf 16.15n ± 0% 16.16n ± 0% ~ (p=0.770 n=20)
Erfc 18.71n ± 0% 18.83n ± 0% +0.61% (p=0.000 n=20)
Erfinv 19.33n ± 0% 19.34n ± 0% ~ (p=0.513 n=20)
Erfcinv 18.90n ± 0% 19.78n ± 0% +4.63% (p=0.000 n=20)
Exp 50.04n ± 0% 49.66n ± 0% -0.75% (p=0.000 n=20)
ExpGo 50.03n ± 0% 50.03n ± 0% ~ (p=0.723 n=20)
Expm1 28.41n ± 0% 28.27n ± 0% -0.49% (p=0.000 n=20)
Exp2 50.08n ± 0% 51.23n ± 0% +2.31% (p=0.000 n=20)
Exp2Go 49.77n ± 0% 49.89n ± 0% +0.24% (p=0.000 n=20)
Abs 0.8009n ± 0% 0.8006n ± 0% ~ (p=0.317 n=20)
Dim 1.987n ± 0% 1.993n ± 0% +0.28% (p=0.001 n=20)
Floor 8.543n ± 0% 8.548n ± 0% ~ (p=0.509 n=20)
Max 6.670n ± 0% 6.672n ± 0% ~ (p=0.335 n=20)
Min 6.694n ± 0% 6.694n ± 0% ~ (p=0.459 n=20)
Mod 56.44n ± 0% 53.23n ± 0% -5.70% (p=0.000 n=20)
Frexp 8.409n ± 0% 7.606n ± 0% -9.55% (p=0.000 n=20)
Gamma 35.64n ± 0% 35.23n ± 0% -1.15% (p=0.000 n=20)
Hypot 11.21n ± 0% 10.61n ± 0% -5.31% (p=0.000 n=20)
HypotGo 11.50n ± 0% 11.01n ± 0% -4.30% (p=0.000 n=20)
Ilogb 7.606n ± 0% 6.804n ± 0% -10.54% (p=0.000 n=20)
J0 125.3n ± 0% 126.5n ± 0% +0.96% (p=0.000 n=20)
J1 124.9n ± 0% 125.3n ± 0% +0.32% (p=0.000 n=20)
Jn 264.3n ± 0% 265.9n ± 0% +0.61% (p=0.000 n=20)
Ldexp 9.606n ± 0% 9.204n ± 0% -4.19% (p=0.000 n=20)
Lgamma 38.82n ± 0% 38.85n ± 0% +0.06% (p=0.019 n=20)
Log 38.44n ± 0% 28.04n ± 0% -27.06% (p=0.000 n=20)
Logb 8.405n ± 0% 7.605n ± 0% -9.52% (p=0.000 n=20)
Log1p 31.62n ± 0% 27.11n ± 0% -14.26% (p=0.000 n=20)
Log10 38.83n ± 0% 28.42n ± 0% -26.81% (p=0.000 n=20)
Log2 11.21n ± 0% 10.41n ± 0% -7.14% (p=0.000 n=20)
Modf 5.204n ± 0% 5.205n ± 0% ~ (p=0.983 n=20)
Nextafter32 8.809n ± 0% 7.208n ± 0% -18.18% (p=0.000 n=20)
Nextafter64 8.405n ± 0% 8.406n ± 0% +0.01% (p=0.007 n=20)
PowInt 48.83n ± 0% 44.78n ± 0% -8.28% (p=0.000 n=20)
PowFrac 146.9n ± 0% 142.1n ± 0% -3.23% (p=0.000 n=20)
Pow10Pos 2.334n ± 0% 2.333n ± 0% ~ (p=0.110 n=20)
Pow10Neg 4.803n ± 0% 4.803n ± 0% ~ (p=0.130 n=20)
Round 4.816n ± 0% 3.819n ± 0% -20.70% (p=0.000 n=20)
RoundToEven 5.735n ± 0% 5.204n ± 0% -9.26% (p=0.000 n=20)
Remainder 52.05n ± 0% 49.64n ± 0% -4.63% (p=0.000 n=20)
Signbit 1.201n ± 0% 1.001n ± 0% -16.65% (p=0.000 n=20)
Sin 20.63n ± 0% 20.64n ± 0% +0.05% (p=0.040 n=20)
Sincos 23.82n ± 0% 24.62n ± 0% +3.36% (p=0.000 n=20)
Sinh 71.25n ± 0% 68.44n ± 0% -3.94% (p=0.000 n=20)
SqrtIndirect 2.001n ± 0% 2.001n ± 0% ~ (p=0.182 n=20)
SqrtLatency 4.003n ± 0% 4.003n ± 0% ~ (p=0.754 n=20)
SqrtIndirectLatency 4.003n ± 0% 4.003n ± 0% ~ (p=0.773 n=20)
SqrtGoLatency 60.84n ± 0% 81.26n ± 0% +33.56% (p=0.000 n=20)
SqrtPrime 1.791µ ± 0% 1.791µ ± 0% ~ (p=0.784 n=20)
Tan 27.22n ± 0% 27.22n ± 0% ~ (p=0.819 n=20)
Tanh 70.88n ± 0% 69.04n ± 0% -2.60% (p=0.000 n=20)
Trunc 8.543n ± 0% 8.543n ± 0% ~ (p=0.784 n=20)
Y0 122.9n ± 0% 122.9n ± 0% ~ (p=0.559 n=20)
Y1 123.3n ± 0% 121.7n ± 0% -1.30% (p=0.000 n=20)
Yn 263.0n ± 0% 262.6n ± 0% -0.15% (p=0.000 n=20)
Float64bits 1.2010n ± 0% 0.6004n ± 0% -50.01% (p=0.000 n=20)
Float64frombits 1.2010n ± 0% 0.6004n ± 0% -50.01% (p=0.000 n=20)
Float32bits 1.7010n ± 0% 0.8005n ± 0% -52.94% (p=0.000 n=20)
Float32frombits 1.5010n ± 0% 0.8005n ± 0% -46.67% (p=0.000 n=20)
FMA 2.001n ± 0% 2.001n ± 0% ~ (p=0.238 n=20)
geomean 17.41n 16.15n -7.19%
Change-Id: I0a0c263af2f07203eab1782e69c706f20c689d8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/604737
Auto-Submit: Tim King <taking@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Tim King <taking@google.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
2024-09-13 19:29:23 +00:00
Jorropo
9177e12ccc
cmd/compile: fix loong64 MINF → FMINF name and friends
...
CL 580283 left cmd/compile/internal/ssa/_gen/ in a state where `go run *.go` would always fail! :'(
Change-Id: I0b3aea9b3f6275cb17c552898c5034e15f0107d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/603995
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
2024-08-07 20:28:11 +00:00
Xiaolin Zhao
e705a2d16e
cmd/compile, math: make math.{Abs,Copysign} intrinsics on loong64
...
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
│ old.bench │ new.bench │
│ sec/op │ sec/op vs base │
Copysign 1.9710n ± 0% 0.8006n ± 0% -59.38% (p=0.000 n=10)
Abs 1.8745n ± 0% 0.8006n ± 0% -57.29% (p=0.000 n=10)
geomean 1.922n 0.8006n -58.35%
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A5000 @ 2500.00MHz
│ old.bench │ new.bench │
│ sec/op │ sec/op vs base │
Copysign 2.4020n ± 0% 0.9006n ± 0% -62.51% (p=0.000 n=10)
Abs 2.4020n ± 0% 0.8005n ± 0% -66.67% (p=0.000 n=10)
geomean 2.402n 0.8491n -64.65%
Updates #59120.
Change-Id: Ic409e1f4d15ad15cb3568a5aaa100046e9302842
Reviewed-on: https://go-review.googlesource.com/c/go/+/580280
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: David Chase <drchase@google.com>
2024-08-07 01:16:42 +00:00
Xiaolin Zhao
ff14e08cd3
cmd/compile, math: improve implementation of math.{Max,Min} on loong64
...
Make math.{Min,Max} intrinsics and implement math.{archMax,archMin}
in hardware.
goos: linux
goarch: loong64
pkg: math
cpu: Loongson-3A6000 @ 2500.00MHz
│ old.bench │ new.bench │
│ sec/op │ sec/op vs base │
Max 7.606n ± 0% 3.087n ± 0% -59.41% (p=0.000 n=20)
Min 7.205n ± 0% 2.904n ± 0% -59.69% (p=0.000 n=20)
MinFloat 37.220n ± 0% 4.802n ± 0% -87.10% (p=0.000 n=20)
MaxFloat 33.620n ± 0% 4.802n ± 0% -85.72% (p=0.000 n=20)
geomean 16.18n 3.792n -76.57%
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3A5000 @ 2500.00MHz
│ old.bench │ new.bench │
│ sec/op │ sec/op vs base │
Max 10.010n ± 0% 7.196n ± 0% -28.11% (p=0.000 n=20)
Min 8.806n ± 0% 7.155n ± 0% -18.75% (p=0.000 n=20)
MinFloat 60.010n ± 0% 7.976n ± 0% -86.71% (p=0.000 n=20)
MaxFloat 56.410n ± 0% 7.980n ± 0% -85.85% (p=0.000 n=20)
geomean 23.37n 7.566n -67.63%
Updates #59120.
Change-Id: I6815d20bc304af3cbf5d6ca8fe0ca1c2ddebea2d
Reviewed-on: https://go-review.googlesource.com/c/go/+/580283
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Qiqi Huang <huangqiqi@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2024-08-07 01:16:28 +00:00
Guoqi Chen
6b77d1b736
cmd/compile: update loong64 CALL* ops
...
Allow the loong64 CALL* ops to take a variable number of args.
Update #40724
Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn>
Change-Id: I4706d9651fcbf9a0f201af6820c97b1a924f14e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/521781
Auto-Submit: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: David Chase <drchase@google.com>
2023-11-21 19:04:19 +00:00
Guoqi Chen
ebca52eeb7
cmd/compile/internal: add register info for loong64 regABI
...
Update #40724
Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn>
Change-Id: Ifd7d94147b01e4fc83978b53dca2bcc0ad1ac4e3
Reviewed-on: https://go-review.googlesource.com/c/go/+/521779
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Auto-Submit: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
2023-11-21 19:04:14 +00:00
Guoqi Chen
070139a130
cmd/compile,cmd/internal,runtime: change registers on loong64 to avoid regABI arguments
...
Update #40724
Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn>
Change-Id: Ic7e2e7fb4c1d3670e6abbfb817aa6e4e654e08d3
Reviewed-on: https://go-review.googlesource.com/c/go/+/521777
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Than McIntosh <thanm@google.com>
Auto-Submit: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: David Chase <drchase@google.com>
2023-11-21 17:59:37 +00:00
Guoqi Chen
f43581131e
cmd/compile, cmd/internal, runtime: change the registers used by the duff device for loong64
...
Add R21 to the allocatable registers, use R20 and R21 in duff
device. This CL is in preparation for subsequent regABI support.
Updates #40724
Co-authored-by: Xiaolin Zhao <zhaoxiaolin@loongson.cn>
Change-Id: If1661adc0f766925fbe74827a369797f95fa28a9
Reviewed-on: https://go-review.googlesource.com/c/go/+/521775
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: Than McIntosh <thanm@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-11-21 17:42:40 +00:00
Guoqi Chen
3754ca0af2
cmd/compile: improve the implementation of Lowered{Move,Zero} on linux/loong64
...
As in CL 487295: the Lowered{Move,Zero} implementation first subtracts 8
from Rarg0 (the Ptr parameter) and then adds an offset of 8 in every subsequent
access through Rarg0. This round trip is meaningless, so delete it.
Change LoweredMove's Rarg0 register to R20, consistent with duffcopy.
goos: linux
goarch: loong64
pkg: runtime
cpu: Loongson-3C5000 @ 2200.00MHz
│ old.bench │ new.bench │
│ sec/op │ sec/op vs base │
Memmove/15 19.10n ± 0% 19.10n ± 0% ~ (p=0.483 n=15)
MemmoveUnalignedDst/15 25.02n ± 0% 25.02n ± 0% ~ (p=0.741 n=15)
MemmoveUnalignedDst/32 48.22n ± 0% 48.22n ± 0% ~ (p=1.000 n=15) ¹
MemmoveUnalignedDst/64 90.57n ± 0% 90.52n ± 0% ~ (p=0.212 n=15)
MemmoveUnalignedDstOverlap/32 44.12n ± 0% 44.13n ± 0% +0.02% (p=0.000 n=15)
MemmoveUnalignedDstOverlap/64 87.79n ± 0% 87.80n ± 0% +0.01% (p=0.002 n=15)
MemmoveUnalignedSrc/0 3.639n ± 0% 3.639n ± 0% ~ (p=1.000 n=15) ¹
MemmoveUnalignedSrc/1 7.733n ± 0% 7.733n ± 0% ~ (p=1.000 n=15)
MemmoveUnalignedSrc/2 9.097n ± 0% 9.097n ± 0% ~ (p=1.000 n=15)
MemmoveUnalignedSrc/3 10.46n ± 0% 10.46n ± 0% ~ (p=1.000 n=15) ¹
MemmoveUnalignedSrc/4 11.83n ± 0% 11.83n ± 0% ~ (p=1.000 n=15) ¹
MemmoveUnalignedSrc/64 93.71n ± 0% 93.70n ± 0% ~ (p=0.128 n=15)
Memclr/4096 699.1n ± 0% 699.1n ± 0% ~ (p=0.682 n=15)
Memclr/65536 11.18µ ± 0% 11.18µ ± 0% -0.01% (p=0.000 n=15)
Memclr/1M 175.2µ ± 0% 175.2µ ± 0% ~ (p=0.191 n=15)
Memclr/4M 661.8µ ± 0% 662.0µ ± 0% ~ (p=0.486 n=15)
MemclrUnaligned/4_5 19.39n ± 0% 20.47n ± 0% +5.57% (p=0.000 n=15)
MemclrUnaligned/4_16 22.29n ± 0% 21.38n ± 0% -4.08% (p=0.000 n=15)
MemclrUnaligned/4_64 30.58n ± 0% 29.81n ± 0% -2.52% (p=0.000 n=15)
MemclrUnaligned/4_65536 11.19µ ± 0% 11.20µ ± 0% +0.02% (p=0.000 n=15)
GoMemclr/5 12.73n ± 0% 12.73n ± 0% ~ (p=0.261 n=15)
GoMemclr/16 10.01n ± 0% 10.00n ± 0% ~ (p=0.264 n=15)
GoMemclr/256 50.94n ± 0% 50.94n ± 0% ~ (p=0.372 n=15)
ClearFat15 14.95n ± 0% 15.01n ± 4% ~ (p=0.925 n=15)
ClearFat1032 125.5n ± 0% 125.6n ± 0% +0.08% (p=0.000 n=15)
CopyFat64 10.58n ± 0% 10.01n ± 0% -5.39% (p=0.000 n=15)
CopyFat1040 244.3n ± 0% 155.6n ± 0% -36.31% (p=0.000 n=15)
Issue18740/2byte 29.82µ ± 0% 29.82µ ± 0% ~ (p=0.648 n=30)
Issue18740/4byte 18.18µ ± 0% 18.18µ ± 0% -0.02% (p=0.001 n=30)
Issue18740/8byte 8.395µ ± 0% 8.395µ ± 0% ~ (p=0.401 n=30)
geomean 154.5n 151.8n -1.70%
¹ all samples are equal
Change-Id: Ia3f3c8b25e1e93c97ab72328651de78ca9dec016
Reviewed-on: https://go-review.googlesource.com/c/go/+/488515
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Bryan Mills <bcmills@google.com>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
Reviewed-by: WANG Xuerui <git@xen0n.name>
Reviewed-by: xiaodong liu <teaofmoli@gmail.com>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-20 00:01:44 +00:00
Guoqi Chen
06f420fc19
runtime: remove the meaningless offset of 8 for duffzero on loong64
...
Currently we subtract 8 from offset when calling duffzero because 8
is added to offset in the duffzero implementation. This operation is
meaningless, so remove it.
Change-Id: I7e451d04d7e98ccafe711645d81d3aadf376766f
Reviewed-on: https://go-review.googlesource.com/c/go/+/487295
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Meidan Li <limeidan@loongson.cn>
Reviewed-by: WANG Xuerui <git@xen0n.name>
Run-TryBot: WANG Xuerui <git@xen0n.name>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: xiaodong liu <teaofmoli@gmail.com>
Reviewed-by: Carlos Amedee <carlos@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Auto-Submit: Ian Lance Taylor <iant@golang.org>
2023-09-01 15:48:45 +00:00
Wayne Zuo
96428e160d
cmd/compile: split DIVV/DIVVU op on loong64
...
Previously, we needed to calculate both the quotient and the remainder
together. However, in most cases only one result is needed. By separating
these instructions, we can save one instruction in most cases.
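At the source level, the common case looks like the sketch below: each function needs only one of the two results. The function names are hypothetical, and the exact instructions emitted for them are not shown here.

```go
package main

import "fmt"

// quotientOnly needs just the quotient, so the remainder never has to be computed.
func quotientOnly(a, b int64) int64 {
	return a / b
}

// remainderOnly needs just the remainder.
func remainderOnly(a, b int64) int64 {
	return a % b
}

func main() {
	fmt.Println(quotientOnly(17, 5), remainderOnly(17, 5)) // 3 2
}
```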
Change-Id: I0a2d4167cda68ab606783ba1aa2720ede19d6b53
Reviewed-on: https://go-review.googlesource.com/c/go/+/475315
Reviewed-by: Than McIntosh <thanm@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: abner chenc <chenguoqi@loongson.cn>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2023-04-11 01:59:02 +00:00
Wayne Zuo
14015be5bb
cmd/compile: optimize multiplication on loong64
...
Previously, multiplication on the loong64 architecture was performed using
the MULV and MULHVU instructions to calculate the low 64 bits and the high
64 bits of the product, respectively. However, in most cases only
the low 64 bits are needed. This commit enables computing only the low
64-bit result with the MULV instruction.
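The distinction is visible in Go source. An ordinary `x * y` keeps only the low 64 bits, while `bits.Mul64` asks for both halves. This is a sketch with hypothetical helper names, not code from the CL:

```go
package main

import (
	"fmt"
	"math/bits"
)

// lowOnly is an ordinary Go multiplication: only the low 64 bits are kept.
func lowOnly(x, y uint64) uint64 {
	return x * y
}

// full returns the complete 128-bit product as (hi, lo).
func full(x, y uint64) (hi, lo uint64) {
	return bits.Mul64(x, y)
}

func main() {
	x, y := uint64(1)<<63, uint64(4) // product is 2^65
	hi, lo := full(x, y)
	fmt.Println(lowOnly(x, y), hi, lo) // 0 2 0
}
```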
Reduce the binary size slightly.
file before after Δ %
addr2line 2833777 2833849 +72 +0.003%
asm 5267499 5266963 -536 -0.010%
buildid 2579706 2579402 -304 -0.012%
cgo 4798260 4797444 -816 -0.017%
compile 25247419 25175030 -72389 -0.287%
cover 4973091 4972027 -1064 -0.021%
dist 3631013 3565653 -65360 -1.800%
doc 4076036 4074004 -2032 -0.050%
fix 3496378 3496066 -312 -0.009%
link 6984102 6983214 -888 -0.013%
nm 2743820 2743516 -304 -0.011%
objdump 4277171 4277035 -136 -0.003%
pack 2379248 2378872 -376 -0.016%
pprof 14419090 14419874 +784 +0.005%
test2json 2684386 2684018 -368 -0.014%
trace 13640018 13631034 -8984 -0.066%
vet 7748918 7752630 +3712 +0.048%
go 15643850 15638098 -5752 -0.037%
total 127423782 127268729 -155053 -0.122%
Change-Id: Ifce4a9a3ed1d03c170681e39cb6f3541db9882dc
Reviewed-on: https://go-review.googlesource.com/c/go/+/472775
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
2023-03-03 01:33:00 +00:00
Keith Randall
21d82e6ac8
cmd/compile: batch write barrier calls
...
Have the write barrier call return a pointer to a buffer into which
the generated code records pointers that need write barrier treatment.
Change-Id: I7871764298e0aa1513de417010c8d46b296b199e
Reviewed-on: https://go-review.googlesource.com/c/go/+/447781
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Bypass: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2023-02-24 00:21:13 +00:00
Keith Randall
45dc81d856
cmd/compile: add memory argument to GetCallerSP
...
We need to make sure that when we get the stack pointer, we get it
at the right time.
V = GetCallerSP
Call()
W = GetCallerSP
If Call causes a stack growth, then we will be in a situation
where V != W. So it matters when GetCallerSP operations get scheduled.
Add a memory argument to GetCallerSP so it can't be reordered with
things like calls.
Change-Id: I6cc801134c38e358c5a1ec0c09d38379a16a4184
Reviewed-on: https://go-review.googlesource.com/c/go/+/453515
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <martin@golang.org>
Reviewed-by: Robert Griesemer <gri@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-01-19 22:43:22 +00:00
Wayne Zuo
ffc4496306
cmd/compile: remove output registers limit for MUL/DIV on loong64
...
This limitation exists on MIPS platform, but not on loong64.
Change-Id: I14bb3ec6895a8f7850873c171e1756843ffea72e
Reviewed-on: https://go-review.googlesource.com/c/go/+/449395
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
2022-11-11 01:35:11 +00:00
Wayne Zuo
268f4629df
cmd/compile: enable branchelim pass on loong64
...
Change-Id: I4fd1c307901c265ab9865bf8a74460ddc15e5d14
Reviewed-on: https://go-review.googlesource.com/c/go/+/416735
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: xiaodong liu <teaofmoli@gmail.com>
Auto-Submit: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
2022-11-09 06:10:55 +00:00
Guoqi Chen
8a9e2d9d49
cmd/compile: add missing tail calls flag for linux/loong64
...
Set the value of the variable tailCall to true and prevent
allocating or clobbering the link register.
Change-Id: I4ec19c67056cb99196911aa7c0054be89ab7eb8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/414954
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: WANG Xuerui <git@xen0n.name>
2022-10-29 03:14:03 +00:00
Russ Cox
164406ad93
cmd/compile: rename gen and builtin to _gen and _builtin
...
These two directories are full of //go:build ignore files.
We can ignore them more easily by putting an underscore
at the start of the name. That also works around a bug
in Go 1.17 that was not fixed until Go 1.17.3.
Change-Id: Ia5389b65c79b1e6d08e4fef374d335d776d44ead
Reviewed-on: https://go-review.googlesource.com/c/go/+/435472
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-10-04 19:35:46 +00:00