Stowage/go - Remotebranch.eu

mirror of https://github.com/golang/go.git synced 2025-12-08 06:10:04 +00:00

Author	SHA1	Message	Date
Meng Zhuo	09ed9a6585	cmd/compile: implement float min/max in hardware for riscv64 CL 514596 adds float min/max for amd64, this CL adds it for riscv64. The behavior of the RISC-V FMIN/FMAX instructions almost matches Go's requirements. However, according to RISCV spec 8.3 "NaN Generation and Propagation" >> if at least one input is a signaling NaN, or if both inputs are quiet >> NaNs, the result is the canonical NaN. If one operand is a quiet NaN >> and the other is not a NaN, the result is the non-NaN operand. Go uses quiet NaN as NaN, and according to the Go spec >> if any argument is a NaN, the result is a NaN This requires the float min/max implementation to check whether either operand is a qNaN before the float min/max instruction actually executes. This CL also fixes a typo in the minmax test. Benchmark on Visionfive2 goos: linux goarch: riscv64 pkg: runtime │ float_minmax.old.bench │ float_minmax.new.bench │ │ sec/op │ sec/op vs base │ MinFloat 158.20n ± 0% 28.13n ± 0% -82.22% (p=0.000 n=10) MaxFloat 158.10n ± 0% 28.12n ± 0% -82.21% (p=0.000 n=10) geomean 158.1n 28.12n -82.22% Update #59488 Change-Id: Iab48be6d32b8882044fb8c821438ca8840e5493d Reviewed-on: https://go-review.googlesource.com/c/go/+/514775 Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com> Run-TryBot: M Zhuo <mengzhuo1203@gmail.com> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com>	2024-01-26 01:41:50 +00:00
Joel Sing	70c7fb75e9	cmd/compile: correct code generation for right shifts on riscv64 The code generation on riscv64 will currently result in incorrect assembly when a 32 bit integer is right shifted by an amount that exceeds the size of the type. In particular, this occurs when an int32 or uint32 is cast to a 64 bit type and right shifted by a value larger than 31. Fix this by moving the SRAW/SRLW conversion into the right shift rules and removing the SignExt32to64/ZeroExt32to64. Add additional rules that rewrite to SRAIW/SRLIW when the shift is less than the size of the type, or replace/eliminate the shift when it exceeds the size of the type. Add SSA tests that would have caught this issue. Also add additional codegen tests to ensure that the resulting assembly is what we expect in these overflow cases. Fixes #64285 Change-Id: Ie97b05668597cfcb91413afefaab18ee1aa145ec Reviewed-on: https://go-review.googlesource.com/c/go/+/545035 Reviewed-by: Russ Cox <rsc@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: M Zhuo <mzh@golangcn.org> Reviewed-by: Mark Ryan <markdryan@rivosinc.com> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-12-01 19:30:59 +00:00
Ubuntu	8fc043ccfa	cmd/compile: optimize right shifts of int32 on riscv64 The compiler is currently sign extending 32 bit signed integers to 64 bits before right shifting them using a 64 bit shift instruction. There's no need to do this as RISC-V has instructions for right shifting 32 bit signed values (sraw and sraiw) which sign extend the result of the shift to 64 bits. Change the compiler so that it uses sraw and sraiw for shifts of signed 32 bit integers reducing in most cases the number of instructions needed to perform the shift. Here are some examples of code sequences that are changed by this patch: int32(a) >> 2 before: sll x5,x10,0x20 sra x10,x5,0x22 after: sraw x10,x10,0x2 int32(v) >> int(s) before: sext.w x5,x10 sltiu x6,x11,64 add x6,x6,-1 or x6,x11,x6 sra x10,x5,x6 after: sltiu x5,x11,32 add x5,x5,-1 or x5,x11,x5 sraw x10,x10,x5 int32(v) >> (int(s) & 31) before: sext.w x5,x10 and x6,x11,63 sra x10,x5,x6 after: and x5,x11,31 sraw x10,x10,x5 int32(100) >> int(a) before: bltz x10,<target address calls runtime.panicshift> sltiu x5,x10,64 add x5,x5,-1 or x5,x10,x5 li x6,100 sra x10,x6,x5 after: bltz x10,<target address calls runtime.panicshift> sltiu x5,x10,32 add x5,x5,-1 or x5,x10,x5 li x6,100 sraw x10,x6,x5 int32(v) >> (int(s) & 63) before: sext.w x5,x10 and x6,x11,63 sra x10,x5,x6 after: and x5,x11,63 sltiu x6,x5,32 add x6,x6,-1 or x5,x5,x6 sraw x10,x10,x5 In most cases we eliminate one instruction. In the case where we shift a int32 constant by a variable the number of instructions generated is identical. A sra is simply replaced by a sraw. In the unusual case where we shift right by a variable anded with a constant > 31 but < 64, we generate two additional instructions. As this is an unusual case we do not try to optimize for it. Some improvements can be seen in some of the existing benchmarks, notably in the utf8 package which performs right shifts of runes which are signed 32 bit integers. 
| utf8-old | utf8-new | | sec/op | sec/op vs base | EncodeASCIIRune-4 17.68n ± 0% 17.67n ± 0% ~ (p=0.312 n=10) EncodeJapaneseRune-4 35.34n ± 0% 34.53n ± 1% -2.31% (p=0.000 n=10) AppendASCIIRune-4 3.213n ± 0% 3.213n ± 0% ~ (p=0.318 n=10) AppendJapaneseRune-4 36.14n ± 0% 35.35n ± 0% -2.19% (p=0.000 n=10) DecodeASCIIRune-4 28.11n ± 0% 27.36n ± 0% -2.69% (p=0.000 n=10) DecodeJapaneseRune-4 38.55n ± 0% 38.58n ± 0% ~ (p=0.612 n=10) Change-Id: I60a91cbede9ce65597571c7b7dd9943eeb8d3cc2 Reviewed-on: https://go-review.googlesource.com/c/go/+/535115 Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: M Zhuo <mzh@golangcn.org> Reviewed-by: David Chase <drchase@google.com>	2023-10-30 14:47:06 +00:00
Mark Ryan	fce6be15cc	cmd/compile: regenerate rewriteRISCV64.go to match cl 528975 The final revision of https://go-review.googlesource.com/c/go/+/528975 made a small change to the RISCV64.rules file but neglected to update the regenerated rewriteRISCV64.go file. Change-Id: I04599f4e3b0dac7102c54166c9bae6fc9b6621d1 Reviewed-on: https://go-review.googlesource.com/c/go/+/533815 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>	2023-10-09 22:19:13 +00:00
Joel Sing	f711892a8a	cmd/compile/internal: stop lowering OpConvert on riscv64 Lowering for OpConvert was removed for all architectures in CL#108496, prior to the riscv64 port being upstreamed. Remove lowering of OpConvert on riscv64, which brings it in line with all other architectures. This results in 1,600+ instructions being removed from the riscv64 go binary. Change-Id: Iaaf1f8b397875926604048b66ad8ac91a98c871e Reviewed-on: https://go-review.googlesource.com/c/go/+/533335 Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Pratt <mpratt@google.com>	2023-10-07 12:31:59 +00:00
Mark Ryan	561bf0457f	cmd/compile: optimize right shifts of uint32 on riscv The compiler is currently zero extending 32 bit unsigned integers to 64 bits before right shifting them using a 64 bit shift instruction. There's no need to do this as RISC-V has instructions for right shifting 32 bit unsigned values (srlw and srliw) which zero extend the result of the shift to 64 bits. Change the compiler so that it uses srlw and srliw for 32 bit unsigned shifts reducing in most cases the number of instructions needed to perform the shift. Here are some examples of code sequences that are changed by this patch: uint32(a) >> 2 before: sll x5,x10,0x20 srl x10,x5,0x22 after: srlw x10,x10,0x2 uint32(a) >> int(b) before: sll x5,x10,0x20 srl x5,x5,0x20 srl x5,x5,x11 sltiu x6,x11,64 neg x6,x6 and x10,x5,x6 after: srlw x5,x10,x11 sltiu x6,x11,32 neg x6,x6 and x10,x5,x6 bits.RotateLeft32(uint32(a), 1) before: sll x5,x10,0x1 sll x6,x10,0x20 srl x7,x6,0x3f or x5,x5,x7 after: sll x5,x10,0x1 srlw x6,x10,0x1f or x10,x5,x6 bits.RotateLeft32(uint32(a), int(b)) before: and x6,x11,31 sll x7,x10,x6 sll x8,x10,0x20 srl x8,x8,0x20 add x6,x6,-32 neg x6,x6 srl x9,x8,x6 sltiu x6,x6,64 neg x6,x6 and x6,x9,x6 or x6,x6,x7 after: and x5,x11,31 sll x6,x10,x5 add x5,x5,-32 neg x5,x5 srlw x7,x10,x5 sltiu x5,x5,32 neg x5,x5 and x5,x7,x5 or x10,x6,x5 The one regression observed is the following case, an unbounded right shift of a uint32 where the value we're shifting by is known to be < 64 but > 31. As this is an unusual case this commit does not optimize for it, although the existing code does. uint32(a) >> (b & 63) before: sll x5,x10,0x20 srl x5,x5,0x20 and x6,x11,63 srl x10,x5,x6 after and x5,x11,63 srlw x6,x10,x5 sltiu x5,x5,32 neg x5,x5 and x10,x6,x5 Here we have one extra instruction. Some benchmark highlights, generated on a VisionFive2 8GB running Ubuntu 23.04. 
pkg: math/bits LeadingZeros32-4 18.64n ± 0% 17.32n ± 0% -7.11% (p=0.000 n=10) LeadingZeros64-4 15.47n ± 0% 15.51n ± 0% +0.26% (p=0.027 n=10) TrailingZeros16-4 18.48n ± 0% 17.68n ± 0% -4.33% (p=0.000 n=10) TrailingZeros32-4 16.87n ± 0% 16.07n ± 0% -4.74% (p=0.000 n=10) TrailingZeros64-4 15.26n ± 0% 15.27n ± 0% +0.07% (p=0.043 n=10) OnesCount32-4 20.08n ± 0% 19.29n ± 0% -3.96% (p=0.000 n=10) RotateLeft-4 8.864n ± 0% 8.838n ± 0% -0.30% (p=0.006 n=10) RotateLeft32-4 8.837n ± 0% 8.032n ± 0% -9.11% (p=0.000 n=10) Reverse32-4 29.77n ± 0% 26.52n ± 0% -10.93% (p=0.000 n=10) ReverseBytes32-4 9.640n ± 0% 8.838n ± 0% -8.32% (p=0.000 n=10) Sub32-4 8.835n ± 0% 8.035n ± 0% -9.06% (p=0.000 n=10) geomean 11.50n 11.33n -1.45% pkg: crypto/md5 Hash8Bytes-4 1.486µ ± 0% 1.426µ ± 0% -4.04% (p=0.000 n=10) Hash64-4 2.079µ ± 0% 1.968µ ± 0% -5.36% (p=0.000 n=10) Hash128-4 2.720µ ± 0% 2.557µ ± 0% -5.99% (p=0.000 n=10) Hash256-4 3.996µ ± 0% 3.733µ ± 0% -6.58% (p=0.000 n=10) Hash512-4 6.541µ ± 0% 6.072µ ± 0% -7.18% (p=0.000 n=10) Hash1K-4 11.64µ ± 0% 10.75µ ± 0% -7.58% (p=0.000 n=10) Hash8K-4 82.95µ ± 0% 76.32µ ± 0% -7.99% (p=0.000 n=10) Hash1M-4 10.436m ± 0% 9.591m ± 0% -8.10% (p=0.000 n=10) Hash8M-4 83.50m ± 0% 76.73m ± 0% -8.10% (p=0.000 n=10) Hash8BytesUnaligned-4 1.494µ ± 0% 1.434µ ± 0% -4.02% (p=0.000 n=10) Hash1KUnaligned-4 11.64µ ± 0% 10.76µ ± 0% -7.52% (p=0.000 n=10) Hash8KUnaligned-4 83.01µ ± 0% 76.32µ ± 0% -8.07% (p=0.000 n=10) geomean 28.32µ 26.42µ -6.72% Change-Id: I20483a6668cca1b53fe83944bee3706aadcf8693 Reviewed-on: https://go-review.googlesource.com/c/go/+/528975 Reviewed-by: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-10-07 12:31:38 +00:00
Xianmiao Qu	d98f74b31e	cmd/compile/internal: intrinsify publicationBarrier on riscv64 This enables publicationBarrier to be used as an intrinsic on riscv64, optimizing the required function call and return instructions for invoking the "runtime.publicationBarrier" function. This function is called by mallocgc. The benchmark results for malloc tested on Lichee-Pi-4A(TH1520, RISC-V 2.0G C910 x4) are as follows. goos: linux goarch: riscv64 pkg: runtime │ old.txt │ new.txt │ │ sec/op │ sec/op vs base │ Malloc8-4 92.78n ± 1% 90.77n ± 1% -2.17% (p=0.001 n=10) Malloc16-4 156.5n ± 1% 151.7n ± 2% -3.10% (p=0.000 n=10) MallocTypeInfo8-4 131.7n ± 1% 130.6n ± 2% ~ (p=0.165 n=10) MallocTypeInfo16-4 186.5n ± 2% 186.2n ± 1% ~ (p=0.956 n=10) MallocLargeStruct-4 1.345µ ± 1% 1.355µ ± 1% ~ (p=0.093 n=10) geomean 216.9n 214.5n -1.10% Change-Id: Ieab6c02309614bac5c1b12b5ee3311f988ff644d Reviewed-on: https://go-review.googlesource.com/c/go/+/531719 Reviewed-by: Michael Pratt <mpratt@google.com> Auto-Submit: Michael Pratt <mpratt@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: M Zhuo <mzh@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au>	2023-10-03 19:29:38 +00:00
Meng Zhuo	63ab68ddc5	cmd/compile: add single-precision FMA code generation for riscv64 This CL adds FMADDS,FMSUBS,FNMADDS,FNMSUBS SSA support for riscv Change-Id: I1e7dd322b46b9e0f4923dbba256303d69ed12066 Reviewed-on: https://go-review.googlesource.com/c/go/+/506616 Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: David Chase <drchase@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: M Zhuo <mzh@golangcn.org>	2023-08-22 12:05:36 +00:00
Meng Zhuo	05f9511582	cmd/compile: improve FP FMA performance on riscv64 FMADD/FMSUB/FNSUB are efficient FP FMA instructions, which the compiler can use to improve FP performance. Erf 188.0n ± 2% 139.5n ± 2% -25.82% (p=0.000 n=10) Erfc 193.6n ± 1% 143.2n ± 1% -26.01% (p=0.000 n=10) Erfinv 244.4n ± 2% 172.6n ± 0% -29.40% (p=0.000 n=10) Erfcinv 244.7n ± 2% 173.0n ± 1% -29.31% (p=0.000 n=10) geomean 216.0n 156.3n -27.65% Ref: The RISC-V Instruction Set Manual Volume I: Unprivileged ISA 11.6 Single-Precision Floating-Point Computational Instructions Change-Id: I89aa3a4df7576fdd47f4a6ee608ac16feafd093c Reviewed-on: https://go-review.googlesource.com/c/go/+/506036 Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: M Zhuo <mzh@golangcn.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-08-22 08:38:08 +00:00
Joel Sing	33da4ce457	cmd/compile: sign or zero extend for 32 bit equality on riscv64 For 32 bit equality (Eq32), rather than always zero extending to 64 bits, sign extend for signed types and zero extend for unsigned types. This makes no difference to the equality test (via SUB), however it increases the likelihood of avoiding unnecessary sign or zero extension simply for the purpose of equality testing. While here, replace the Neq* rules with (Not (Eq*)) - this makes no difference to the generated code (as the intermediates get expanded and eliminated), however it means that changes to the equality rules also reflect in the inequality rules. As an example, the following: lw t0,956(t0) slli t0,t0,0x20 srli t0,t0,0x20 li t1,1 bne t1,t0,278fc Becomes: lw t0,1024(t0) li t1,1 bne t1,t0,278b0 Removes almost 1000 instructions from the Go binary on riscv64. Change-Id: Iac60635f494f6db87faa47752bd1cc16e6b5967f Reviewed-on: https://go-review.googlesource.com/c/go/+/516595 Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: M Zhuo <mzh@golangcn.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>	2023-08-15 03:29:11 +00:00
Meng Zhuo	3fce111535	cmd/compile: fix FMA negative commutativity of riscv64 According to RISCV manual 11.6: FMADD x,y,z computes x*y+z and FNMADD x,y,z => -x*y-z FMSUB x,y,z => x*y-z FNMSUB x,y,z => -x*y+z respectively. However, our SSA implementation converts FMADD -x,y,z to FNMADD x,y,z, which is wrong; according to the manual it should be converted to FNMSUB. Change-Id: Ib297bc83824e121fd7dda171ed56ea9694a4e575 Reviewed-on: https://go-review.googlesource.com/c/go/+/506575 Run-TryBot: M Zhuo <mzh@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Joedian Reid <joedian@golang.org> Reviewed-by: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Gopher Robot <gobot@golang.org>	2023-07-05 22:05:44 +00:00
Keith Randall	a3f3868c7a	cmd/compile: replace isSigned(t) with t.IsSigned() No change in semantics, just removing an unneeded helper. Also align rules a bit. Change-Id: Ie4dabb99392315a7700c645b3d0931eb8766a5fa Reviewed-on: https://go-review.googlesource.com/c/go/+/483439 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com>	2023-04-10 17:07:24 +00:00
Keith Randall	60140a86b3	cmd/compile: clean up store rules to use store type, not argument type Argument type is dangerous because it may be thinner than the actual store being issued. Change-Id: Id19fbd8e6c41390a453994f897dd5048473136aa Reviewed-on: https://go-review.googlesource.com/c/go/+/483438 Run-TryBot: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: David Chase <drchase@google.com>	2023-04-10 17:06:55 +00:00
Keith Randall	21f434058c	cmd/compile: ensure constant folding of pointer arithmetic remains a pointer For c + nil, we want the result to still be of pointer type. Fixes ppc64le build failure with CL 468455, in issue33724.go. The problem in that test is that it requires a nil check to be scheduled before the corresponding load. This normally happens fine because we prioritize nil checks. If we have nilcheck(p) and load(p), once p is scheduled the nil check will always go before the load. The issue we saw in 33724 is that when p is a nil pointer, we ended up with two different p's, an int64(0) as the argument to the nil check and an (Outer)(0) as the argument to the load. Those two zeroes don't get CSEd, so if the (Outer)(0) happens to get scheduled first, the load can end up before the nilcheck. Fix this by always having constant arithmetic preserve the pointerness of the value, so that both zeroes are of type *Outer and get CSEd. Update #58482 Update #33724 Change-Id: Ib9b8c0446f1690b574e0f3c0afb9934efbaf3513 Reviewed-on: https://go-review.googlesource.com/c/go/+/468615 Reviewed-by: Keith Randall <khr@google.com> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: David Chase <drchase@google.com> TryBot-Bypass: Keith Randall <khr@golang.org>	2023-02-17 03:56:57 +00:00
Keith Randall	f959fb3872	cmd/compile: add anchored version of SP The SPanchored opcode is identical to SP, except that it takes a memory argument so that it (and more importantly, anything that uses it) must be scheduled at or after that memory argument. This opcode ensures that a LEAQ of a variable gets scheduled after the corresponding VARDEF for that variable. This may lead to less CSE of LEAQ operations. The effect is very small. The go binary is only 80 bytes bigger after this CL. Usually LEAQs get folded into load/store operations, so the effect is only for pointerful types, large enough to need a duffzero, and have their address passed somewhere. Even then, usually the CSEd LEAQs will be un-CSEd because the two uses are on different sides of a function call and the LEAQ ends up being rematerialized at the second use anyway. Change-Id: Ib893562cd05369b91dd563b48fb83f5250950293 Reviewed-on: https://go-review.googlesource.com/c/go/+/452916 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Martin Möhrmann <moehrmann@google.com> Reviewed-by: Martin Möhrmann <martin@golang.org> Reviewed-by: Keith Randall <khr@google.com>	2023-01-19 22:43:12 +00:00
Dmitri Shuralyov	47a0d46716	cmd/compile/internal/ssa: generate code via a //go:generate directive The standard way to generate code in a Go package is via //go:generate directives, which are invoked by the developer explicitly running: go generate import/path/of/said/package Switch to using that approach here. This way, developers don't need to learn and remember a custom way that each particular Go package may choose to implement its code generation. It also enables conveniences such as 'go generate -n' to discover how code is generated without running anything (this works on all packages that rely on //go:generate directives), being able to generate multiple packages at once and from any directory, and so on. Change-Id: I0e5b6a1edeff670a8e588befeef0c445613803c7 Reviewed-on: https://go-review.googlesource.com/c/go/+/460135 Reviewed-by: Dmitri Shuralyov <dmitshur@google.com> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org> Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>	2023-01-19 22:42:34 +00:00
Guoqi Chen	0b2ad1d815	cmd/compile: sign-extend the 2nd argument of the LoweredAtomicCas32 on loong64,mips64x,riscv64 The function LoweredAtomicCas32 is implemented using the LL-SC instruction pair on loong64, mips64x, riscv64. However, the LL instruction on loong64, mips64x, riscv64 is sign-extended, so it is necessary to sign-extend the 2nd parameter "old" of the LoweredAtomicCas32, so that the instruction BNE after LL can get the desired result. The function prototype of LoweredAtomicCas32 in golang: func Cas32(ptr *uint32, old, new uint32) bool When using an intrinsified implementation: case 1: (*ptr) <= 0x80000000 && old < 0x80000000 E.g: (*ptr) = 0x7FFFFFFF, old = Rarg1= 0x7FFFFFFF After running the instruction "LL (Rarg0), Rtmp": Rtmp = 0x7FFFFFFF Rtmp != Rarg1(old) is false, the result we expect case 2: (*ptr) >= 0x80000000 && old >= 0x80000000 E.g: (*ptr) = 0x80000000, old = Rarg1= 0x80000000 After running the instruction "LL (Rarg0), Rtmp": Rtmp = 0xFFFFFFFF_80000000 Rtmp != Rarg1(old) is true, which we do not expect When using a non-intrinsified implementation: Because Rarg1 is loaded from the stack using sign-extended instructions ld.w, the situation described in Case 2 above does not occur Benchmarks on linux/loong64: name old time/op new time/op delta Cas 50.0ns ± 0% 50.1ns ± 0% ~ (p=1.000 n=1+1) Cas64 50.0ns ± 0% 50.1ns ± 0% ~ (p=1.000 n=1+1) Cas-4 56.0ns ± 0% 56.0ns ± 0% ~ (p=1.000 n=1+1) Cas64-4 56.0ns ± 0% 56.0ns ± 0% ~ (p=1.000 n=1+1) Benchmarks on Loongson 3A4000 (GOARCH=mips64le, 1.8GHz) name old time/op new time/op delta Cas 70.4ns ± 0% 70.3ns ± 0% ~ (p=1.000 n=1+1) Cas64 70.7ns ± 0% 70.6ns ± 0% ~ (p=1.000 n=1+1) Cas-4 81.1ns ± 0% 80.8ns ± 0% ~ (p=1.000 n=1+1) Cas64-4 80.9ns ± 0% 80.9ns ± 0% ~ (p=1.000 n=1+1) Fixes #57282 Change-Id: I190a7fc648023b15fa392f7fdda5ac18c1561bac Reviewed-on: https://go-review.googlesource.com/c/go/+/457135 Run-TryBot: Than McIntosh <thanm@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Matthew Dempsky <mdempsky@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Than McIntosh <thanm@google.com> Reviewed-by: David Chase <drchase@google.com>	2022-12-17 01:12:22 +00:00
Johan Brandhorst-Satzkorn	85196fc982	cmd/internal/ssa: correct references to _gen folder The gen folder was renamed to _gen in CL 435472, but references in code and docs were not updated. This updates the references. Change-Id: Ibadc0cdcb5bed145c3257b58465a8df370487ae5 Reviewed-on: https://go-review.googlesource.com/c/go/+/444355 Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Johan Brandhorst-Satzkorn <johan.brandhorst@gmail.com> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-10-23 17:42:11 +00:00
Joel Sing	4274ffd4b8	cmd/compile: fold negation into subtraction on riscv64 Fold negation into subtraction and avoid double negation. This removes around 500 instructions from the Go binary on riscv64. Change-Id: I4aac6c87baa2a0759b180ba87876d488a23df6d7 Reviewed-on: https://go-review.googlesource.com/c/go/+/431105 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joedian Reid <joedian@golang.org> Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-10-11 04:04:13 +00:00
Joel Sing	ba8c94b5f2	cmd/compile: convert SLT/SLTU with constant into immediate form on riscv64 Convert SLT/SLTU with a suitably valued constant into a SLTI/SLTIU instruction. This can reduce instructions and avoid register loads. Now that we generate more SLTI/SLTIU instructions, absorb these into branches when it makes sense to do so. Removes more than 800 instructions from the Go binary on linux/riscv64. Change-Id: I42c4e00486697acd4da7669d441b5690795f18ae Reviewed-on: https://go-review.googlesource.com/c/go/+/428499 Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joedian Reid <joedian@golang.org>	2022-10-11 04:03:17 +00:00
Joel Sing	0ca355318f	cmd/compile: combine masking and zero extension on riscv64 Combine masking with a negative value and zero extension into a single AND operation. Change-Id: I0b2a735b696d65568839fc4504445eeac3d869a6 Reviewed-on: https://go-review.googlesource.com/c/go/+/428498 Reviewed-by: Joedian Reid <joedian@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Joel Sing <joel@sing.id.au>	2022-10-11 04:02:34 +00:00
Joel Sing	7234c90352	cmd/compile: combine operations with immediate on riscv64 Replace two immediate operations with one, where possible. Change-Id: Idc00e868155c9ca1d872aaaf70ea1f73e9eac4d6 Reviewed-on: https://go-review.googlesource.com/c/go/+/428497 Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-09-19 19:01:45 +00:00
Joel Sing	83d94daec2	cmd/compile: avoid the use of XOR for boolean equality on riscv64 The use of SEQZ/SNEZ and SUB allows for other optimisations to be utilised, particularly absorption into branch equality conditions. Change-Id: I74e7d6a07a8decc1bdb651660c322bcc6eb6a10a Reviewed-on: https://go-review.googlesource.com/c/go/+/428216 Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Meng Zhuo <mzh@golangcn.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-09-19 19:01:06 +00:00
Joel Sing	a7bcc94719	cmd/compile: resolve known outcomes for SLTI/SLTIU on riscv64 When SLTI/SLTIU is used with ANDI/ORI, it may be possible to determine the outcome based on the values of the immediates. Resolve these cases. Improves code generation for various shift operations. While here, sort tests by architecture to improve readability and ease future maintenance. Change-Id: I87e71e016a0e396a928e7d6389a2df61583dfd8d Reviewed-on: https://go-review.googlesource.com/c/go/+/428217 Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Jenny Rakoczy <jenny@golang.org> Reviewed-by: Jenny Rakoczy <jenny@golang.org> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> Auto-Submit: Jenny Rakoczy <jenny@golang.org>	2022-09-17 17:17:52 +00:00
Wayne Zuo	5760fde4df	cmd/compile: avoid sign extension after word arithmetic on riscv64 These instructions already do sign extension on output, so we can get rid of it. Note: (MOVWreg (MULW x y)) may arise from divisions by constant, generic rules replace them with multiply and may produce (Rsh32x64 (Mul32 _ _) _). Change-Id: I41bc9b519e38bc6027311de604dadb962cd0bbf4 Reviewed-on: https://go-review.googlesource.com/c/go/+/429757 Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: Meng Zhuo <mzh@golangcn.org> Auto-Submit: Jenny Rakoczy <jenny@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Jenny Rakoczy <jenny@golang.org>	2022-09-15 21:04:37 +00:00
Joel Sing	77da976419	cmd/compile: remove redundant SEQZ/SNEZ on riscv64 In particular, (SEQZ (SNEZ x)) can arise from (Not (IsNonNil x)). Change-Id: Ie249cd1934d71087e0f774cf8f6c937ceeed7ad5 Reviewed-on: https://go-review.googlesource.com/c/go/+/428215 Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Michael Knyszek <mknyszek@google.com>	2022-09-07 05:39:23 +00:00
Joel Sing	b6a6847b2f	cmd/compile: avoid zero extension after properly typed atomic operation on riscv64 LoweredAtomicLoad8 is implemented using MOVBU, hence it is already zero extended. LoweredAtomicCas32 and LoweredAtomicCas64 return a properly typed boolean. Change-Id: Ie0acbaa19403d59c7e5f76d060cc13ee51eb7834 Reviewed-on: https://go-review.googlesource.com/c/go/+/428214 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Michael Knyszek <mknyszek@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Joel Sing <joel@sing.id.au>	2022-09-07 05:38:50 +00:00
Joel Sing	c011270fa5	cmd/compile: improve Slicemask on riscv64 Implement Slicemask the same way every other architecture does - negate then arithmetic right shift. This sets or clears the sign bit, before extending it to the entire register. Removes around 2,500 instructions from the Go binary on linux/riscv64. Change-Id: I4d675b826e7eb23fe2b1e6e46b95dcd49ab49733 Reviewed-on: https://go-review.googlesource.com/c/go/+/426354 Reviewed-by: Meng Zhuo <mzh@golangcn.org> Reviewed-by: Michael Knyszek <mknyszek@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-09-07 05:37:53 +00:00
Joel Sing	3e11e61f3c	cmd/compile: optimise subtraction with const on riscv64 Convert subtraction from const to a negated ADDI with negative const value, where possible. At worst this avoids a register load and uses the same number of instructions. At best, this allows for further optimisation to occur, particularly where equality is involved. For example, this sequence: li t0,-1 sub t1,t0,a0 snez t1,t1 Becomes: addi t0,a0,1 snez t0,t0 Removes more than 2000 instructions from the Go binary on linux/riscv64. Change-Id: I68f3be897bc645d4a8fa3ab3cef165a00a74df19 Reviewed-on: https://go-review.googlesource.com/c/go/+/426263 Reviewed-by: Meng Zhuo <mzh@golangcn.org> Reviewed-by: Heschi Kreinick <heschi@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Joel Sing <joel@sing.id.au>	2022-09-02 20:14:40 +00:00
Joel Sing	646c3eee06	cmd/compile: negate comparison with FNES/FNED on riscv64 The FNES and FNED instructions are pseudo-instructions, which the assembler expands to FEQS/NEG or FEQD/NEG - if we're comparing the result via a branch instruction, we can avoid an instruction by negating both the branch comparison and the floating point comparison. This only removes a handful of instructions from the Go binary, however, it will provide benefit to floating point intensive code. Change-Id: I4e3124440b7659acc4d9bc9948b755a4900a422f Reviewed-on: https://go-review.googlesource.com/c/go/+/426261 Reviewed-by: Meng Zhuo <mzh@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Joel Sing <joel@sing.id.au> Run-TryBot: Meng Zhuo <mzh@golangcn.org> Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com>	2022-09-02 20:14:16 +00:00
Wayne Zuo	da6556968f	cmd/compile: simplify bounded shift on riscv64 The prove pass will mark some shifts bounded, and then we can use that information to generate better code on riscv64. Change-Id: Ia22f43d0598453c9417adac7017db28d7240948b Reviewed-on: https://go-review.googlesource.com/c/go/+/422616 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Keith Randall <khr@google.com> Auto-Submit: Keith Randall <khr@golang.org> Run-TryBot: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-08-31 20:21:00 +00:00
Joel Sing	971373f56a	cmd/compile: remove NEG when used with SEQZ/SNEZ on riscv64 The negation does not change the comparison to zero. Also remove unnecessary x.Uses == 1 condition from equivalent BEQZ/BNEZ rules. Change-Id: I62dd8e383e42bfe5c46d11bbf78d8e5ff862a1d5 Reviewed-on: https://go-review.googlesource.com/c/go/+/426262 Reviewed-by: Cherry Mui <cherryyz@google.com> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Heschi Kreinick <heschi@google.com>	2022-08-31 20:08:03 +00:00
Joel Sing	239115c3ef	cmd/compile: avoid extending floating point comparison on riscv64 The results of these operations are already extended. Change-Id: Ifc8ba362dda7035d8fd0d40046a96f61d3082877 Reviewed-on: https://go-review.googlesource.com/c/go/+/426260 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Meng Zhuo <mzh@golangcn.org> Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-08-31 20:05:46 +00:00
Joel Sing	9085ff5859	cmd/compile: avoid extending when already sufficiently masked on riscv64 Removes more than 2000 instructions from the Go binary on linux/riscv64. Change-Id: I6db3e3b1c93f29f00869adcba7c6192bfb90b25c Reviewed-on: https://go-review.googlesource.com/c/go/+/426259 TryBot-Result: Gopher Robot <gobot@golang.org> Reviewed-by: Heschi Kreinick <heschi@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Meng Zhuo <mzh@golangcn.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-08-31 20:05:06 +00:00
Wayne Zuo	a6219737e3	cmd/compile: intrinsify Sub64 on riscv64 After this CL, the performance difference in crypto/elliptic benchmarks on linux/riscv64 are: name old time/op new time/op delta ScalarBaseMult/P256 1.64ms ± 1% 1.60ms ± 1% -2.36% (p=0.008 n=5+5) ScalarBaseMult/P224 1.53ms ± 1% 1.47ms ± 2% -4.24% (p=0.008 n=5+5) ScalarBaseMult/P384 5.12ms ± 2% 5.03ms ± 2% ~ (p=0.095 n=5+5) ScalarBaseMult/P521 22.3ms ± 2% 13.8ms ± 1% -37.89% (p=0.008 n=5+5) ScalarMult/P256 4.49ms ± 2% 4.26ms ± 2% -5.13% (p=0.008 n=5+5) ScalarMult/P224 4.33ms ± 1% 4.09ms ± 1% -5.59% (p=0.008 n=5+5) ScalarMult/P384 16.3ms ± 1% 15.5ms ± 2% -4.78% (p=0.008 n=5+5) ScalarMult/P521 101ms ± 0% 47ms ± 2% -53.36% (p=0.008 n=5+5) Change-Id: I31cf0506e27f9d85f576af1813630a19c20dda8a Reviewed-on: https://go-review.googlesource.com/c/go/+/420095 Reviewed-by: Cherry Mui <cherryyz@google.com> Reviewed-by: Joel Sing <joel@sing.id.au> Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-27 05:43:59 +00:00
Wayne Zuo	969f48a3a2	cmd/compile: intrinsify Add64 on riscv64 According to RISCV instruction set manual v2.2 Sec 2.4, we can implement overflowing check for unsigned addition cheaply using SLTU instructions. After this CL, the performance difference in crypto/elliptic benchmarks on linux/riscv64 are: name old time/op new time/op delta ScalarBaseMult/P256 1.93ms ± 1% 1.64ms ± 1% -14.96% (p=0.008 n=5+5) ScalarBaseMult/P224 1.80ms ± 2% 1.53ms ± 1% -14.89% (p=0.008 n=5+5) ScalarBaseMult/P384 6.15ms ± 2% 5.12ms ± 2% -16.73% (p=0.008 n=5+5) ScalarBaseMult/P521 25.9ms ± 1% 22.3ms ± 2% -13.78% (p=0.008 n=5+5) ScalarMult/P256 5.59ms ± 1% 4.49ms ± 2% -19.79% (p=0.008 n=5+5) ScalarMult/P224 5.42ms ± 1% 4.33ms ± 1% -20.01% (p=0.008 n=5+5) ScalarMult/P384 19.9ms ± 2% 16.3ms ± 1% -18.15% (p=0.008 n=5+5) ScalarMult/P521 97.3ms ± 1% 100.7ms ± 0% +3.48% (p=0.008 n=5+5) Change-Id: Ic4c82ced4b072a4a6575343fa9f29dd09b0cabc4 Reviewed-on: https://go-review.googlesource.com/c/go/+/420094 Reviewed-by: David Chase <drchase@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Joel Sing <joel@sing.id.au> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-08-27 05:43:32 +00:00
Wayne Zuo	b60432df14	cmd/compile: deadcode for LoweredMuluhilo on riscv64 This is a follow up of CL 425101 on RISCV64. According to RISCV Volume 1, Unprivileged Spec v. 20191213 Chapter 7.1: If both the high and low bits of the same product are required, then the recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2 (source register specifiers must be in same order and rdh cannot be the same as rs1 or rs2). Microarchitectures can then fuse these into a single multiply operation instead of performing two separate multiplies. So we should not split Muluhilo to separate instructions. Updates #54607 Change-Id: If47461f3aaaf00e27cd583a9990e144fb8bcdb17 Reviewed-on: https://go-review.googlesource.com/c/go/+/425203 Auto-Submit: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org> Reviewed-by: Keith Randall <khr@google.com> Reviewed-by: Keith Randall <khr@golang.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-08-24 18:08:33 +00:00
Joel Sing	95547aee8c	cmd/compile: cast riscv64 rewrite shifts to unsigned int This appeases Go 1.4, making it possible to bootstrap GOARCH=riscv64 with a Go 1.4 compiler. Fixes #52583 Change-Id: Ib13c2afeb095b2bb1464dcd7f1502574209bc7ab Reviewed-on: https://go-review.googlesource.com/c/go/+/409974 TryBot-Result: Gopher Robot <gobot@golang.org> Run-TryBot: Joel Sing <joel@sing.id.au> Reviewed-by: Bryan Mills <bcmills@google.com> Reviewed-by: Cherry Mui <cherryyz@google.com>	2022-06-06 19:03:15 +00:00
Cherry Mui	d6e6140c98	cmd/compile: fix boolean comparison on RISCV64 Following CL 405114, for RISCV64. May fix RISCV64 builds. Updates #52788. Change-Id: Ifc34658703d1e8b97665e7b862060152e3005d71 Reviewed-on: https://go-review.googlesource.com/c/go/+/405553 Reviewed-by: David Chase <drchase@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com>	2022-05-12 19:11:22 +00:00
Cherry Mui	1ed30ca537	cmd/compile: correct type of pointer difference on RISCV64 Pointer comparison is lowered to the following on RISCV64 (EqPtr x y) => (SEQZ (SUB <x.Type> x y)) The difference of two pointers (the SUB) should not be pointer type. Otherwise it can cause the GC to find a bad pointer. Should fix #51101. Change-Id: I7e73c2155c36ff403c032981a9aa9cccbfdf0f64 Reviewed-on: https://go-review.googlesource.com/c/go/+/385655 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> Reviewed-by: Keith Randall <khr@golang.org> TryBot-Result: Gopher Robot <gobot@golang.org>	2022-02-14 23:08:44 +00:00
Joel Sing	fe8347b61a	cmd/compile: optimise immediate operands with constants on riscv64 Instructions with immediates can be precomputed when operating on a constant - do so for SLTI/SLTIU, SLLI/SRLI/SRAI, NEG/NEGW, ANDI, ORI and ADDI. Additionally, optimise ANDI and ORI when the immediate is all ones or all zeroes. In particular, the RISCV64 logical left and right shift rules (Lshx/RshUx) produce sequences that check if the shift amount exceeds 64 and if so returns zero. When the shift amount is a constant we can precompute and eliminate the filter entirely. Likewise the arithmetic right shift rules produce sequences that check if the shift amount exceeds 64 and if so, ensures that the lower six bits of the shift are all ones. When the shift amount is a constant we can precompute the shift value. Arithmetic right shift sequences like: 117fc: 00100513 li a0,1 11800: 04053593 sltiu a1,a0,64 11804: fff58593 addi a1,a1,-1 11808: 0015e593 ori a1,a1,1 1180c: 40b45433 sra s0,s0,a1 Are now a single srai instruction: 117fc: 40145413 srai s0,s0,0x1 Likewise for logical left shift (and logical right shift): 1d560: 01100413 li s0,17 1d564: 04043413 sltiu s0,s0,64 1d568: 40800433 neg s0,s0 1d56c: 01131493 slli s1,t1,0x11 1d570: 0084f433 and s0,s1,s0 Which are now a single slli (or srli) instruction: 1d120: 01131413 slli s0,t1,0x11 This removes more than 30,000 instructions from the Go binary and should improve performance in a variety of areas - of note runtime.makemap_small drops from 48 to 36 instructions. Similar gains exist in at least other parts of runtime and math/bits. Change-Id: I33f6f3d1fd36d9ff1bda706997162bfe4bb859b6 Reviewed-on: https://go-review.googlesource.com/c/go/+/350689 Trust: Joel Sing <joel@sing.id.au> Reviewed-by: Michael Munday <mike.munday@lowrisc.org> Reviewed-by: Cherry Mui <cherryyz@google.com>	2021-09-24 10:51:48 +00:00
Cherry Mui	c10b980220	cmd/compile: restore tail call for method wrappers For certain types of method wrappers we used to generate a tail call. That was disabled in CL 307234 when register ABI is used, because with the current IR it was difficult to generate a tail call with the arguments in the right places. The problem was that the IR does not contain a CALL-like node with arguments; instead, it contains an OAS node that adjusts the receiver, then an OTAILCALL node that just contains the target, but no argument (with the assumption that the OAS node will put the adjusted receiver in the right place). With register ABI, putting arguments in registers is done in SSA. The assignment (OAS) doesn't put the receiver in register. This CL changes the IR of a tail call to take an actual OCALL node. Specifically, a tail call is represented as OTAILCALL (OCALL target args...) This way, the call target and args are connected through the OCALL node. So the call can be analyzed in SSA and the args can be passed in the right places. (Alternatively, we could have OTAILCALL node directly take the target and the args, without the OCALL node. Using an OCALL node is convenient as there is existing code that processes OCALL nodes which does not need to be changed. Also, a tail call is similar to ORETURN (OCALL target args...), except it doesn't preserve the frame. I did the former but I'm open to change.) The SSA representation is similar. Previously, the IR lowers to a Store of the receiver, then a BlockRetJmp which jumps to the target (without putting the arg in register). Now we use a TailCall op, which takes the target and the args. The call expansion pass and the register allocator handle TailCall pretty much like a StaticCall, and it will do the right ABI analysis and put the args in the right places. (Args other than the receiver are already in the right places. For register args it generates no code for them. For stack args currently it generates a self copy. I'll work on optimizing that out.) BlockRetJmp is still used, signaling it is a tail call. The actual call is made in the TailCall op so BlockRetJmp generates no code (we could use BlockExit if we like). This slightly reduces binary size: old new cmd/go 14003088 13953936 cmd/link 6275552 6271456 Change-Id: I2d16d8d419fe1f17554916d317427383e17e27f0 Reviewed-on: https://go-review.googlesource.com/c/go/+/350145 Trust: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Matthew Dempsky <mdempsky@google.com> Reviewed-by: David Chase <drchase@google.com>	2021-09-17 22:59:44 +00:00
Michael Munday	c69f5c0d76	cmd/compile: add support for Abs and Copysign intrinsics on riscv64 Also, add the FABSS and FABSD pseudo instructions to the assembler. The compiler could use FSGNJX[SD] directly but there doesn't seem to be much advantage to doing so and the pseudo instructions are easier to understand. Change-Id: Ie8825b8aa8773c69cc4f07a32ef04abf4061d80d Reviewed-on: https://go-review.googlesource.com/c/go/+/348989 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au>	2021-09-10 10:45:59 +00:00
Michael Munday	ea51e223c2	cmd/{asm,compile}: add fused multiply-add support on riscv64 Add support to the assembler for F[N]M{ADD,SUB}[SD] instructions. Argument order is: OP RS1, RS2, RS3, RD Also, add support for the FMA intrinsic to the compiler. Automatic FMA matching is left to a future CL. Change-Id: I47166c7393b2ab6bfc2e42aa8c1a8997c3a071b3 Reviewed-on: https://go-review.googlesource.com/c/go/+/293030 Trust: Michael Munday <mike.munday@lowrisc.org> Run-TryBot: Michael Munday <mike.munday@lowrisc.org> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Joel Sing <joel@sing.id.au>	2021-09-01 21:17:04 +00:00
Joel Sing	8fff20ffeb	cmd/compile: absorb NEG into branch when possible on riscv64 We can end up with this situation due to our equality tests being based on 'SEQZ (SUB x y)' - if x is a zero valued constant, 'SUB x y' can be converted to 'NEG x'. When used with a branch the SEQZ can be absorbed, leading to 'BNEZ (NEG x)' where the NEG is redundant. Removes around 1700 instructions from the go binary on riscv64. Change-Id: I947a080d8bf7d2d6378ab114172e2342ce2c51db Reviewed-on: https://go-review.googlesource.com/c/go/+/342850 Trust: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org>	2021-08-21 11:23:14 +00:00
Joel Sing	bcd146d398	cmd/compile: convert branch with zero to more optimal branch zero on riscv64 Convert BLT and BGE with a zero valued constant to BGTZ/BLTZ/BLEZ/BGEZ as appropriate. Removes over 4,500 instructions from the go binary on riscv64. Change-Id: Icc266e968b126ba04863ec88529630a9dd44498b Reviewed-on: https://go-review.googlesource.com/c/go/+/342849 Trust: Joel Sing <joel@sing.id.au> Reviewed-by: Cherry Mui <cherryyz@google.com> Run-TryBot: Cherry Mui <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org>	2021-08-21 11:22:07 +00:00
Meng Zhuo	1951afc919	cmd/compile: lowered MulUintptr on riscv64 According to RISCV instruction set manual v2.2 Sec 6.1 MULHU followed by MUL will be fused into one multiply by microarchitecture name old time/op new time/op delta MulUintptr/small 11.2ns ±24% 9.2ns ± 0% -17.54% (p=0.000 n=10+9) MulUintptr/large 15.9ns ± 0% 10.9ns ± 0% -31.55% (p=0.000 n=8+8) Change-Id: I3d152218f83948cbc5c576bda29dc86e9b4206ee Reviewed-on: https://go-review.googlesource.com/c/go/+/338753 Trust: Meng Zhuo <mzh@golangcn.org> Reviewed-by: Joel Sing <joel@sing.id.au>	2021-08-17 01:29:37 +00:00
Meng Zhuo	efd206eb40	cmd/compile: intrinsify Mul64 on riscv64 According to RISCV instruction set manual v2.2 Sec 6.1 MULHU followed by MUL will be fused into one multiply by microarchitecture Benchstat on Hifive unmatched: name old time/op new time/op delta Hash8Bytes 245ns ± 3% 186ns ± 4% -23.99% (p=0.000 n=10+10) Hash320Bytes 1.94µs ± 1% 1.31µs ± 1% -32.38% (p=0.000 n=9+10) Hash1K 5.84µs ± 0% 3.84µs ± 0% -34.20% (p=0.000 n=10+9) Hash8K 45.3µs ± 0% 29.4µs ± 0% -35.04% (p=0.000 n=10+10) name old speed new speed delta Hash8Bytes 32.7MB/s ± 3% 43.0MB/s ± 4% +31.61% (p=0.000 n=10+10) Hash320Bytes 165MB/s ± 1% 244MB/s ± 1% +47.88% (p=0.000 n=9+10) Hash1K 175MB/s ± 0% 266MB/s ± 0% +51.98% (p=0.000 n=10+9) Hash8K 181MB/s ± 0% 279MB/s ± 0% +53.94% (p=0.000 n=10+10) Change-Id: I3561495d02a4a0ad8578e9b9819bf0a4eaca5d12 Reviewed-on: https://go-review.googlesource.com/c/go/+/329970 Reviewed-by: Joel Sing <joel@sing.id.au> Run-TryBot: Joel Sing <joel@sing.id.au> TryBot-Result: Go Bot <gobot@golang.org> Trust: Meng Zhuo <mzh@golangcn.org>	2021-08-16 13:50:11 +00:00
Cherry Zhang	4a7effa418	cmd/compile: mark R12 clobbered for special calls In external linking mode the external linker may insert trampolines, which use R12 as a scratch register. So a call could potentially clobber R12 if the target is laid out too far. Mark R12 clobbered. Also, we will use R12 for trampolines in the Go linker as well. CL 310731 updated the generated rewrite files so imports are grouped, but the generator was not updated to do so. Grouped imports are nice. But as those are generated files, for simplicity and my laziness, just regenerate with the current generator (which makes imports not grouped). Change-Id: Iddb741ff7314a291ade5fbffc7d315f555808409 Reviewed-on: https://go-review.googlesource.com/c/go/+/314453 Trust: Cherry Zhang <cherryyz@google.com> Run-TryBot: Cherry Zhang <cherryyz@google.com> TryBot-Result: Go Bot <gobot@golang.org> Reviewed-by: Than McIntosh <thanm@google.com>	2021-04-28 14:01:59 +00:00
Russ Cox	95ed5c3800	internal/buildcfg: move build configuration out of cmd/internal/objabi The go/build package needs access to this configuration, so move it into a new package available to the standard library. Change-Id: I868a94148b52350c76116451f4ad9191246adcff Reviewed-on: https://go-review.googlesource.com/c/go/+/310731 Trust: Russ Cox <rsc@golang.org> Run-TryBot: Russ Cox <rsc@golang.org> Reviewed-by: Austin Clements <austin@google.com> Reviewed-by: Jay Conrod <jayconrod@google.com>	2021-04-16 19:20:53 +00:00
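
Note on the Add64/Sub64 commits listed above (969f48a3a2, a6219737e3): the Add64 message says an overflow check for unsigned addition can be done cheaply with SLTU, that is, with an unsigned less-than comparison. The Go sketch below illustrates that idea only; it is not the compiler's rewrite rules. The helper names `add64`, `sub64` and `b2u` are hypothetical, and the results are checked against the real `math/bits.Add64` and `math/bits.Sub64`.

```go
package main

import (
	"fmt"
	"math/bits"
)

// b2u converts a bool to 0 or 1, standing in for the 0/1 result of SLTU.
// Hypothetical helper for illustration.
func b2u(b bool) uint64 {
	if b {
		return 1
	}
	return 0
}

// add64 returns x + y + carryIn and the carry out, using only wrapping
// addition and unsigned less-than comparisons. carryIn must be 0 or 1.
// Hypothetical helper; an assumption about the technique, not the
// compiler's actual lowering.
func add64(x, y, carryIn uint64) (sum, carryOut uint64) {
	s := x + y
	c1 := b2u(s < x) // x + y wrapped iff the result is below an operand
	sum = s + carryIn
	c2 := b2u(sum < s) // adding the carry-in wrapped
	return sum, c1 | c2 // both cannot be set at once
}

// sub64 returns x - y - borrowIn and the borrow out, by the same approach.
// borrowIn must be 0 or 1.
func sub64(x, y, borrowIn uint64) (diff, borrowOut uint64) {
	d := x - y
	b1 := b2u(x < y) // x - y wrapped iff x is below y
	diff = d - borrowIn
	b2 := b2u(d < borrowIn) // subtracting the borrow-in wrapped
	return diff, b1 | b2
}

func main() {
	max := ^uint64(0)
	vals := []uint64{0, 1, 2, 1 << 63, max - 1, max}
	for _, x := range vals {
		for _, y := range vals {
			for c := uint64(0); c <= 1; c++ {
				s1, c1 := add64(x, y, c)
				s2, c2 := bits.Add64(x, y, c)
				d1, b1 := sub64(x, y, c)
				d2, b2 := bits.Sub64(x, y, c)
				if s1 != s2 || c1 != c2 || d1 != d2 || b1 != b2 {
					fmt.Println("mismatch for", x, y, c)
					return
				}
			}
		}
	}
	fmt.Println("sketch agrees with math/bits on all sampled inputs")
}
```

The comparison-based carry works because an unsigned sum that wraps is always smaller than either operand, so a single set-less-than yields the carry bit without a flags register.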

98 commits