98 commits

Author SHA1 Message Date
Meng Zhuo
09ed9a6585 cmd/compile: implement float min/max in hardware for riscv64
CL 514596 adds float min/max for amd64; this CL adds it for riscv64.

The behavior of the RISC-V FMIN/FMAX instructions almost matches Go's
requirements.

However, according to the RISC-V spec, section 8.3 "NaN Generation and Propagation":
>> if at least one input is a signaling NaN, or if both inputs are quiet
>> NaNs, the result is the canonical NaN. If one operand is a quiet NaN
>> and the other is not a NaN, the result is the non-NaN operand.

Go uses quiet NaNs as NaN, and according to the Go spec
>> if any argument is a NaN, the result is a NaN

This requires the float min/max implementation to check whether either
operand is a qNaN before the float min/max instruction actually executes.

This CL also fixes a typo in the minmax test.

Benchmark on VisionFive2
goos: linux
goarch: riscv64
pkg: runtime
         │ float_minmax.old.bench │       float_minmax.new.bench        │
         │         sec/op         │   sec/op     vs base                │
MinFloat             158.20n ± 0%   28.13n ± 0%  -82.22% (p=0.000 n=10)
MaxFloat             158.10n ± 0%   28.12n ± 0%  -82.21% (p=0.000 n=10)
geomean               158.1n        28.12n       -82.22%

Update #59488

Change-Id: Iab48be6d32b8882044fb8c821438ca8840e5493d
Reviewed-on: https://go-review.googlesource.com/c/go/+/514775
Reviewed-by: Mauri de Souza Meneguzzo <mauri870@gmail.com>
Run-TryBot: M Zhuo <mengzhuo1203@gmail.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2024-01-26 01:41:50 +00:00
Joel Sing
70c7fb75e9 cmd/compile: correct code generation for right shifts on riscv64
The code generation on riscv64 will currently result in incorrect
assembly when a 32 bit integer is right shifted by an amount that
exceeds the size of the type. In particular, this occurs when an
int32 or uint32 is cast to a 64 bit type and right shifted by a
value larger than 31.

Fix this by moving the SRAW/SRLW conversion into the right shift
rules and removing the SignExt32to64/ZeroExt32to64. Add additional
rules that rewrite to SRAIW/SRLIW when the shift is less than the
size of the type, or replace/eliminate the shift when it exceeds
the size of the type.
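The Go semantics those rules have to preserve can be written down directly. The sketch below uses hypothetical helpers `sra32`/`srl32` (not compiler code) that express a 32 bit right shift with the count clamped the way oversized shifts behave, and compares them with the widen-then-shift pattern described above: a signed shift by 32 or more yields only copies of the sign bit, and an unsigned one yields zero.

```go
package main

import "fmt"

// sra32 returns x >> s with Go semantics, written with the count clamped
// to 31: once s >= 32 every result bit is a copy of the sign bit.
// Illustrative helper, not compiler code.
func sra32(x int32, s uint64) int32 {
	if s > 31 {
		s = 31
	}
	return x >> s
}

// srl32 returns x >> s with Go semantics: any count >= 32 yields 0.
func srl32(x uint32, s uint64) uint32 {
	if s > 31 {
		return 0
	}
	return x >> s
}

func main() {
	var x int32 = -8
	var y uint32 = 0x80000000
	for _, s := range []uint64{1, 31, 32, 40, 63, 64} {
		// Widen to 64 bits first, then shift by a count that may exceed 31.
		fmt.Println(s, int64(x)>>s, int64(sra32(x, s)), uint64(y)>>s, uint64(srl32(y, s)))
	}
}
```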

Add SSA tests that would have caught this issue. Also add additional
codegen tests to ensure that the resulting assembly is what we
expect in these overflow cases.

Fixes #64285

Change-Id: Ie97b05668597cfcb91413afefaab18ee1aa145ec
Reviewed-on: https://go-review.googlesource.com/c/go/+/545035
Reviewed-by: Russ Cox <rsc@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: Mark Ryan <markdryan@rivosinc.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-12-01 19:30:59 +00:00
Ubuntu
8fc043ccfa cmd/compile: optimize right shifts of int32 on riscv64
The compiler is currently sign extending 32 bit signed integers to
64 bits before right shifting them using a 64 bit shift instruction.
There's no need to do this as RISC-V has instructions for right
shifting 32 bit signed values (sraw and sraiw) which sign extend
the result of the shift to 64 bits.  Change the compiler so that
it uses sraw and sraiw for shifts of signed 32 bit integers reducing
in most cases the number of instructions needed to perform the shift.
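A small Go model of why a single instruction suffices. `sraw` and `oldSeq` are hypothetical helpers that mimic the instruction semantics in software: SRAW reads only the low 32 bits of its source, shifts by the low 5 bits of the count and sign extends the result, which is exactly what the old `sll`/`sra` pair computed for `int32(a) >> 2`.

```go
package main

import "fmt"

// sraw models RV64 SRAW: take the low 32 bits of rs1, shift them right
// arithmetically by rs2[4:0], and sign-extend the 32-bit result to 64 bits.
func sraw(rs1, rs2 uint64) uint64 {
	return uint64(int64(int32(uint32(rs1)) >> (rs2 & 31)))
}

// oldSeq models the previous two-instruction sequence for int32(a) >> 2:
//   sll x5,x10,0x20  (move the low word to the top)
//   sra x10,x5,0x22  (arithmetic shift right by 32+2)
func oldSeq(a uint64) uint64 {
	return uint64(int64(a<<32) >> 34)
}

func main() {
	a := uint64(0xdeadbeef_fffffff0) // upper bits are ignored by both forms
	fmt.Printf("%#x %#x\n", oldSeq(a), sraw(a, 2))
}
```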

Here are some examples of code sequences that are changed by this
patch:

int32(a) >> 2

  before:

    sll     x5,x10,0x20
    sra     x10,x5,0x22

  after:

    sraw    x10,x10,0x2

int32(v) >> int(s)

  before:

    sext.w  x5,x10
    sltiu   x6,x11,64
    add     x6,x6,-1
    or      x6,x11,x6
    sra     x10,x5,x6

  after:

    sltiu   x5,x11,32
    add     x5,x5,-1
    or      x5,x11,x5
    sraw    x10,x10,x5

int32(v) >> (int(s) & 31)

  before:

    sext.w  x5,x10
    and     x6,x11,63
    sra     x10,x5,x6

  after:

    and     x5,x11,31
    sraw    x10,x10,x5

int32(100) >> int(a)

  before:

    bltz    x10,<target address calls runtime.panicshift>
    sltiu   x5,x10,64
    add     x5,x5,-1
    or      x5,x10,x5
    li      x6,100
    sra     x10,x6,x5

  after:

    bltz    x10,<target address calls runtime.panicshift>
    sltiu   x5,x10,32
    add     x5,x5,-1
    or      x5,x10,x5
    li      x6,100
    sraw    x10,x6,x5

int32(v) >> (int(s) & 63)

  before:

    sext.w  x5,x10
    and     x6,x11,63
    sra     x10,x5,x6

  after:

    and     x5,x11,63
    sltiu   x6,x5,32
    add     x6,x6,-1
    or      x5,x5,x6
    sraw    x10,x10,x5

In most cases we eliminate one instruction.  In the case where
we shift an int32 constant by a variable the number of instructions
generated is identical.  A sra is simply replaced by a sraw.  In the
unusual case where we shift right by a variable anded with a constant
> 31 but < 64, we generate two additional instructions.  As this is
an unusual case we do not try to optimize for it.
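The variable-count sequences above rely on a branch-free fix-up of the shift amount before SRAW. This Go sketch models it with hypothetical helpers `clampShift`, `sraw` and `shiftInt32` (the negative-count panic path is not modeled): counts below 32 pass through unchanged, while larger counts become all ones, whose low 5 bits are 31, so SRAW produces the saturated sign that Go requires.

```go
package main

import "fmt"

// sraw models RV64 SRAW (low 32 bits of rs1, shift by rs2[4:0], sign-extend).
func sraw(rs1, rs2 uint64) uint64 {
	return uint64(int64(int32(uint32(rs1)) >> (rs2 & 31)))
}

// clampShift models the three-instruction count fix-up used before SRAW:
//   sltiu x5,x11,32  -> 1 if s < 32, else 0
//   add   x5,x5,-1   -> 0 if s < 32, else all ones
//   or    x5,x11,x5  -> s if s < 32, else all ones (low 5 bits = 31)
func clampShift(s uint64) uint64 {
	var lt uint64
	if s < 32 {
		lt = 1
	}
	return s | (lt - 1)
}

// shiftInt32 models int32(v) >> int(s) for a non-negative s.
func shiftInt32(v, s uint64) int64 {
	return int64(sraw(v, clampShift(s)))
}

func main() {
	v := uint64(0xffff_ff00) // int32(-256) in the low word
	for _, s := range []uint64{4, 31, 32, 100} {
		fmt.Println(s, shiftInt32(v, s))
	}
}
```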

Some improvements can be seen in some of the existing benchmarks,
notably in the utf8 package which performs right shifts of runes
which are signed 32 bit integers.

                      |  utf8-old   |              utf8-new            |
                      |   sec/op    |   sec/op     vs base             |
EncodeASCIIRune-4       17.68n ± 0%   17.67n ± 0%       ~ (p=0.312 n=10)
EncodeJapaneseRune-4    35.34n ± 0%   34.53n ± 1%  -2.31% (p=0.000 n=10)
AppendASCIIRune-4       3.213n ± 0%   3.213n ± 0%       ~ (p=0.318 n=10)
AppendJapaneseRune-4    36.14n ± 0%   35.35n ± 0%  -2.19% (p=0.000 n=10)
DecodeASCIIRune-4       28.11n ± 0%   27.36n ± 0%  -2.69% (p=0.000 n=10)
DecodeJapaneseRune-4    38.55n ± 0%   38.58n ± 0%       ~ (p=0.612 n=10)

Change-Id: I60a91cbede9ce65597571c7b7dd9943eeb8d3cc2
Reviewed-on: https://go-review.googlesource.com/c/go/+/535115
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
2023-10-30 14:47:06 +00:00
Mark Ryan
fce6be15cc cmd/compile: regenerate rewriteRISCV64.go to match cl 528975
The final revision of

https://go-review.googlesource.com/c/go/+/528975

made a small change to the RISCV64.rules file but neglected to update
the regenerated rewriteRISCV64.go file.

Change-Id: I04599f4e3b0dac7102c54166c9bae6fc9b6621d1
Reviewed-on: https://go-review.googlesource.com/c/go/+/533815
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
LUCI-TryBot-Result: Go LUCI <golang-scoped@luci-project-accounts.iam.gserviceaccount.com>
2023-10-09 22:19:13 +00:00
Joel Sing
f711892a8a cmd/compile/internal: stop lowering OpConvert on riscv64
Lowering for OpConvert was removed for all architectures in CL 108496,
prior to the riscv64 port being upstreamed. Remove lowering of OpConvert
on riscv64, which brings it in line with all other architectures. This
results in 1,600+ instructions being removed from the riscv64 go binary.

Change-Id: Iaaf1f8b397875926604048b66ad8ac91a98c871e
Reviewed-on: https://go-review.googlesource.com/c/go/+/533335
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Pratt <mpratt@google.com>
2023-10-07 12:31:59 +00:00
Mark Ryan
561bf0457f cmd/compile: optimize right shifts of uint32 on riscv
The compiler is currently zero extending 32 bit unsigned integers to
64 bits before right shifting them using a 64 bit shift instruction.
There's no need to do this as RISC-V has instructions for right
shifting 32 bit values (srlw and srliw) that operate on the low 32 bits
and sign extend the 32 bit result to 64 bits; for any non-zero shift
amount bit 31 of that result is clear, so it is effectively zero extended.
Change the compiler so that it uses srlw and srliw for 32 bit unsigned
shifts, reducing in most cases the number of instructions needed to
perform the shift.
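A note on the instruction semantics, sketched in Go: the RV64I specification defines SRLW/SRLIW as shifting the low 32 bits and then sign extending the 32 bit result. For any non-zero shift amount bit 31 of that result is zero, so the outcome coincides with zero extension; a shift by zero does not. `srlw` and `zext32` below are hypothetical software models, not compiler code.

```go
package main

import "fmt"

// srlw models RV64 SRLW: logically shift the low 32 bits of rs1 right by
// rs2[4:0], then sign-extend the 32-bit result to 64 bits (per the ISA).
func srlw(rs1, rs2 uint64) uint64 {
	return uint64(int64(int32(uint32(rs1) >> (rs2 & 31))))
}

// zext32 is the zero extension the old code performed before a 64-bit SRL.
func zext32(x uint64) uint64 { return x & 0xffffffff }

func main() {
	a := uint64(0x1234_5678_8000_0001) // only the low word matters
	for _, s := range []uint64{0, 1, 2, 31} {
		fmt.Printf("s=%2d srlw=%#x zext-then-srl=%#x\n", s, srlw(a, s), zext32(a)>>s)
	}
}
```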

Here are some examples of code sequences that are changed by this
patch:

uint32(a) >> 2

  before:

    sll     x5,x10,0x20
    srl     x10,x5,0x22

  after:

    srlw    x10,x10,0x2

uint32(a) >> int(b)

  before:

    sll     x5,x10,0x20
    srl     x5,x5,0x20
    srl     x5,x5,x11
    sltiu   x6,x11,64
    neg     x6,x6
    and     x10,x5,x6

  after:

    srlw    x5,x10,x11
    sltiu   x6,x11,32
    neg     x6,x6
    and     x10,x5,x6

bits.RotateLeft32(uint32(a), 1)

  before:

    sll     x5,x10,0x1
    sll     x6,x10,0x20
    srl     x7,x6,0x3f
    or      x5,x5,x7

  after:

    sll     x5,x10,0x1
    srlw    x6,x10,0x1f
    or      x10,x5,x6

bits.RotateLeft32(uint32(a), int(b))

  before:
    and     x6,x11,31
    sll     x7,x10,x6
    sll     x8,x10,0x20
    srl     x8,x8,0x20
    add     x6,x6,-32
    neg     x6,x6
    srl     x9,x8,x6
    sltiu   x6,x6,64
    neg     x6,x6
    and     x6,x9,x6
    or      x6,x6,x7

  after:

    and     x5,x11,31
    sll     x6,x10,x5
    add     x5,x5,-32
    neg     x5,x5
    srlw    x7,x10,x5
    sltiu   x5,x5,32
    neg     x5,x5
    and     x5,x7,x5
    or      x10,x6,x5

The one regression observed is the following case: an unbounded right
shift of a uint32 where the value we're shifting by is known to be
< 64 but > 31.  As this is an unusual case this commit does not
optimize for it, although the existing code does.

uint32(a) >> (b & 63)

  before:

    sll     x5,x10,0x20
    srl     x5,x5,0x20
    and     x6,x11,63
    srl     x10,x5,x6

  after:

    and     x5,x11,63
    srlw    x6,x10,x5
    sltiu   x5,x5,32
    neg     x5,x5
    and     x10,x6,x5

Here we have one extra instruction.

Some benchmark highlights, generated on a VisionFive2 8GB running
Ubuntu 23.04.

pkg: math/bits
LeadingZeros32-4    18.64n ± 0%     17.32n ± 0%   -7.11% (p=0.000 n=10)
LeadingZeros64-4    15.47n ± 0%     15.51n ± 0%   +0.26% (p=0.027 n=10)
TrailingZeros16-4   18.48n ± 0%     17.68n ± 0%   -4.33% (p=0.000 n=10)
TrailingZeros32-4   16.87n ± 0%     16.07n ± 0%   -4.74% (p=0.000 n=10)
TrailingZeros64-4   15.26n ± 0%     15.27n ± 0%   +0.07% (p=0.043 n=10)
OnesCount32-4       20.08n ± 0%     19.29n ± 0%   -3.96% (p=0.000 n=10)
RotateLeft-4        8.864n ± 0%     8.838n ± 0%   -0.30% (p=0.006 n=10)
RotateLeft32-4      8.837n ± 0%     8.032n ± 0%   -9.11% (p=0.000 n=10)
Reverse32-4         29.77n ± 0%     26.52n ± 0%  -10.93% (p=0.000 n=10)
ReverseBytes32-4    9.640n ± 0%     8.838n ± 0%   -8.32% (p=0.000 n=10)
Sub32-4             8.835n ± 0%     8.035n ± 0%   -9.06% (p=0.000 n=10)
geomean             11.50n          11.33n        -1.45%

pkg: crypto/md5
Hash8Bytes-4             1.486µ ± 0%   1.426µ ± 0%  -4.04% (p=0.000 n=10)
Hash64-4                 2.079µ ± 0%   1.968µ ± 0%  -5.36% (p=0.000 n=10)
Hash128-4                2.720µ ± 0%   2.557µ ± 0%  -5.99% (p=0.000 n=10)
Hash256-4                3.996µ ± 0%   3.733µ ± 0%  -6.58% (p=0.000 n=10)
Hash512-4                6.541µ ± 0%   6.072µ ± 0%  -7.18% (p=0.000 n=10)
Hash1K-4                 11.64µ ± 0%   10.75µ ± 0%  -7.58% (p=0.000 n=10)
Hash8K-4                 82.95µ ± 0%   76.32µ ± 0%  -7.99% (p=0.000 n=10)
Hash1M-4                10.436m ± 0%   9.591m ± 0%  -8.10% (p=0.000 n=10)
Hash8M-4                 83.50m ± 0%   76.73m ± 0%  -8.10% (p=0.000 n=10)
Hash8BytesUnaligned-4    1.494µ ± 0%   1.434µ ± 0%  -4.02% (p=0.000 n=10)
Hash1KUnaligned-4        11.64µ ± 0%   10.76µ ± 0%  -7.52% (p=0.000 n=10)
Hash8KUnaligned-4        83.01µ ± 0%   76.32µ ± 0%  -8.07% (p=0.000 n=10)
geomean                  28.32µ        26.42µ       -6.72%

Change-Id: I20483a6668cca1b53fe83944bee3706aadcf8693
Reviewed-on: https://go-review.googlesource.com/c/go/+/528975
Reviewed-by: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-10-07 12:31:38 +00:00
Xianmiao Qu
d98f74b31e cmd/compile/internal: intrinsify publicationBarrier on riscv64
This enables publicationBarrier to be used as an intrinsic
on riscv64, eliminating the function call and return
instructions otherwise needed to invoke the
"runtime.publicationBarrier" function.

This function is called by mallocgc. The benchmark results for malloc,
tested on a Lichee Pi 4A (TH1520, 4x RISC-V C910 at 2.0 GHz), are as follows.

goos: linux
goarch: riscv64
pkg: runtime
                    │   old.txt   │              new.txt               │
                    │   sec/op    │   sec/op     vs base               │
Malloc8-4             92.78n ± 1%   90.77n ± 1%  -2.17% (p=0.001 n=10)
Malloc16-4            156.5n ± 1%   151.7n ± 2%  -3.10% (p=0.000 n=10)
MallocTypeInfo8-4     131.7n ± 1%   130.6n ± 2%       ~ (p=0.165 n=10)
MallocTypeInfo16-4    186.5n ± 2%   186.2n ± 1%       ~ (p=0.956 n=10)
MallocLargeStruct-4   1.345µ ± 1%   1.355µ ± 1%       ~ (p=0.093 n=10)
geomean               216.9n        214.5n       -1.10%


Change-Id: Ieab6c02309614bac5c1b12b5ee3311f988ff644d
Reviewed-on: https://go-review.googlesource.com/c/go/+/531719
Reviewed-by: Michael Pratt <mpratt@google.com>
Auto-Submit: Michael Pratt <mpratt@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: M Zhuo <mzh@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
2023-10-03 19:29:38 +00:00
Meng Zhuo
63ab68ddc5 cmd/compile: add single-precision FMA code generation for riscv64
This CL adds FMADDS, FMSUBS, FNMADDS and FNMSUBS SSA support for riscv64.

Change-Id: I1e7dd322b46b9e0f4923dbba256303d69ed12066
Reviewed-on: https://go-review.googlesource.com/c/go/+/506616
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: M Zhuo <mzh@golangcn.org>
2023-08-22 12:05:36 +00:00
Meng Zhuo
05f9511582 cmd/compile: improve FP FMA performance on riscv64
FMADD/FMSUB/FNMSUB are efficient FP FMA instructions, which can
be used by the compiler to improve FP performance.
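For context, a Go sketch of what fusion changes numerically. The Go spec allows an implementation to fuse `x*y + z` into a single operation, while an explicit conversion such as `float64(x*y)` rounds the product and prevents fusion; `math.FMA` requests the fused form explicitly. The helper names are hypothetical, and the inputs are chosen so that x*y = 1 - 2^-54 exactly, which rounds to 1 when the product is rounded on its own.

```go
package main

import (
	"fmt"
	"math"
)

// fusedMulSub computes x*y - z with a single rounding.
func fusedMulSub(x, y, z float64) float64 {
	return math.FMA(x, y, -z)
}

// separateMulSub rounds the product first; the explicit conversion
// forbids the compiler from fusing the two operations.
func separateMulSub(x, y, z float64) float64 {
	return float64(x*y) - z
}

func main() {
	x, y := 1+0x1p-27, 1-0x1p-27 // x*y = 1 - 2^-54 exactly
	fmt.Println(fusedMulSub(x, y, 1), separateMulSub(x, y, 1))
}
```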

Erf               188.0n ± 2%   139.5n ± 2%  -25.82% (p=0.000 n=10)
Erfc              193.6n ± 1%   143.2n ± 1%  -26.01% (p=0.000 n=10)
Erfinv            244.4n ± 2%   172.6n ± 0%  -29.40% (p=0.000 n=10)
Erfcinv           244.7n ± 2%   173.0n ± 1%  -29.31% (p=0.000 n=10)
geomean           216.0n        156.3n       -27.65%

Ref: The RISC-V Instruction Set Manual Volume I: Unprivileged ISA
11.6 Single-Precision Floating-Point Computational Instructions

Change-Id: I89aa3a4df7576fdd47f4a6ee608ac16feafd093c
Reviewed-on: https://go-review.googlesource.com/c/go/+/506036
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: M Zhuo <mzh@golangcn.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-08-22 08:38:08 +00:00
Joel Sing
33da4ce457 cmd/compile: sign or zero extend for 32 bit equality on riscv64
For 32 bit equality (Eq32), rather than always zero extending to 64 bits,
sign extend for signed types and zero extend for unsigned types. This makes
no difference to the equality test (via SUB), however it increases the
likelihood of avoiding unnecessary sign or zero extension simply for the
purpose of equality testing.

While here, replace the Neq* rules with (Not (Eq*)) - this makes no
difference to the generated code (as the intermediates get expanded and
eliminated), however it means that changes to the equality rules are also
reflected in the inequality rules.
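Why either extension is safe for Eq32, as a Go sketch with hypothetical helper names: sign extension and zero extension are both one-to-one on 32 bit values, so comparing the extended 64 bit values gives the same answer as comparing the original words, and the compiler is free to reuse whichever extension an operand already has.

```go
package main

import "fmt"

// eqSext compares two 32 bit values after sign extension to 64 bits.
func eqSext(a, b uint32) bool { return int64(int32(a)) == int64(int32(b)) }

// eqZext compares the same values after zero extension to 64 bits.
func eqZext(a, b uint32) bool { return uint64(a) == uint64(b) }

func main() {
	pairs := [][2]uint32{{1, 1}, {0x80000000, 0x80000000}, {0xffffffff, 0x7fffffff}}
	for _, p := range pairs {
		fmt.Println(p, eqSext(p[0], p[1]), eqZext(p[0], p[1]))
	}
}
```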

As an example, the following:

   lw      t0,956(t0)
   slli    t0,t0,0x20
   srli    t0,t0,0x20
   li      t1,1
   bne     t1,t0,278fc

Becomes:

   lw      t0,1024(t0)
   li      t1,1
   bne     t1,t0,278b0

Removes almost 1000 instructions from the Go binary on riscv64.

Change-Id: Iac60635f494f6db87faa47752bd1cc16e6b5967f
Reviewed-on: https://go-review.googlesource.com/c/go/+/516595
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: M Zhuo <mzh@golangcn.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2023-08-15 03:29:11 +00:00
Meng Zhuo
3fce111535 cmd/compile: fix FMA negative commutativity of riscv64
According to the RISC-V manual, section 11.6:

FMADD x,y,z computes x*y+z,
FNMADD x,y,z => -x*y-z,
FMSUB x,y,z => x*y-z and
FNMSUB x,y,z => -x*y+z respectively.

However, our SSA implementation converts FMADD -x,y,z to FNMADD x,y,z, which
is wrong; according to the manual it should be converted to FNMSUB x,y,z.
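The four definitions can be written with `math.FMA`, which computes x*y+z with a single rounding; the helper names below are hypothetical. The sketch checks the identity behind the fix: FMADD with a negated first operand equals FNMSUB, not FNMADD.

```go
package main

import (
	"fmt"
	"math"
)

// The four RISC-V fused multiply-add forms, per the definitions above.
func fmadd(x, y, z float64) float64  { return math.FMA(x, y, z) }   //  x*y + z
func fmsub(x, y, z float64) float64  { return math.FMA(x, y, -z) }  //  x*y - z
func fnmadd(x, y, z float64) float64 { return math.FMA(-x, y, -z) } // -x*y - z
func fnmsub(x, y, z float64) float64 { return math.FMA(-x, y, z) }  // -x*y + z

func main() {
	x, y, z := 2.0, 3.0, 1.0
	fmt.Println(fmadd(-x, y, z), fnmsub(x, y, z), fnmadd(x, y, z), fmsub(x, y, z))
}
```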

Change-Id: Ib297bc83824e121fd7dda171ed56ea9694a4e575
Reviewed-on: https://go-review.googlesource.com/c/go/+/506575
Run-TryBot: M Zhuo <mzh@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2023-07-05 22:05:44 +00:00
Keith Randall
a3f3868c7a cmd/compile: replace isSigned(t) with t.IsSigned()
No change in semantics, just removing an unneeded helper.

Also align rules a bit.

Change-Id: Ie4dabb99392315a7700c645b3d0931eb8766a5fa
Reviewed-on: https://go-review.googlesource.com/c/go/+/483439
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2023-04-10 17:07:24 +00:00
Keith Randall
60140a86b3 cmd/compile: clean up store rules to use store type, not argument type
Argument type is dangerous because it may be thinner than the actual
store being issued.

Change-Id: Id19fbd8e6c41390a453994f897dd5048473136aa
Reviewed-on: https://go-review.googlesource.com/c/go/+/483438
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: David Chase <drchase@google.com>
2023-04-10 17:06:55 +00:00
Keith Randall
21f434058c cmd/compile: ensure constant folding of pointer arithmetic remains a pointer
For c + nil, we want the result to still be of pointer type.

Fixes ppc64le build failure with CL 468455, in issue33724.go.

The problem in that test is that it requires a nil check to be
scheduled before the corresponding load. This normally happens fine
because we prioritize nil checks. If we have nilcheck(p) and load(p),
once p is scheduled the nil check will always go before the load.

The issue we saw in 33724 is that when p is a nil pointer, we ended up
with two different p's, an int64(0) as the argument to the nil check
and an (*Outer)(0) as the argument to the load. Those two zeroes don't
get CSEd, so if the (*Outer)(0) happens to get scheduled first, the
load can end up before the nilcheck.

Fix this by always having constant arithmetic preserve the pointerness
of the value, so that both zeroes are of type *Outer and get CSEd.

Update #58482
Update #33724

Change-Id: Ib9b8c0446f1690b574e0f3c0afb9934efbaf3513
Reviewed-on: https://go-review.googlesource.com/c/go/+/468615
Reviewed-by: Keith Randall <khr@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Bypass: Keith Randall <khr@golang.org>
2023-02-17 03:56:57 +00:00
Keith Randall
f959fb3872 cmd/compile: add anchored version of SP
The SPanchored opcode is identical to SP, except that it takes a memory
argument so that it (and more importantly, anything that uses it)
must be scheduled at or after that memory argument.

This opcode ensures that a LEAQ of a variable gets scheduled after the
corresponding VARDEF for that variable.

This may lead to less CSE of LEAQ operations. The effect is very small.
The go binary is only 80 bytes bigger after this CL. Usually LEAQs get
folded into load/store operations, so the effect is limited to variables
of pointerful types that are large enough to need a duffzero and have
their address passed somewhere. Even then, usually the CSEd LEAQs will be un-CSEd because
the two uses are on different sides of a function call and the LEAQ
ends up being rematerialized at the second use anyway.

Change-Id: Ib893562cd05369b91dd563b48fb83f5250950293
Reviewed-on: https://go-review.googlesource.com/c/go/+/452916
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Martin Möhrmann <moehrmann@google.com>
Reviewed-by: Martin Möhrmann <martin@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
2023-01-19 22:43:12 +00:00
Dmitri Shuralyov
47a0d46716 cmd/compile/internal/ssa: generate code via a //go:generate directive
The standard way to generate code in a Go package is via //go:generate
directives, which are invoked by the developer explicitly running:

	go generate import/path/of/said/package

Switch to using that approach here.

This way, developers don't need to learn and remember a custom way that
each particular Go package may choose to implement its code generation.
It also enables conveniences such as 'go generate -n' to discover how
code is generated without running anything (this works on all packages
that rely on //go:generate directives), being able to generate multiple
packages at once and from any directory, and so on.

Change-Id: I0e5b6a1edeff670a8e588befeef0c445613803c7
Reviewed-on: https://go-review.googlesource.com/c/go/+/460135
Reviewed-by: Dmitri Shuralyov <dmitshur@google.com>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Dmitri Shuralyov <dmitshur@golang.org>
Run-TryBot: Dmitri Shuralyov <dmitshur@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2023-01-19 22:42:34 +00:00
Guoqi Chen
0b2ad1d815 cmd/compile: sign-extend the 2nd argument of the LoweredAtomicCas32 on loong64,mips64x,riscv64
The function LoweredAtomicCas32 is implemented using the LL-SC instruction pair
on loong64, mips64x and riscv64. However, the LL instruction on loong64, mips64x
and riscv64 sign-extends the loaded value, so it is necessary to sign-extend the
2nd parameter "old" of LoweredAtomicCas32, so that the BNE instruction after LL
gets the desired result.

The function prototype of LoweredAtomicCas32 in Go:
    func Cas32(ptr *uint32, old, new uint32) bool

When using an intrinsified implementation:
    case 1: (*ptr) <= 0x80000000 && old < 0x80000000
        E.g: (*ptr) = 0x7FFFFFFF, old = Rarg1 = 0x7FFFFFFF

        After running the instruction "LL (Rarg0), Rtmp": Rtmp = 0x7FFFFFFF
        Rtmp != Rarg1(old) is false, which is the result we expect

    case 2: (*ptr) >= 0x80000000 && old >= 0x80000000
        E.g: (*ptr) = 0x80000000, old = Rarg1 = 0x80000000

        After running the instruction "LL (Rarg0), Rtmp": Rtmp = 0xFFFFFFFF_80000000
        Rtmp != Rarg1(old) is true, which we do not expect

When using a non-intrinsified implementation:
    Because Rarg1 is loaded from the stack using the sign-extending instruction
    ld.w, the situation described in case 2 above does not occur
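The two cases can be reproduced with a Go model of the comparison. `ll` and `casCompare` are hypothetical helpers, not the real LL/SC code: `ll` mimics a 32 bit load-linked that sign extends into a 64 bit register, and `casCompare` mimics the BNE check against `old` as it sits in its register. Only when `old` is sign extended as well does the comparison succeed for values with bit 31 set.

```go
package main

import "fmt"

// ll models a 32-bit load-linked on these 64-bit targets: the loaded word
// is sign-extended into the 64-bit register (illustrative, not real LL/SC).
func ll(mem uint32) uint64 { return uint64(int64(int32(mem))) }

// casCompare reports whether the loaded value matches old, where oldReg is
// how old sits in its 64-bit register before the BNE.
func casCompare(mem uint32, oldReg uint64) bool { return ll(mem) == oldReg }

func main() {
	mem, old := uint32(0x80000000), uint32(0x80000000)
	zext := uint64(old)               // old zero-extended: mismatch (case 2)
	sext := uint64(int64(int32(old))) // old sign-extended: the fix
	fmt.Println(casCompare(mem, zext), casCompare(mem, sext))
}
```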

Benchmarks on linux/loong64:
name     old time/op  new time/op  delta
Cas      50.0ns ± 0%  50.1ns ± 0%   ~     (p=1.000 n=1+1)
Cas64    50.0ns ± 0%  50.1ns ± 0%   ~     (p=1.000 n=1+1)
Cas-4    56.0ns ± 0%  56.0ns ± 0%   ~     (p=1.000 n=1+1)
Cas64-4  56.0ns ± 0%  56.0ns ± 0%   ~     (p=1.000 n=1+1)

Benchmarks on Loongson 3A4000 (GOARCH=mips64le, 1.8GHz)
name     old time/op  new time/op  delta
Cas      70.4ns ± 0%  70.3ns ± 0%   ~     (p=1.000 n=1+1)
Cas64    70.7ns ± 0%  70.6ns ± 0%   ~     (p=1.000 n=1+1)
Cas-4    81.1ns ± 0%  80.8ns ± 0%   ~     (p=1.000 n=1+1)
Cas64-4  80.9ns ± 0%  80.9ns ± 0%   ~     (p=1.000 n=1+1)

Fixes #57282

Change-Id: I190a7fc648023b15fa392f7fdda5ac18c1561bac
Reviewed-on: https://go-review.googlesource.com/c/go/+/457135
Run-TryBot: Than McIntosh <thanm@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Reviewed-by: David Chase <drchase@google.com>
2022-12-17 01:12:22 +00:00
Johan Brandhorst-Satzkorn
85196fc982 cmd/internal/ssa: correct references to _gen folder
The gen folder was renamed to _gen in CL 435472, but references in code
and docs were not updated. This updates the references.

Change-Id: Ibadc0cdcb5bed145c3257b58465a8df370487ae5
Reviewed-on: https://go-review.googlesource.com/c/go/+/444355
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Johan Brandhorst-Satzkorn <johan.brandhorst@gmail.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-10-23 17:42:11 +00:00
Joel Sing
4274ffd4b8 cmd/compile: fold negation into subtraction on riscv64
Fold negation into subtraction and avoid double negation.

This removes around 500 instructions from the Go binary on riscv64.

Change-Id: I4aac6c87baa2a0759b180ba87876d488a23df6d7
Reviewed-on: https://go-review.googlesource.com/c/go/+/431105
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-10-11 04:04:13 +00:00
Joel Sing
ba8c94b5f2 cmd/compile: convert SLT/SLTU with constant into immediate form on riscv64
Convert SLT/SLTU with a suitably valued constant into a SLTI/SLTIU instruction.
This can reduce instructions and avoid register loads. Now that we generate
more SLTI/SLTIU instructions, absorb these into branches when it makes sense
to do so.
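The immediate forms take a 12 bit signed immediate, so only constants in -2048..2047 qualify. A Go sketch with hypothetical helpers (`fitsSigned12`, `sltiu`) modeling the range check and SLTIU, which sign extends its immediate and then compares unsigned; this is also why `sltiu rd, rs, 1` computes rs == 0.

```go
package main

import "fmt"

// fitsSigned12 reports whether c fits the 12 bit signed immediate field
// of SLTI/SLTIU. Hypothetical helper, not the compiler's own.
func fitsSigned12(c int64) bool { return c >= -2048 && c <= 2047 }

// sltiu models SLTIU: sign-extend imm to 64 bits, then compare unsigned.
func sltiu(rs1 uint64, imm int64) uint64 {
	if rs1 < uint64(imm) {
		return 1
	}
	return 0
}

func main() {
	fmt.Println(fitsSigned12(2047), fitsSigned12(2048), fitsSigned12(-2048))
	fmt.Println(sltiu(0, 1), sltiu(7, 1)) // sltiu rd, rs, 1 is rs == 0
}
```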

Removes more than 800 instructions from the Go binary on linux/riscv64.

Change-Id: I42c4e00486697acd4da7669d441b5690795f18ae
Reviewed-on: https://go-review.googlesource.com/c/go/+/428499
Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joedian Reid <joedian@golang.org>
2022-10-11 04:03:17 +00:00
Joel Sing
0ca355318f cmd/compile: combine masking and zero extension on riscv64
Combine masking with a negative value and zero extension into a single
AND operation.

Change-Id: I0b2a735b696d65568839fc4504445eeac3d869a6
Reviewed-on: https://go-review.googlesource.com/c/go/+/428498
Reviewed-by: Joedian Reid <joedian@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
2022-10-11 04:02:34 +00:00
Joel Sing
7234c90352 cmd/compile: combine operations with immediate on riscv64
Replace two immediate operations with one, where possible.

Change-Id: Idc00e868155c9ca1d872aaaf70ea1f73e9eac4d6
Reviewed-on: https://go-review.googlesource.com/c/go/+/428497
Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-09-19 19:01:45 +00:00
Joel Sing
83d94daec2 cmd/compile: avoid the use of XOR for boolean equality on riscv64
The use of SEQZ/SNEZ and SUB allows for other optimisations to be utilised,
particularly absorption into branch equality conditions.
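For 0/1 boolean values, equality can be computed either with XOR followed by a flip or as SEQZ of a SUB. The SUB form presumably helps because a branch on it can compare the two registers directly (BEQ/BNE). A Go sketch with hypothetical helpers modeling the instructions, showing that both forms agree:

```go
package main

import "fmt"

// seqz models SEQZ: 1 if x == 0, else 0.
func seqz(x uint64) uint64 {
	if x == 0 {
		return 1
	}
	return 0
}

// eqXor computes a == b for 0/1 values with XOR, then flips the low bit.
func eqXor(a, b uint64) uint64 { return (a ^ b) ^ 1 }

// eqSub computes the same result as SEQZ (SUB a b).
func eqSub(a, b uint64) uint64 { return seqz(a - b) }

func main() {
	for _, a := range []uint64{0, 1} {
		for _, b := range []uint64{0, 1} {
			fmt.Println(a, b, eqXor(a, b), eqSub(a, b))
		}
	}
}
```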

Change-Id: I74e7d6a07a8decc1bdb651660c322bcc6eb6a10a
Reviewed-on: https://go-review.googlesource.com/c/go/+/428216
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Meng Zhuo <mzh@golangcn.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-09-19 19:01:06 +00:00
Joel Sing
a7bcc94719 cmd/compile: resolve known outcomes for SLTI/SLTIU on riscv64
When SLTI/SLTIU is used with ANDI/ORI, it may be possible to determine the
outcome based on the values of the immediates. Resolve these cases.
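Two examples of outcomes that are known in advance, as a Go sketch with hypothetical helpers (illustrating the reasoning rather than the actual rules): masking with a non-negative d bounds x&d to [0, d], so an unsigned compare against any c > d is always 1; ORing with a negative d leaves the sign bit set, so a signed compare against any c >= 0 is always 1.

```go
package main

import "fmt"

// andThenSltiu models SLTIU [c] (ANDI [d] x).
func andThenSltiu(x uint64, d, c int64) uint64 {
	if x&uint64(d) < uint64(c) {
		return 1
	}
	return 0
}

// orThenSlti models SLTI [c] (ORI [d] x).
func orThenSlti(x, d, c int64) uint64 {
	if x|d < c {
		return 1
	}
	return 0
}

func main() {
	// d = 255 >= 0 and c = 256 > d: always 1.
	fmt.Println(andThenSltiu(0xffff_ffff, 255, 256), andThenSltiu(12345, 255, 256))
	// d = -16 < 0 and c = 0: always 1.
	fmt.Println(orThenSlti(1000, -16, 0), orThenSlti(-5, -16, 0))
}
```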

Improves code generation for various shift operations.

While here, sort tests by architecture to improve readability and ease
future maintenance.

Change-Id: I87e71e016a0e396a928e7d6389a2df61583dfd8d
Reviewed-on: https://go-review.googlesource.com/c/go/+/428217
Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Jenny Rakoczy <jenny@golang.org>
Reviewed-by: Jenny Rakoczy <jenny@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Auto-Submit: Jenny Rakoczy <jenny@golang.org>
2022-09-17 17:17:52 +00:00
Wayne Zuo
5760fde4df cmd/compile: avoid sign extension after word arithmetic on riscv64
These instructions already sign extend their output, so the extra sign extension can be removed.

Note: (MOVWreg (MULW x y)) may arise from division by a constant, since
generic rules replace such divisions with a multiply and may produce (Rsh32x64 (Mul32 _ _) _).
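A Go sketch of why the extra sign extension is redundant, using hypothetical helpers that model the instructions: ADDW and MULW already produce a sign-extended 32 bit result, so applying MOVWreg (sign extension of the low 32 bits) again leaves the value unchanged.

```go
package main

import "fmt"

// sext32 models MOVWreg: sign-extend the low 32 bits to 64 bits.
func sext32(x uint64) uint64 { return uint64(int64(int32(uint32(x)))) }

// addw and mulw model ADDW/MULW: 32-bit result, sign-extended.
func addw(a, b uint64) uint64 { return sext32(a + b) }
func mulw(a, b uint64) uint64 { return sext32(a * b) }

func main() {
	a, b := uint64(0x7fff_ffff), uint64(2)
	fmt.Printf("%#x %#x\n", addw(a, b), sext32(addw(a, b)))
	fmt.Printf("%#x %#x\n", mulw(a, b), sext32(mulw(a, b)))
}
```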

Change-Id: I41bc9b519e38bc6027311de604dadb962cd0bbf4
Reviewed-on: https://go-review.googlesource.com/c/go/+/429757
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Meng Zhuo <mzh@golangcn.org>
Auto-Submit: Jenny Rakoczy <jenny@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Jenny Rakoczy <jenny@golang.org>
2022-09-15 21:04:37 +00:00
Joel Sing
77da976419 cmd/compile: remove redundant SEQZ/SNEZ on riscv64
In particular, (SEQZ (SNEZ x)) can arise from (Not (IsNonNil x)).

Change-Id: Ie249cd1934d71087e0f774cf8f6c937ceeed7ad5
Reviewed-on: https://go-review.googlesource.com/c/go/+/428215
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
2022-09-07 05:39:23 +00:00
Joel Sing
b6a6847b2f cmd/compile: avoid zero extension after properly typed atomic operation on riscv64
LoweredAtomicLoad8 is implemented using MOVBU, hence it is already zero
extended. LoweredAtomicCas32 and LoweredAtomicCas64 return a properly
typed boolean.

Change-Id: Ie0acbaa19403d59c7e5f76d060cc13ee51eb7834
Reviewed-on: https://go-review.googlesource.com/c/go/+/428214
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
2022-09-07 05:38:50 +00:00
Joel Sing
c011270fa5 cmd/compile: improve Slicemask on riscv64
Implement Slicemask the same way every other architecture does - negate
then arithmetic right shift. This sets or clears the sign bit, before
extending it to the entire register.

Removes around 2,500 instructions from the Go binary on linux/riscv64.

Change-Id: I4d675b826e7eb23fe2b1e6e46b95dcd49ab49733
Reviewed-on: https://go-review.googlesource.com/c/go/+/426354
Reviewed-by: Meng Zhuo <mzh@golangcn.org>
Reviewed-by: Michael Knyszek <mknyszek@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-09-07 05:37:53 +00:00
Joel Sing
3e11e61f3c cmd/compile: optimise subtraction with const on riscv64
Convert subtraction from const to a negated ADDI with negative const
value, where possible. At worst this avoids a register load and uses
the same number of instructions. At best, this allows for further
optimisation to occur, particularly where equality is involved.

For example, this sequence:

   li      t0,-1
   sub     t1,t0,a0
   snez    t1,t1

Becomes:

   addi    t0,a0,1
   snez    t0,t0

Removes more than 2000 instructions from the Go binary on linux/riscv64.
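The example works because c - a0 is the negation of a0 + (-c), and negation does not change whether a value is zero. A Go sketch with hypothetical helpers modeling the before and after sequences, for immediates that fit ADDI:

```go
package main

import "fmt"

// snez models SNEZ: 1 if x != 0, else 0.
func snez(x uint64) uint64 {
	if x != 0 {
		return 1
	}
	return 0
}

// before models: li t0,c; sub t1,t0,a0; snez t1,t1
func before(a0 uint64, c int64) uint64 { return snez(uint64(c) - a0) }

// after models: addi t0,a0,-c; snez t0,t0
func after(a0 uint64, c int64) uint64 { return snez(a0 + uint64(-c)) }

func main() {
	for _, a0 := range []uint64{0, 1, ^uint64(0)} { // ^uint64(0) is -1
		fmt.Println(before(a0, -1), after(a0, -1))
	}
}
```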

Change-Id: I68f3be897bc645d4a8fa3ab3cef165a00a74df19
Reviewed-on: https://go-review.googlesource.com/c/go/+/426263
Reviewed-by: Meng Zhuo <mzh@golangcn.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
2022-09-02 20:14:40 +00:00
Joel Sing
646c3eee06 cmd/compile: negate comparision with FNES/FNED on riscv64
The FNES and FNED instructions are pseudo-instructions, which the
assembler expands to FEQS/NEG or FEQD/NEG - if we're comparing the
result via a branch instruction, we can avoid an instruction by
negating both the branch comparison and the floating point
comparison.

This only removes a handful of instructions from the Go binary;
however, it should benefit floating point intensive code.

Change-Id: I4e3124440b7659acc4d9bc9948b755a4900a422f
Reviewed-on: https://go-review.googlesource.com/c/go/+/426261
Reviewed-by: Meng Zhuo <mzh@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
Run-TryBot: Meng Zhuo <mzh@golangcn.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2022-09-02 20:14:16 +00:00
Wayne Zuo
da6556968f cmd/compile: simplify bounded shift on riscv64
The prove pass will mark some shifts bounded, and then we can use that
information to generate better code on riscv64.
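
Shifts whose count is already known to be below 64 are the kind that can be marked bounded. The two Go functions below are illustrative, with hypothetical names. Whether the compiler marks a particular shift bounded depends on what it can prove:

```go
package main

import "fmt"

// shiftMasked: the count is s & 63, which is always below 64, so the
// shift can be treated as bounded and needs no ">= 64" filter.
func shiftMasked(x uint64, s uint) uint64 {
	return x << (s & 63)
}

// shiftUnbounded: Go defines x << s as 0 for s >= 64, but SLL only looks
// at the low six bits of s, so an unbounded shift needs extra instructions.
func shiftUnbounded(x uint64, s uint) uint64 {
	return x << s
}

func main() {
	fmt.Println(shiftMasked(1, 65), shiftUnbounded(1, 65))
}
```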

Change-Id: Ia22f43d0598453c9417adac7017db28d7240948b
Reviewed-on: https://go-review.googlesource.com/c/go/+/422616
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Keith Randall <khr@google.com>
Auto-Submit: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-08-31 20:21:00 +00:00
Joel Sing
971373f56a cmd/compile: remove NEG when used with SEQZ/SNEZ on riscv64
The negation does not change the comparison to zero.

Also remove unnecessary x.Uses == 1 condition from equivalent BEQZ/BNEZ rules.

Change-Id: I62dd8e383e42bfe5c46d11bbf78d8e5ff862a1d5
Reviewed-on: https://go-review.googlesource.com/c/go/+/426262
Reviewed-by: Cherry Mui <cherryyz@google.com>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Heschi Kreinick <heschi@google.com>
2022-08-31 20:08:03 +00:00
Joel Sing
239115c3ef cmd/compile: avoid extending floating point comparision on riscv64
The result of these operations is already extended.

Change-Id: Ifc8ba362dda7035d8fd0d40046a96f61d3082877
Reviewed-on: https://go-review.googlesource.com/c/go/+/426260
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Meng Zhuo <mzh@golangcn.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-08-31 20:05:46 +00:00
Joel Sing
9085ff5859 cmd/compile: avoid extending when already sufficiently masked on riscv64
Removes more than 2000 instructions from the Go binary on linux/riscv64.

Change-Id: I6db3e3b1c93f29f00869adcba7c6192bfb90b25c
Reviewed-on: https://go-review.googlesource.com/c/go/+/426259
TryBot-Result: Gopher Robot <gobot@golang.org>
Reviewed-by: Heschi Kreinick <heschi@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Meng Zhuo <mzh@golangcn.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-08-31 20:05:06 +00:00
Wayne Zuo
a6219737e3 cmd/compile: intrinsify Sub64 on riscv64
After this CL, the performance difference in crypto/elliptic
benchmarks on linux/riscv64 is:

name                 old time/op    new time/op    delta
ScalarBaseMult/P256    1.64ms ± 1%    1.60ms ± 1%   -2.36%  (p=0.008 n=5+5)
ScalarBaseMult/P224    1.53ms ± 1%    1.47ms ± 2%   -4.24%  (p=0.008 n=5+5)
ScalarBaseMult/P384    5.12ms ± 2%    5.03ms ± 2%     ~     (p=0.095 n=5+5)
ScalarBaseMult/P521    22.3ms ± 2%    13.8ms ± 1%  -37.89%  (p=0.008 n=5+5)
ScalarMult/P256        4.49ms ± 2%    4.26ms ± 2%   -5.13%  (p=0.008 n=5+5)
ScalarMult/P224        4.33ms ± 1%    4.09ms ± 1%   -5.59%  (p=0.008 n=5+5)
ScalarMult/P384        16.3ms ± 1%    15.5ms ± 2%   -4.78%  (p=0.008 n=5+5)
ScalarMult/P521         101ms ± 0%      47ms ± 2%  -53.36%  (p=0.008 n=5+5)

Change-Id: I31cf0506e27f9d85f576af1813630a19c20dda8a
Reviewed-on: https://go-review.googlesource.com/c/go/+/420095
Reviewed-by: Cherry Mui <cherryyz@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-27 05:43:59 +00:00
Wayne Zuo
969f48a3a2 cmd/compile: intrinsify Add64 on riscv64
According to RISCV instruction set manual v2.2 Sec 2.4, we can
implement an overflow check for unsigned addition cheaply using
the SLTU instruction.

After this CL, the performance difference in crypto/elliptic
benchmarks on linux/riscv64 is:

name                 old time/op    new time/op    delta
ScalarBaseMult/P256    1.93ms ± 1%    1.64ms ± 1%  -14.96%  (p=0.008 n=5+5)
ScalarBaseMult/P224    1.80ms ± 2%    1.53ms ± 1%  -14.89%  (p=0.008 n=5+5)
ScalarBaseMult/P384    6.15ms ± 2%    5.12ms ± 2%  -16.73%  (p=0.008 n=5+5)
ScalarBaseMult/P521    25.9ms ± 1%    22.3ms ± 2%  -13.78%  (p=0.008 n=5+5)
ScalarMult/P256        5.59ms ± 1%    4.49ms ± 2%  -19.79%  (p=0.008 n=5+5)
ScalarMult/P224        5.42ms ± 1%    4.33ms ± 1%  -20.01%  (p=0.008 n=5+5)
ScalarMult/P384        19.9ms ± 2%    16.3ms ± 1%  -18.15%  (p=0.008 n=5+5)
ScalarMult/P521        97.3ms ± 1%   100.7ms ± 0%   +3.48%  (p=0.008 n=5+5)

Change-Id: Ic4c82ced4b072a4a6575343fa9f29dd09b0cabc4
Reviewed-on: https://go-review.googlesource.com/c/go/+/420094
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-08-27 05:43:32 +00:00
Wayne Zuo
b60432df14 cmd/compile: deadcode for LoweredMuluhilo on riscv64
This is a follow-up to CL 425101 for RISCV64.

According to RISCV Volume 1, Unprivileged Spec v. 20191213 Chapter 7.1:
If both the high and low bits of the same product are required, then the
recommended code sequence is: MULH[[S]U] rdh, rs1, rs2; MUL rdl, rs1, rs2
(source register specifiers must be in same order and rdh cannot be the
same as rs1 or rs2). Microarchitectures can then fuse these into a single
multiply operation instead of performing two separate multiplies.

So we should not split Muluhilo into separate instructions.

Updates #54607

Change-Id: If47461f3aaaf00e27cd583a9990e144fb8bcdb17
Reviewed-on: https://go-review.googlesource.com/c/go/+/425203
Auto-Submit: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Wayne Zuo <wdvxdr@golangcn.org>
Reviewed-by: Keith Randall <khr@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-08-24 18:08:33 +00:00
Joel Sing
95547aee8c cmd/compile: cast riscv64 rewrite shifts to unsigned int
This appeases Go 1.4, making it possible to bootstrap GOARCH=riscv64 with
a Go 1.4 compiler.
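
Go releases before 1.13 rejected signed shift counts, so code that must build with Go 1.4 converts the count to an unsigned type. A minimal sketch follows, assuming a count stored as an int64 (for example an SSA auxInt). The function name is hypothetical:

```go
package main

import "fmt"

// shiftLeft shifts by a count held in an int64. Converting the count to an
// unsigned type keeps it valid for Go releases before 1.13; a bare x << n
// with signed n only compiles with Go 1.13 or later.
func shiftLeft(x, n int64) int64 {
	return x << uint64(n)
}

func main() {
	fmt.Println(shiftLeft(1, 3), shiftLeft(-1, 4))
}
```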

Fixes #52583

Change-Id: Ib13c2afeb095b2bb1464dcd7f1502574209bc7ab
Reviewed-on: https://go-review.googlesource.com/c/go/+/409974
TryBot-Result: Gopher Robot <gobot@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
Reviewed-by: Bryan Mills <bcmills@google.com>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2022-06-06 19:03:15 +00:00
Cherry Mui
d6e6140c98 cmd/compile: fix boolean comparison on RISCV64
Following CL 405114, for RISCV64.

May fix RISCV64 builds.

Updates #52788.

Change-Id: Ifc34658703d1e8b97665e7b862060152e3005d71
Reviewed-on: https://go-review.googlesource.com/c/go/+/405553
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
2022-05-12 19:11:22 +00:00
Cherry Mui
1ed30ca537 cmd/compile: correct type of pointer difference on RISCV64
Pointer comparison is lowered to the following on RISCV64:

(EqPtr x y) => (SEQZ (SUB <x.Type> x y))

The difference of two pointers (the SUB) should not have a pointer
type; otherwise, the GC may find a bad pointer.

Should fix #51101.

Change-Id: I7e73c2155c36ff403c032981a9aa9cccbfdf0f64
Reviewed-on: https://go-review.googlesource.com/c/go/+/385655
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gopher Robot <gobot@golang.org>
2022-02-14 23:08:44 +00:00
Joel Sing
fe8347b61a cmd/compile: optimise immediate operands with constants on riscv64
Instructions with immediates can be precomputed when operating on a
constant - do so for SLTI/SLTIU, SLLI/SRLI/SRAI, NEG/NEGW, ANDI, ORI
and ADDI. Additionally, optimise ANDI and ORI when the immediate is
all ones or all zeroes.

In particular, the RISCV64 logical left and right shift rules
(Lsh*x*/Rsh*Ux*) produce sequences that check whether the shift amount
is 64 or greater and, if so, return zero. When the shift amount is a
constant we can precompute and eliminate the filter entirely.

Likewise the arithmetic right shift rules produce sequences that
check whether the shift amount is 64 or greater and, if so, ensure
that the lower six bits of the shift amount are all ones. When the
shift amount is a constant we can precompute the shift value.

Arithmetic right shift sequences like:

   117fc:       00100513                li      a0,1
   11800:       04053593                sltiu   a1,a0,64
   11804:       fff58593                addi    a1,a1,-1
   11808:       0015e593                ori     a1,a1,1
   1180c:       40b45433                sra     s0,s0,a1

Are now a single srai instruction:

   117fc:       40145413                srai    s0,s0,0x1

Likewise for logical left shift (and logical right shift):

   1d560:       01100413                li      s0,17
   1d564:       04043413                sltiu   s0,s0,64
   1d568:       40800433                neg     s0,s0
   1d56c:       01131493                slli    s1,t1,0x11
   1d570:       0084f433                and     s0,s1,s0

Which are now a single slli (or srli) instruction:

   1d120:       01131413                slli    s0,t1,0x11
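
The filters exist because Go defines x << s and x >> s for s >= 64, while the RISC-V shift instructions use only the low six bits of the amount. The Go model below follows the two sequences shown above and checks them against Go's shift semantics. The helper names are hypothetical, and the compiler's rule shapes may differ in detail. Once s is a constant, every filter value is known at compile time, which is what lets the sequence collapse to one immediate shift:

```go
package main

import "fmt"

// sltiu models SLTIU: 1 if s < imm (unsigned), else 0.
func sltiu(s, imm uint64) uint64 {
	if s < imm {
		return 1
	}
	return 0
}

// sllFiltered models the logical left shift sequence: SLL uses only the
// low six bits of s, and the mask -(s < 64) clears the result if s >= 64.
func sllFiltered(x, s uint64) uint64 {
	mask := -sltiu(s, 64) // SLTIU then NEG: all ones if s < 64, else zero
	return (x << (s & 63)) & mask
}

// sraFiltered models the arithmetic right shift sequence: when s >= 64
// the amount becomes all ones, so SRA shifts by 63 (sign fill).
func sraFiltered(x int64, s uint64) int64 {
	amt := s | (sltiu(s, 64) - 1) // SLTIU, ADDI -1, OR
	return x >> (amt & 63)
}

func main() {
	x, y := uint64(3), int64(-8)
	for _, s := range []uint64{1, 17, 64, 100} {
		fmt.Println(s, sllFiltered(x, s) == x<<s, sraFiltered(y, s) == y>>s)
	}
}
```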

This removes more than 30,000 instructions from the Go binary and
should improve performance in a variety of areas - of note,
runtime.makemap_small drops from 48 to 36 instructions. Similar
gains exist in at least some other parts of runtime and in math/bits.

Change-Id: I33f6f3d1fd36d9ff1bda706997162bfe4bb859b6
Reviewed-on: https://go-review.googlesource.com/c/go/+/350689
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Michael Munday <mike.munday@lowrisc.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-09-24 10:51:48 +00:00
Cherry Mui
c10b980220 cmd/compile: restore tail call for method wrappers
For certain types of method wrappers we used to generate a tail
call. That was disabled in CL 307234 when register ABI is used,
because with the current IR it was difficult to generate a tail
call with the arguments in the right places. The problem was that
the IR does not contain a CALL-like node with arguments; instead,
it contains an OAS node that adjusts the receiver, then an
OTAILCALL node that just contains the target, but no argument
(with the assumption that the OAS node will put the adjusted
receiver in the right place). With register ABI, putting
arguments in registers is done in SSA. The assignment (OAS)
doesn't put the receiver in a register.

This CL changes the IR of a tail call to take an actual OCALL
node. Specifically, a tail call is represented as

OTAILCALL (OCALL target args...)

This way, the call target and args are connected through the OCALL
node. So the call can be analyzed in SSA and the args can be passed
in the right places.

(Alternatively, we could have the OTAILCALL node directly take the
target and the args, without the OCALL node. Using an OCALL node is
convenient as there is existing code that processes OCALL nodes
which does not need to be changed. Also, a tail call is similar to
ORETURN (OCALL target args...), except it doesn't preserve the
frame. I did the former but I'm open to change.)

The SSA representation is similar. Previously, the IR was lowered to
a Store of the receiver followed by a BlockRetJmp which jumps to the
target (without putting the arg in a register). Now we use a TailCall
op, which takes the target and the args. The call expansion pass and
the register allocator handle TailCall pretty much like a
StaticCall, and they will do the right ABI analysis and put the args
in the right places. (Args other than the receiver are already in
the right places. For register args it generates no code for them.
For stack args currently it generates a self copy. I'll work on
optimizing that out.) BlockRetJmp is still used, signaling it is a
tail call. The actual call is made in the TailCall op so
BlockRetJmp generates no code (we could use BlockExit if we like).

This slightly reduces binary size:
              old        new
cmd/go     14003088   13953936
cmd/link    6275552    6271456

Change-Id: I2d16d8d419fe1f17554916d317427383e17e27f0
Reviewed-on: https://go-review.googlesource.com/c/go/+/350145
Trust: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: David Chase <drchase@google.com>
2021-09-17 22:59:44 +00:00
Michael Munday
c69f5c0d76 cmd/compile: add support for Abs and Copysign intrinsics on riscv64
Also, add the FABSS and FABSD pseudo instructions to the assembler.
The compiler could use FSGNJX[SD] directly but there doesn't seem
to be much advantage to doing so and the pseudo instructions are
easier to understand.

Change-Id: Ie8825b8aa8773c69cc4f07a32ef04abf4061d80d
Reviewed-on: https://go-review.googlesource.com/c/go/+/348989
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
2021-09-10 10:45:59 +00:00
Michael Munday
ea51e223c2 cmd/{asm,compile}: add fused multiply-add support on riscv64
Add support to the assembler for F[N]M{ADD,SUB}[SD] instructions.
Argument order is:

  OP RS1, RS2, RS3, RD

Also, add support for the FMA intrinsic to the compiler. Automatic
FMA matching is left to a future CL.

Change-Id: I47166c7393b2ab6bfc2e42aa8c1a8997c3a071b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/293030
Trust: Michael Munday <mike.munday@lowrisc.org>
Run-TryBot: Michael Munday <mike.munday@lowrisc.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
2021-09-01 21:17:04 +00:00
Joel Sing
8fff20ffeb cmd/compile: absorb NEG into branch when possible on riscv64
We can end up with this situation due to our equality tests being based on
'SEQZ (SUB x y)' - if x is a zero valued constant, 'SUB x y' can be converted
to 'NEG x'. When used with a branch the SEQZ can be absorbed, leading to
'BNEZ (NEG x)' where the NEG is redundant.

Removes around 1700 instructions from the go binary on riscv64.

Change-Id: I947a080d8bf7d2d6378ab114172e2342ce2c51db
Reviewed-on: https://go-review.googlesource.com/c/go/+/342850
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-08-21 11:23:14 +00:00
Joel Sing
bcd146d398 cmd/compile: convert branch with zero to more optimal branch zero on riscv64
Convert BLT and BGE with a zero valued constant to BGTZ/BLTZ/BLEZ/BGEZ as
appropriate.

Removes over 4,500 instructions from the go binary on riscv64.

Change-Id: Icc266e968b126ba04863ec88529630a9dd44498b
Reviewed-on: https://go-review.googlesource.com/c/go/+/342849
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Mui <cherryyz@google.com>
Run-TryBot: Cherry Mui <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
2021-08-21 11:22:07 +00:00
Meng Zhuo
1951afc919 cmd/compile: lowered MulUintptr on riscv64
According to RISCV instruction set manual v2.2 Sec 6.1, a MULHU
followed by a MUL can be fused into a single multiply by the
microarchitecture.

name              old time/op  new time/op  delta
MulUintptr/small  11.2ns ±24%   9.2ns ± 0%  -17.54%  (p=0.000 n=10+9)
MulUintptr/large  15.9ns ± 0%  10.9ns ± 0%  -31.55%  (p=0.000 n=8+8)
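
MulUintptr returns a product together with an overflow flag. A hypothetical Go helper of the same shape is shown below, assuming a 64-bit uintptr as on riscv64. The high word from math/bits.Mul64 (the MULHU half) is non-zero exactly when the product does not fit in 64 bits. This is a sketch, not the runtime's implementation:

```go
package main

import (
	"fmt"
	"math/bits"
)

// mulUintptr is a hypothetical overflow-checked multiply: it returns the
// low 64 bits of a*b and whether the full product overflowed 64 bits.
func mulUintptr(a, b uint64) (uint64, bool) {
	hi, lo := bits.Mul64(a, b) // high word (MULHU) and low word (MUL)
	return lo, hi != 0
}

func main() {
	fmt.Println(mulUintptr(3, 4))
	fmt.Println(mulUintptr(1<<32, 1<<32))
}
```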

Change-Id: I3d152218f83948cbc5c576bda29dc86e9b4206ee
Reviewed-on: https://go-review.googlesource.com/c/go/+/338753
Trust: Meng Zhuo <mzh@golangcn.org>
Reviewed-by: Joel Sing <joel@sing.id.au>
2021-08-17 01:29:37 +00:00
Meng Zhuo
efd206eb40 cmd/compile: intrinsify Mul64 on riscv64
According to RISCV instruction set manual v2.2 Sec 6.1, a MULHU
followed by a MUL can be fused into a single multiply by the
microarchitecture.

Benchstat on HiFive Unmatched:
name          old time/op    new time/op    delta
Hash8Bytes       245ns ± 3%     186ns ± 4%  -23.99%  (p=0.000 n=10+10)
Hash320Bytes    1.94µs ± 1%    1.31µs ± 1%  -32.38%  (p=0.000 n=9+10)
Hash1K          5.84µs ± 0%    3.84µs ± 0%  -34.20%  (p=0.000 n=10+9)
Hash8K          45.3µs ± 0%    29.4µs ± 0%  -35.04%  (p=0.000 n=10+10)

name          old speed      new speed      delta
Hash8Bytes    32.7MB/s ± 3%  43.0MB/s ± 4%  +31.61%  (p=0.000 n=10+10)
Hash320Bytes   165MB/s ± 1%   244MB/s ± 1%  +47.88%  (p=0.000 n=9+10)
Hash1K         175MB/s ± 0%   266MB/s ± 0%  +51.98%  (p=0.000 n=10+9)
Hash8K         181MB/s ± 0%   279MB/s ± 0%  +53.94%  (p=0.000 n=10+10)

Change-Id: I3561495d02a4a0ad8578e9b9819bf0a4eaca5d12
Reviewed-on: https://go-review.googlesource.com/c/go/+/329970
Reviewed-by: Joel Sing <joel@sing.id.au>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Go Bot <gobot@golang.org>
Trust: Meng Zhuo <mzh@golangcn.org>
2021-08-16 13:50:11 +00:00
Cherry Zhang
4a7effa418 cmd/compile: mark R12 clobbered for special calls
In external linking mode the external linker may insert
trampolines, which use R12 as a scratch register. So a call could
potentially clobber R12 if the target is laid out too far away. Mark
R12 clobbered.

We will also use R12 for trampolines in the Go linker.

CL 310731 updated the generated rewrite files so imports are
grouped, but the generator was not updated to do so. Grouped
imports are nice. But as those are generated files, for
simplicity and my laziness, just regenerate with the current
generator (which makes imports not grouped).

Change-Id: Iddb741ff7314a291ade5fbffc7d315f555808409
Reviewed-on: https://go-review.googlesource.com/c/go/+/314453
Trust: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
2021-04-28 14:01:59 +00:00
Russ Cox
95ed5c3800 internal/buildcfg: move build configuration out of cmd/internal/objabi
The go/build package needs access to this configuration,
so move it into a new package available to the standard library.

Change-Id: I868a94148b52350c76116451f4ad9191246adcff
Reviewed-on: https://go-review.googlesource.com/c/go/+/310731
Trust: Russ Cox <rsc@golang.org>
Run-TryBot: Russ Cox <rsc@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: Jay Conrod <jayconrod@google.com>
2021-04-16 19:20:53 +00:00