MOV*nop and MOV*reg seem superfluous. They are there to keep type
information around that would otherwise get thrown away. Not sure
what we need it for. I think our compiler needs a normalization of
how types are represented in SSA, especially after lowering.
MOV*nop gets in the way of some optimization rules firing, like for
load combining.
For now, just fold MOV*nop and MOV*const. It's certainly safe to
do that, as the type info on the MOV*const isn't ever useful.
R=go1.17
Change-Id: I3630a80afc2455a8e9cd9fde10c7abe05ddc3767
Reviewed-on: https://go-review.googlesource.com/c/go/+/276792
Trust: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
As done with other equality tests, zero extend before subtraction rather than
after (or in this case, at the same time). While at face value this appears to
require more instructions, in reality it allows for most sign extensions to
be completely eliminated due to correctly typed loads. Existing optimisations
(such as subtraction of zero) then become more effective.
This removes more than 10,000 instructions from the Go binary and in particular,
a writeBarrier check only requires three instructions (AUIPC, LWU, BNEZ) instead
of the current four (AUIPC, LWU, NEGW, BNEZ).
Change-Id: I7afdc1921c4916ddbd414c3b3f5c2089107ec016
Reviewed-on: https://go-review.googlesource.com/c/go/+/274066
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Optimize small (s <= 32) zeroing/moving operations on riscv64.
Avoid generating unaligned memory accesses.
The code is almost one to one translation of the corresponding
mips64 rules with additional rule for s=32.
Change-Id: I753b0b8e53cb9efcf43c8080cab90f3d03539fb8
Reviewed-on: https://go-review.googlesource.com/c/go/+/266217
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Sign extension for consts is unnecessary and zero extension for consts can be avoided
via casts. This removes over 16,000 instructions from the Go binary, in part because it
allows for better zero const absorbtion in blocks - for example,
`(BEQ (MOVBU (MOVBconst [0])) cond yes no)` now becomes `(BEQZ cond yes no)` when
this change is combined with existing rules.
Change-Id: I27e791bfa84869639db653af6119f6e10369ba3d
Reviewed-on: https://go-review.googlesource.com/c/go/+/265041
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Also make canMergeSym take Syms instead of interface{}
Change-Id: I4926a1fc586aa90e198249d67e5b520404b40869
Reviewed-on: https://go-review.googlesource.com/c/go/+/265817
Trust: Alberto Donizetti <alb.donizetti@gmail.com>
Run-TryBot: Alberto Donizetti <alb.donizetti@gmail.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Implement runtime.duffzero and runtime.duffcopy for riscv64.
Use obj.ADUFFZERO/obj.ADUFFCOPY for medium size, word aligned
zeroing/moving.
Change-Id: I42ec622055630c94cb77e286d8d33dbe7c9f846c
Reviewed-on: https://go-review.googlesource.com/c/go/+/237797
Run-TryBot: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Add additional rules to eliminate unnecessary sign/zero extension for riscv64.
Also where possible, replace an extension following a load with a different typed
load. This removes almost another 8,000 instructions from the go binary.
Of particular note, change Eq16/Eq8/Neq16/Neq8 to zero extend each value before
subtraction, rather than zero extending after subtraction. While this appears to
double the number of zero extensions, it often lets us completely eliminate them
as the load can already be performed in a properly typed manner.
As an example, prior to this change runtime.memequal16 was:
0000000000013028 <runtime.memequal16>:
13028: 00813183 ld gp,8(sp)
1302c: 00019183 lh gp,0(gp)
13030: 01013283 ld t0,16(sp)
13034: 00029283 lh t0,0(t0)
13038: 405181b3 sub gp,gp,t0
1303c: 03019193 slli gp,gp,0x30
13040: 0301d193 srli gp,gp,0x30
13044: 0011b193 seqz gp,gp
13048: 00310c23 sb gp,24(sp)
1304c: 00008067 ret
Whereas it now becomes:
0000000000012fa8 <runtime.memequal16>:
12fa8: 00813183 ld gp,8(sp)
12fac: 0001d183 lhu gp,0(gp)
12fb0: 01013283 ld t0,16(sp)
12fb4: 0002d283 lhu t0,0(t0)
12fb8: 405181b3 sub gp,gp,t0
12fbc: 0011b193 seqz gp,gp
12fc0: 00310c23 sb gp,24(sp)
12fc4: 00008067 ret
Change-Id: I16321feb18381241cab121c0097a126104c56c2c
Reviewed-on: https://go-review.googlesource.com/c/go/+/264659
Trust: Joel Sing <joel@sing.id.au>
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Rather than handling sign and zero extension via rules, defer to the assembler
and use MOV pseudo-instructions. The instruction can also be omitted where the
type and size is already correct. This change results in more than 6,000
instructions being removed from the go binary (in part due to omitted
instructions, in part due to MOVBU having a more efficient implementation in
the assembler than what is used in the current ZeroExt8to{16,32,64} rules).
This will also allow for further rewriting to remove redundant sign/zero
extension.
Change-Id: I05e42fd9f09f40a69948be7de772cce8946c8744
Reviewed-on: https://go-review.googlesource.com/c/go/+/264658
Trust: Joel Sing <joel@sing.id.au>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Make use of multi-control values and branch pseudo-instructions to optimise
compiler generated branches.
Change-Id: I7a8bf754db3c2082a390bf6a662ccf18cbcbee39
Reviewed-on: https://go-review.googlesource.com/c/go/+/226400
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
This makes the intent clearer, allows for another ellipsis and will aid
in future rewriting. While here, document boolean loads to explain register
contents.
Change-Id: I933db2813826d88819366191fbbea8fcee5e4dda
Reviewed-on: https://go-review.googlesource.com/c/go/+/230120
Reviewed-by: Keith Randall <khr@golang.org>
Implement multi-control branches for riscv64, switching to using the BNEZ
pseudo-instruction when rewriting conditionals. This will allow for further
branch optimisations to later be performed via rewrites.
Change-Id: I7f2c69f3c77494b403f26058c6bc8432d8070ad0
Reviewed-on: https://go-review.googlesource.com/c/go/+/226399
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Joel Sing <joel@sing.id.au>
This reverts commit 98c32670fd454939794504225dca1d4ec55045d5.
Rolling-forward with trivial format-string fix
cmd/compile: adjust RISCV64 rewrite rules to use typed aux fields
Also add a typed version of mergeSym to rewrite.go to assist with a few
rules that used mergeSym in the untyped-form.
Remove a few extra int32 overflow checks that no longer make sense, as
adding two int8s or int16s should never overflow an int32.
Passes toolstash-check -all.
Original review: https://go-review.googlesource.com/c/go/+/228882
Change-Id: Ib63db4ee1687446f0f3d9f11575a40dd85cbce55
Reviewed-on: https://go-review.googlesource.com/c/go/+/229126
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Than McIntosh <thanm@google.com>
Also add a typed version of mergeSym to rewrite.go to assist with a few
rules that used mergeSym in the untyped-form.
Remove a few extra int32 overflow checks that no longer make sense, as
adding two int8s or int16s should never overflow an int32.
Passes toolstash-check -all.
Change-Id: I72ddd2b0d9001faa87ad0ab54f500057164661b7
Reviewed-on: https://go-review.googlesource.com/c/go/+/228882
Reviewed-by: Keith Randall <khr@golang.org>
Otherwise, just copying the aux and auxint fields doesn't make much sense.
(Although there's no bug - it just means it isn't typechecked correctly.)
Change-Id: I4e21ac67f0c7bfd04ed5af1713cd24bca08af092
Reviewed-on: https://go-review.googlesource.com/c/go/+/227962
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Log large copies in the riscv64 compiler.
This was missed in 47ade08141, resulting in
the new test added to cmd/compile/internal/logopt failing on riscv64.
Change-Id: I6f763e86f42834148e911d16928f9fbabcfa4290
Reviewed-on: https://go-review.googlesource.com/c/go/+/227804
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Extend CL 220417 (which removed the integer Greater and Geq ops) to
floating point comparisons. Greater and Geq can always be
implemented using Less and Leq.
Fixes#37316.
Change-Id: Ieaddb4877dd0ff9037a1dd11d0a9a9e45ced71e7
Reviewed-on: https://go-review.googlesource.com/c/go/+/222397
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Where possible, fold constants into versions of instructions that take
an immediate. This avoids the need to allocate a register and load the
immediate into it.
Change-Id: If911ca41235e218490679aed2ce5f48bf807a2b3
Reviewed-on: https://go-review.googlesource.com/c/go/+/222639
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Also rewrite subtraction of zero to NEG/NEGW.
Change-Id: I216e286d1860055f2a07fe2f772cd50f366ea097
Reviewed-on: https://go-review.googlesource.com/c/go/+/221691
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This allows for zero stores to be performed using the zero register, rather
than loading a separate register with zero.
Change-Id: Ic81d8dbcdacbb2ca2c3f77682ff5ad7cdc33d18d
Reviewed-on: https://go-review.googlesource.com/c/go/+/221684
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Convert subtraction of a constant into an ADDI with a negative immediate,
where possible.
Change-Id: Ie8d54b7538f0012e5f898abea233b2957fe31899
Reviewed-on: https://go-review.googlesource.com/c/go/+/221679
Reviewed-by: Cherry Zhang <cherryyz@google.com>
This:
* Simplifies and shortens the generated code for rewrite rules.
* Shrinks cmd/compile by 86k (0.4%) and makes it easier to compile.
* Removes the stmt boundary code wrangling from Value.reset,
in favor of doing it in the one place where it actually does some work,
namely the writebarrier pass. (This was ascertained by inspecting the
code for cases in which notStmtBoundary values were generated.)
Passes toolstash-check -all.
Change-Id: I25671d4c4bbd772f235195d11da090878ea2cc07
Reviewed-on: https://go-review.googlesource.com/c/go/+/221421
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
SignExt32to64 can be implemented with a single ADDIW instruction, rather than
the two shifts that are in use currently.
Change-Id: Ie1bbaef4018f1ba5162773fc64fa5a887457cfc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/220922
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Instead of writing AuxInt during prove and then zeroing it during lower,
just don't write it in the first place.
Passes toolstash-check -all.
Change-Id: Iea4b555029a9d69332e835536f9cf3a42b8223db
Reviewed-on: https://go-review.googlesource.com/c/go/+/220682
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Slicemask can be performed with three immediate instructions, rather than the
six currently in use.
Change-Id: I3f8ca2d5affd1403db8fa79b356f248e6e9332c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/220923
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Use SUBW to perform a 32-bit subtraction, rather than zero extending from
32 to 64 bits. This reduces Eq32 and Neq32 to two instructions, rather than
the four instructions required previously.
Change-Id: Ib2798324881e9db842c864e91a0c1b1e48c4b67b
Reviewed-on: https://go-review.googlesource.com/c/go/+/220921
Reviewed-by: Cherry Zhang <cherryyz@google.com>
The generic Greater and Geq ops can always be replaced with the Less and
Leq ops. This CL therefore removes them. This simplifies the compiler since
it reduces the number of operations that need handling in both code and in
rewrite rules. This will be especially true when adding control flow
optimizations such as the integer-in-range optimizations in CL 165998.
Change-Id: If0648b2b19998ac1bddccbf251283f3be4ec3040
Reviewed-on: https://go-review.googlesource.com/c/go/+/220417
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Also, explicitly zero AuxInt in some ops (like Div),
to make it clear why they do not use an ellipsis.
Passes toolstash-check -all.
Change-Id: Iefd8891fca5d7be8aa1bb91eb1fe2c99c8bf9c88
Reviewed-on: https://go-review.googlesource.com/c/go/+/217011
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
CL 213703 converted generated rewrite rules for commutative ops
to use loops instead of duplicated code.
However, it loaded args using expressions like
v.Args[i] and v.Args[i^1], which the compiler could
not eliminate bounds for (including with all outstanding
prove CLs).
Also, given a series of separate rewrite rules for the same op,
we generated bounds checks for every rewrite rule, even though
we were repeatedly loading the same set of args.
This change reduces both sets of bounds checks.
Instead of loading v.Args[i] and v.Args[i^1] for commutative loops,
we now preload v.Args[0] and v.Args[1] into local variables,
and then swap them (as needed) in the commutative loop post statement.
And we now load all top level v.Args into local variables
at the beginning of every rewrite rule function.
The second optimization is the more significant,
but the first helps a little, and they play together
nicely from the perspective of generating the code.
This does increase register pressure, but the reduced bounds
checks more than compensate.
Note that the vast majority of rewrite rules evaluated
are not applied, so the prologue is the most important
part of the rewrite rules.
There is one subtle aspect to the new generated code.
Because the top level v.Args are shared across rewrite rules,
and rule evaluation can swap v_0 and v_1, v_0 and v_1
can end up being swapped from one rule to the next.
That is OK, because any time a rule does not get applied,
they will have been swapped exactly twice.
Passes toolstash-check -all.
name old time/op new time/op delta
Template 213ms ± 2% 211ms ± 2% -0.85% (p=0.000 n=92+96)
Unicode 83.5ms ± 2% 83.2ms ± 2% -0.41% (p=0.004 n=95+90)
GoTypes 737ms ± 2% 733ms ± 2% -0.51% (p=0.000 n=91+94)
Compiler 3.45s ± 2% 3.43s ± 2% -0.44% (p=0.000 n=99+100)
SSA 8.54s ± 1% 8.32s ± 2% -2.56% (p=0.000 n=96+99)
Flate 136ms ± 2% 135ms ± 1% -0.47% (p=0.000 n=96+96)
GoParser 169ms ± 1% 168ms ± 1% -0.33% (p=0.000 n=96+93)
Reflect 456ms ± 3% 455ms ± 3% ~ (p=0.261 n=95+94)
Tar 186ms ± 2% 185ms ± 2% -0.48% (p=0.000 n=94+95)
XML 251ms ± 1% 250ms ± 1% -0.51% (p=0.000 n=91+94)
[Geo mean] 424ms 421ms -0.68%
name old user-time/op new user-time/op delta
Template 275ms ± 1% 274ms ± 2% -0.55% (p=0.000 n=95+98)
Unicode 118ms ± 4% 118ms ± 4% ~ (p=0.642 n=98+90)
GoTypes 983ms ± 1% 980ms ± 1% -0.30% (p=0.000 n=93+93)
Compiler 4.56s ± 6% 4.52s ± 6% -0.72% (p=0.003 n=100+100)
SSA 11.4s ± 1% 11.1s ± 1% -2.50% (p=0.000 n=96+97)
Flate 168ms ± 1% 167ms ± 1% -0.49% (p=0.000 n=92+92)
GoParser 204ms ± 1% 204ms ± 2% -0.27% (p=0.003 n=99+96)
Reflect 599ms ± 2% 598ms ± 2% ~ (p=0.116 n=95+92)
Tar 227ms ± 2% 225ms ± 2% -0.57% (p=0.000 n=95+98)
XML 313ms ± 2% 312ms ± 1% -0.37% (p=0.000 n=89+95)
[Geo mean] 547ms 544ms -0.61%
file before after Δ %
compile 21113112 21109016 -4096 -0.019%
total 131704940 131700844 -4096 -0.003%
Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956
Reviewed-on: https://go-review.googlesource.com/c/go/+/216218
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Prior to this change, we generated additional rules at rulegen time
for all possible combinations of args to commutative ops.
This is simple and works well, but leads to lots of generated rules.
This in turn has increased the size of the compiler,
made it hard to compile package ssa on small machines,
and provided a disincentive to mark some ops as commutative.
This change reworks how we handle commutative ops.
Instead of generating a rule per argument permutation,
we generate a series of nested loops, one for each commutative op.
Each loop tries both possible argument orderings.
I also considered attempting to canonicalize the inputs to the
rewrite rules. However, because either or both arguments might be
nothing more than an identifier, and because there can be arbitrary
conditions to evaluate during matching, I did not see how to proceed.
The duplicate rule detection now sorts arguments to commutative ops,
so that it can detect commutative-only duplicates.
There may be further optimizations to the new generated code.
In particular, we may not be removing as many bounds checks as before;
I have not investigated deeply. If more work here is needed,
we could do it with more hints or with improvements to the prove pass.
This change has almost no impact on the generated code.
It does not pass toolstash-check, however. In a handful of functions,
for reasons I do not understand, there are minor position changes.
For the entire series ending at this change,
there is negligible compiler performance impact.
The compiler binary shrinks by about 15%,
and package ssa shrinks by about 25%.
Package ssa also compiles ~25% faster with ~25% less memory.
Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4
Reviewed-on: https://go-review.googlesource.com/c/go/+/213703
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Based on riscv-go port.
Updates #27532
Change-Id: Ia329daa243db63ff334053b8807ea96b97ce3acf
Reviewed-on: https://go-review.googlesource.com/c/go/+/204631
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>