Commit graph

16 commits

Author SHA1 Message Date
Joel Sing
d28b8524a4 cmd/compile: optimize subtraction of zero on riscv64
Change-Id: I9a994b01e9fecb13077c30df4b7677d40d179cce
Reviewed-on: https://go-review.googlesource.com/c/go/+/221681
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-03 12:03:48 +00:00
Joel Sing
bd6f4cd886 cmd/compile: improve subtraction of constants on riscv64
Convert subtraction of a constant into an ADDI with a negative immediate,
where possible.

Change-Id: Ie8d54b7538f0012e5f898abea233b2957fe31899
Reviewed-on: https://go-review.googlesource.com/c/go/+/221679
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-03-03 11:36:47 +00:00
Josh Bleecher Snyder
5e4da0adac cmd/compile: add streamlined Block Reset+AddControl routines
For use in rewrite rules. Shrinks cmd/compile:

compile 20082104  19967416  -114688 -0.571%

Passes toolstash-check -all.

Change-Id: Ic856508b27ec5b7fb9b6ca63e955a7139ae7dc30
Reviewed-on: https://go-review.googlesource.com/c/go/+/221780
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-02 17:40:00 +00:00
Josh Bleecher Snyder
d7c073ecbf cmd/compile: add specialized Value reset for OpCopy
This:

* Simplifies and shortens the generated code for rewrite rules.
* Shrinks cmd/compile by 86k (0.4%) and makes it easier to compile.
* Removes the stmt boundary code wrangling from Value.reset,
  in favor of doing it in the one place where it actually does some work,
  namely the writebarrier pass. (This was ascertained by inspecting the
  code for cases in which notStmtBoundary values were generated.)

Passes toolstash-check -all.

Change-Id: I25671d4c4bbd772f235195d11da090878ea2cc07
Reviewed-on: https://go-review.googlesource.com/c/go/+/221421
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2020-03-02 16:24:47 +00:00
Josh Bleecher Snyder
7913f7dfcf cmd/compile: add specialized AddArgN functions for rewrite rules
This shrinks the compiler without impacting performance.
(The performance-sensitive part of rewrite rules is the non-match case.)
Passes toolstash-check -all.

Executable size:

file    before    after     Δ       %       
compile 20356168  20163960  -192208 -0.944% 
total   115599376 115407168 -192208 -0.166% 

Text size:

file                       before   after    Δ       %       
cmd/compile/internal/ssa.s 3928309  3778774  -149535 -3.807% 
total                      18862943 18713408 -149535 -0.793% 

Memory allocated compiling package SSA:

SSA               12.7M ± 0%        12.5M ± 0%  -1.74%  (p=0.008 n=5+5)

Compiler speed impact:

name        old time/op       new time/op       delta
Template          211ms ± 1%        211ms ± 2%    ~     (p=0.832 n=49+49)
Unicode          82.8ms ± 2%       83.2ms ± 2%  +0.44%  (p=0.022 n=46+49)
GoTypes           726ms ± 1%        728ms ± 2%    ~     (p=0.076 n=46+48)
Compiler          3.39s ± 2%        3.40s ± 2%    ~     (p=0.633 n=48+49)
SSA               7.71s ± 1%        7.65s ± 1%  -0.78%  (p=0.000 n=45+44)
Flate             134ms ± 1%        134ms ± 1%    ~     (p=0.195 n=50+49)
GoParser          167ms ± 1%        167ms ± 1%    ~     (p=0.390 n=47+47)
Reflect           453ms ± 3%        452ms ± 2%    ~     (p=0.492 n=48+49)
Tar               184ms ± 3%        184ms ± 2%    ~     (p=0.862 n=50+48)
XML               248ms ± 2%        248ms ± 2%    ~     (p=0.096 n=49+47)
[Geo mean]        415ms             415ms       -0.03%

name        old user-time/op  new user-time/op  delta
Template          273ms ± 1%        273ms ± 2%    ~     (p=0.711 n=48+48)
Unicode           117ms ± 6%        117ms ± 5%    ~     (p=0.633 n=50+50)
GoTypes           972ms ± 2%        974ms ± 1%  +0.29%  (p=0.016 n=47+49)
Compiler          4.46s ± 6%        4.51s ± 6%    ~     (p=0.093 n=50+50)
SSA               10.4s ± 1%        10.3s ± 2%  -0.94%  (p=0.000 n=45+50)
Flate             166ms ± 2%        167ms ± 2%    ~     (p=0.148 n=49+48)
GoParser          202ms ± 1%        202ms ± 2%  -0.28%  (p=0.014 n=47+49)
Reflect           594ms ± 2%        594ms ± 2%    ~     (p=0.717 n=48+49)
Tar               224ms ± 2%        224ms ± 2%    ~     (p=0.805 n=50+49)
XML               311ms ± 1%        310ms ± 1%    ~     (p=0.177 n=49+48)
[Geo mean]        537ms             537ms       +0.01%


Change-Id: I562b9f349b34ddcff01771769e6dbbc80604da7a
Reviewed-on: https://go-review.googlesource.com/c/go/+/221237
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-01 15:27:58 +00:00
Josh Bleecher Snyder
d889f0cb10 cmd/compile: use correct types in phiopt
We try to preserve type correctness of generic ops.
phiopt modified a bool to be an int without a conversion.
Add a conversion. There are a few random fluctations in the
generated code as a result, but nothing noteworthy or systematic.

no binary size changes

file                        before   after    Δ       %       
math.s                      35966    35961    -5      -0.014% 
debug/dwarf.s               108141   108147   +6      +0.006% 
crypto/dsa.s                6047     6044     -3      -0.050% 
image/png.s                 42882    42885    +3      +0.007% 
go/parser.s                 80281    80278    -3      -0.004% 
cmd/internal/obj.s          115116   115113   -3      -0.003% 
go/types.s                  322130   322118   -12     -0.004% 
cmd/internal/obj/arm64.s    151679   151685   +6      +0.004% 
go/internal/gccgoimporter.s 56487    56493    +6      +0.011% 
cmd/test2json.s             1650     1647     -3      -0.182% 
cmd/link/internal/loadelf.s 35442    35443    +1      +0.003% 
cmd/go/internal/work.s      305039   305035   -4      -0.001% 
cmd/link/internal/ld.s      544835   544834   -1      -0.000% 
net/http.s                  558777   558774   -3      -0.001% 
cmd/compile/internal/ssa.s  3926551  3926994  +443    +0.011% 
cmd/compile/internal/gc.s   1552320  1552321  +1      +0.000% 
total                       18862241 18862670 +429    +0.002% 


Change-Id: I4289e773be6be534ea3f907d68f614441b8f9b46
Reviewed-on: https://go-review.googlesource.com/c/go/+/221607
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-29 17:02:29 +00:00
Joel Sing
8955a56da0 cmd/compile: improve SignExt32to64 on riscv64
SignExt32to64 can be implemented with a single ADDIW instruction, rather than
the two shifts that are in use currently.

Change-Id: Ie1bbaef4018f1ba5162773fc64fa5a887457cfc9
Reviewed-on: https://go-review.googlesource.com/c/go/+/220922
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2020-02-28 14:33:28 +00:00
Josh Bleecher Snyder
2c859eae1d cmd/compile: ignore div/mod in prove on non-x86 architectures
Instead of writing AuxInt during prove and then zeroing it during lower,
just don't write it in the first place.

Passes toolstash-check -all.

Change-Id: Iea4b555029a9d69332e835536f9cf3a42b8223db
Reviewed-on: https://go-review.googlesource.com/c/go/+/220682
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-02-27 20:34:49 +00:00
Joel Sing
025a4faf5f cmd/compile: simplify Slicemask on riscv64
Slicemask can be performed with three immediate instructions, rather than the
six currently in use.

Change-Id: I3f8ca2d5affd1403db8fa79b356f248e6e9332c5
Reviewed-on: https://go-review.googlesource.com/c/go/+/220923
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-02-26 18:00:53 +00:00
Joel Sing
c27dd0c9e5 cmd/compile: improve Eq32/Neq32 on riscv64
Use SUBW to perform a 32-bit subtraction, rather than zero extending from
32 to 64 bits. This reduces Eq32 and Neq32 to two instructions, rather than
the four instructions required previously.

Change-Id: Ib2798324881e9db842c864e91a0c1b1e48c4b67b
Reviewed-on: https://go-review.googlesource.com/c/go/+/220921
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-02-26 17:59:57 +00:00
Michael Munday
cb74dcc172 cmd/compile: remove Greater* and Geq* generic integer ops
The generic Greater and Geq ops can always be replaced with the Less and
Leq ops. This CL therefore removes them. This simplifies the compiler since
it reduces the number of operations that need handling in both code and in
rewrite rules. This will be especially true when adding control flow
optimizations such as the integer-in-range optimizations in CL 165998.

Change-Id: If0648b2b19998ac1bddccbf251283f3be4ec3040
Reviewed-on: https://go-review.googlesource.com/c/go/+/220417
Run-TryBot: Michael Munday <mike.munday@ibm.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-26 13:11:53 +00:00
Josh Bleecher Snyder
2ed96d0958 cmd/compile: use ellipses in RISCV64 rules
Also, explicitly zero AuxInt in some ops (like Div),
to make it clear why they do not use an ellipsis.

Passes toolstash-check -all.

Change-Id: Iefd8891fca5d7be8aa1bb91eb1fe2c99c8bf9c88
Reviewed-on: https://go-review.googlesource.com/c/go/+/217011
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2020-02-25 15:59:16 +00:00
Josh Bleecher Snyder
6dd11bcb35 cmd/compile: remove chunking of rewrite rules
We added chunking of rewrite rules to speed up compiling package SSA.
This series of changes has significantly shrunk the number of
rewrite rules, and they are no longer being added nearly as fast.
Now that we are sharing v.Args across multiple rewrite rules,
there is additional benefit to having more rules in a single function.
Removing chunking now has an incidental impact on compiling package SSA,
marginally speeds up other compilation, shrinks the cmd/compile binary,
and simplifies the code.

name        old time/op       new time/op       delta
Template          211ms ± 2%        210ms ± 2%  -0.50%  (p=0.000 n=91+97)
Unicode          81.9ms ± 3%       81.8ms ± 3%    ~     (p=0.179 n=96+91)
GoTypes           731ms ± 2%        731ms ± 1%    ~     (p=0.442 n=94+96)
Compiler          3.43s ± 2%        3.41s ± 2%  -0.36%  (p=0.001 n=98+94)
SSA               8.30s ± 2%        8.32s ± 2%  +0.19%  (p=0.034 n=94+95)
Flate             135ms ± 2%        134ms ± 1%  -0.30%  (p=0.006 n=98+94)
GoParser          167ms ± 1%        167ms ± 1%  -0.22%  (p=0.001 n=92+94)
Reflect           453ms ± 2%        453ms ± 3%    ~     (p=0.306 n=98+97)
Tar               184ms ± 2%        183ms ± 2%  -0.31%  (p=0.012 n=94+94)
XML               249ms ± 2%        248ms ± 1%  -0.26%  (p=0.002 n=96+92)
[Geo mean]        419ms             418ms       -0.21%

name        old user-time/op  new user-time/op  delta
Template          273ms ± 2%        272ms ± 2%  -0.46%  (p=0.000 n=93+96)
Unicode           116ms ± 4%        117ms ± 4%    ~     (p=0.433 n=98+98)
GoTypes           977ms ± 2%        977ms ± 1%    ~     (p=0.971 n=92+99)
Compiler          4.56s ± 6%        4.53s ± 6%    ~     (p=0.081 n=100+100)
SSA               11.1s ± 2%        11.1s ± 2%    ~     (p=0.064 n=99+96)
Flate             167ms ± 2%        167ms ± 1%  -0.24%  (p=0.004 n=95+96)
GoParser          203ms ± 1%        203ms ± 2%  -0.14%  (p=0.049 n=96+97)
Reflect           595ms ± 2%        595ms ± 2%    ~     (p=0.544 n=95+92)
Tar               225ms ± 2%        224ms ± 2%    ~     (p=0.562 n=99+99)
XML               312ms ± 2%        311ms ± 1%    ~     (p=0.050 n=97+93)
[Geo mean]        543ms             542ms       -0.13%

Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa
Reviewed-on: https://go-review.googlesource.com/c/go/+/216220
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-21 02:29:11 +00:00
Josh Bleecher Snyder
a3f234c706 cmd/compile: reduce bounds checks in generated rewrite rules
CL 213703 converted generated rewrite rules for commutative ops
to use loops instead of duplicated code.

However, it loaded args using expressions like
v.Args[i] and v.Args[i^1], which the compiler could
not eliminate bounds for (including with all outstanding
prove CLs).

Also, given a series of separate rewrite rules for the same op,
we generated bounds checks for every rewrite rule, even though
we were repeatedly loading the same set of args.

This change reduces both sets of bounds checks.

Instead of loading v.Args[i] and v.Args[i^1] for commutative loops,
we now preload v.Args[0] and v.Args[1] into local variables,
and then swap them (as needed) in the commutative loop post statement.

And we now load all top level v.Args into local variables
at the beginning of every rewrite rule function.

The second optimization is the more significant,
but the first helps a little, and they play together
nicely from the perspective of generating the code.

This does increase register pressure, but the reduced bounds
checks more than compensate.

Note that the vast majority of rewrite rules evaluated
are not applied, so the prologue is the most important
part of the rewrite rules.

There is one subtle aspect to the new generated code.
Because the top level v.Args are shared across rewrite rules,
and rule evaluation can swap v_0 and v_1, v_0 and v_1
can end up being swapped from one rule to the next.
That is OK, because any time a rule does not get applied,
they will have been swapped exactly twice.

Passes toolstash-check -all.

name        old time/op       new time/op       delta
Template          213ms ± 2%        211ms ± 2%  -0.85%  (p=0.000 n=92+96)
Unicode          83.5ms ± 2%       83.2ms ± 2%  -0.41%  (p=0.004 n=95+90)
GoTypes           737ms ± 2%        733ms ± 2%  -0.51%  (p=0.000 n=91+94)
Compiler          3.45s ± 2%        3.43s ± 2%  -0.44%  (p=0.000 n=99+100)
SSA               8.54s ± 1%        8.32s ± 2%  -2.56%  (p=0.000 n=96+99)
Flate             136ms ± 2%        135ms ± 1%  -0.47%  (p=0.000 n=96+96)
GoParser          169ms ± 1%        168ms ± 1%  -0.33%  (p=0.000 n=96+93)
Reflect           456ms ± 3%        455ms ± 3%    ~     (p=0.261 n=95+94)
Tar               186ms ± 2%        185ms ± 2%  -0.48%  (p=0.000 n=94+95)
XML               251ms ± 1%        250ms ± 1%  -0.51%  (p=0.000 n=91+94)
[Geo mean]        424ms             421ms       -0.68%

name        old user-time/op  new user-time/op  delta
Template          275ms ± 1%        274ms ± 2%  -0.55%  (p=0.000 n=95+98)
Unicode           118ms ± 4%        118ms ± 4%    ~     (p=0.642 n=98+90)
GoTypes           983ms ± 1%        980ms ± 1%  -0.30%  (p=0.000 n=93+93)
Compiler          4.56s ± 6%        4.52s ± 6%  -0.72%  (p=0.003 n=100+100)
SSA               11.4s ± 1%        11.1s ± 1%  -2.50%  (p=0.000 n=96+97)
Flate             168ms ± 1%        167ms ± 1%  -0.49%  (p=0.000 n=92+92)
GoParser          204ms ± 1%        204ms ± 2%  -0.27%  (p=0.003 n=99+96)
Reflect           599ms ± 2%        598ms ± 2%    ~     (p=0.116 n=95+92)
Tar               227ms ± 2%        225ms ± 2%  -0.57%  (p=0.000 n=95+98)
XML               313ms ± 2%        312ms ± 1%  -0.37%  (p=0.000 n=89+95)
[Geo mean]        547ms             544ms       -0.61%

file    before    after     Δ       %
compile 21113112  21109016  -4096   -0.019%
total   131704940 131700844 -4096   -0.003%

Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956
Reviewed-on: https://go-review.googlesource.com/c/go/+/216218
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-21 00:55:58 +00:00
Josh Bleecher Snyder
bd6d78ef37 cmd/compile: use loops to handle commutative ops in rules
Prior to this change, we generated additional rules at rulegen time
for all possible combinations of args to commutative ops.
This is simple and works well, but leads to lots of generated rules.
This in turn has increased the size of the compiler,
made it hard to compile package ssa on small machines,
and provided a disincentive to mark some ops as commutative.

This change reworks how we handle commutative ops.
Instead of generating a rule per argument permutation,
we generate a series of nested loops, one for each commutative op.
Each loop tries both possible argument orderings.

I also considered attempting to canonicalize the inputs to the
rewrite rules. However, because either or both arguments might be
nothing more than an identifier, and because there can be arbitrary
conditions to evaluate during matching, I did not see how to proceed.

The duplicate rule detection now sorts arguments to commutative ops,
so that it can detect commutative-only duplicates.

There may be further optimizations to the new generated code.
In particular, we may not be removing as many bounds checks as before;
I have not investigated deeply. If more work here is needed,
we could do it with more hints or with improvements to the prove pass.

This change has almost no impact on the generated code.
It does not pass toolstash-check, however. In a handful of functions,
for reasons I do not understand, there are minor position changes.

For the entire series ending at this change,
there is negligible compiler performance impact.

The compiler binary shrinks by about 15%,
and package ssa shrinks by about 25%.
Package ssa also compiles ~25% faster with ~25% less memory.

Change-Id: Ia2ee9ceae7be08a17342319d4e31b0bb238a2ee4
Reviewed-on: https://go-review.googlesource.com/c/go/+/213703
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-20 17:34:07 +00:00
Joel Sing
98d2717499 cmd/compile: implement compiler for riscv64
Based on riscv-go port.

Updates #27532

Change-Id: Ia329daa243db63ff334053b8807ea96b97ce3acf
Reviewed-on: https://go-review.googlesource.com/c/go/+/204631
Run-TryBot: Joel Sing <joel@sing.id.au>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-01-18 14:41:40 +00:00