Commit graph

9 commits

Author SHA1 Message Date
Josh Bleecher Snyder
bf5b83a835 cmd/compile: convert splitload rules to typed aux
Passes toolstash-check -all.

Change-Id: Ia441582f7f67184eb831e184f9c3c0e3c11001bd
Reviewed-on: https://go-review.googlesource.com/c/go/+/229698
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2020-04-23 17:52:48 +00:00
Josh Bleecher Snyder
7913f7dfcf cmd/compile: add specialized AddArgN functions for rewrite rules
This shrinks the compiler without impacting performance.
(The performance-sensitive part of rewrite rules is the non-match case.)
Passes toolstash-check -all.

Executable size:

file    before    after     Δ       %       
compile 20356168  20163960  -192208 -0.944% 
total   115599376 115407168 -192208 -0.166% 

Text size:

file                       before   after    Δ       %       
cmd/compile/internal/ssa.s 3928309  3778774  -149535 -3.807% 
total                      18862943 18713408 -149535 -0.793% 

Memory allocated compiling package SSA:

SSA               12.7M ± 0%        12.5M ± 0%  -1.74%  (p=0.008 n=5+5)

Compiler speed impact:

name        old time/op       new time/op       delta
Template          211ms ± 1%        211ms ± 2%    ~     (p=0.832 n=49+49)
Unicode          82.8ms ± 2%       83.2ms ± 2%  +0.44%  (p=0.022 n=46+49)
GoTypes           726ms ± 1%        728ms ± 2%    ~     (p=0.076 n=46+48)
Compiler          3.39s ± 2%        3.40s ± 2%    ~     (p=0.633 n=48+49)
SSA               7.71s ± 1%        7.65s ± 1%  -0.78%  (p=0.000 n=45+44)
Flate             134ms ± 1%        134ms ± 1%    ~     (p=0.195 n=50+49)
GoParser          167ms ± 1%        167ms ± 1%    ~     (p=0.390 n=47+47)
Reflect           453ms ± 3%        452ms ± 2%    ~     (p=0.492 n=48+49)
Tar               184ms ± 3%        184ms ± 2%    ~     (p=0.862 n=50+48)
XML               248ms ± 2%        248ms ± 2%    ~     (p=0.096 n=49+47)
[Geo mean]        415ms             415ms       -0.03%

name        old user-time/op  new user-time/op  delta
Template          273ms ± 1%        273ms ± 2%    ~     (p=0.711 n=48+48)
Unicode           117ms ± 6%        117ms ± 5%    ~     (p=0.633 n=50+50)
GoTypes           972ms ± 2%        974ms ± 1%  +0.29%  (p=0.016 n=47+49)
Compiler          4.46s ± 6%        4.51s ± 6%    ~     (p=0.093 n=50+50)
SSA               10.4s ± 1%        10.3s ± 2%  -0.94%  (p=0.000 n=45+50)
Flate             166ms ± 2%        167ms ± 2%    ~     (p=0.148 n=49+48)
GoParser          202ms ± 1%        202ms ± 2%  -0.28%  (p=0.014 n=47+49)
Reflect           594ms ± 2%        594ms ± 2%    ~     (p=0.717 n=48+49)
Tar               224ms ± 2%        224ms ± 2%    ~     (p=0.805 n=50+49)
XML               311ms ± 1%        310ms ± 1%    ~     (p=0.177 n=49+48)
[Geo mean]        537ms             537ms       +0.01%


Change-Id: I562b9f349b34ddcff01771769e6dbbc80604da7a
Reviewed-on: https://go-review.googlesource.com/c/go/+/221237
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-03-01 15:27:58 +00:00
Josh Bleecher Snyder
6dd11bcb35 cmd/compile: remove chunking of rewrite rules
We added chunking of rewrite rules to speed up compiling package SSA.
This series of changes has significantly shrunk the number of
rewrite rules, and they are no longer being added nearly as fast.
Now that we are sharing v.Args across multiple rewrite rules,
there is additional benefit to having more rules in a single function.
Removing chunking now has an incidental impact on compiling package SSA,
marginally speeds up other compilation, shrinks the cmd/compile binary,
and simplifies the code.

name        old time/op       new time/op       delta
Template          211ms ± 2%        210ms ± 2%  -0.50%  (p=0.000 n=91+97)
Unicode          81.9ms ± 3%       81.8ms ± 3%    ~     (p=0.179 n=96+91)
GoTypes           731ms ± 2%        731ms ± 1%    ~     (p=0.442 n=94+96)
Compiler          3.43s ± 2%        3.41s ± 2%  -0.36%  (p=0.001 n=98+94)
SSA               8.30s ± 2%        8.32s ± 2%  +0.19%  (p=0.034 n=94+95)
Flate             135ms ± 2%        134ms ± 1%  -0.30%  (p=0.006 n=98+94)
GoParser          167ms ± 1%        167ms ± 1%  -0.22%  (p=0.001 n=92+94)
Reflect           453ms ± 2%        453ms ± 3%    ~     (p=0.306 n=98+97)
Tar               184ms ± 2%        183ms ± 2%  -0.31%  (p=0.012 n=94+94)
XML               249ms ± 2%        248ms ± 1%  -0.26%  (p=0.002 n=96+92)
[Geo mean]        419ms             418ms       -0.21%

name        old user-time/op  new user-time/op  delta
Template          273ms ± 2%        272ms ± 2%  -0.46%  (p=0.000 n=93+96)
Unicode           116ms ± 4%        117ms ± 4%    ~     (p=0.433 n=98+98)
GoTypes           977ms ± 2%        977ms ± 1%    ~     (p=0.971 n=92+99)
Compiler          4.56s ± 6%        4.53s ± 6%    ~     (p=0.081 n=100+100)
SSA               11.1s ± 2%        11.1s ± 2%    ~     (p=0.064 n=99+96)
Flate             167ms ± 2%        167ms ± 1%  -0.24%  (p=0.004 n=95+96)
GoParser          203ms ± 1%        203ms ± 2%  -0.14%  (p=0.049 n=96+97)
Reflect           595ms ± 2%        595ms ± 2%    ~     (p=0.544 n=95+92)
Tar               225ms ± 2%        224ms ± 2%    ~     (p=0.562 n=99+99)
XML               312ms ± 2%        311ms ± 1%    ~     (p=0.050 n=97+93)
[Geo mean]        543ms             542ms       -0.13%

Change-Id: I8d34ab59f154b28f20c6f9e416b976bfce339baa
Reviewed-on: https://go-review.googlesource.com/c/go/+/216220
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-21 02:29:11 +00:00
Josh Bleecher Snyder
a3f234c706 cmd/compile: reduce bounds checks in generated rewrite rules
CL 213703 converted generated rewrite rules for commutative ops
to use loops instead of duplicated code.

However, it loaded args using expressions like
v.Args[i] and v.Args[i^1], which the compiler could
not eliminate bounds for (including with all outstanding
prove CLs).

Also, given a series of separate rewrite rules for the same op,
we generated bounds checks for every rewrite rule, even though
we were repeatedly loading the same set of args.

This change reduces both sets of bounds checks.

Instead of loading v.Args[i] and v.Args[i^1] for commutative loops,
we now preload v.Args[0] and v.Args[1] into local variables,
and then swap them (as needed) in the commutative loop post statement.

And we now load all top level v.Args into local variables
at the beginning of every rewrite rule function.

The second optimization is the more significant,
but the first helps a little, and they play together
nicely from the perspective of generating the code.

This does increase register pressure, but the reduced bounds
checks more than compensate.

Note that the vast majority of rewrite rules evaluated
are not applied, so the prologue is the most important
part of the rewrite rules.

There is one subtle aspect to the new generated code.
Because the top level v.Args are shared across rewrite rules,
and rule evaluation can swap v_0 and v_1, v_0 and v_1
can end up being swapped from one rule to the next.
That is OK, because any time a rule does not get applied,
they will have been swapped exactly twice.

Passes toolstash-check -all.

name        old time/op       new time/op       delta
Template          213ms ± 2%        211ms ± 2%  -0.85%  (p=0.000 n=92+96)
Unicode          83.5ms ± 2%       83.2ms ± 2%  -0.41%  (p=0.004 n=95+90)
GoTypes           737ms ± 2%        733ms ± 2%  -0.51%  (p=0.000 n=91+94)
Compiler          3.45s ± 2%        3.43s ± 2%  -0.44%  (p=0.000 n=99+100)
SSA               8.54s ± 1%        8.32s ± 2%  -2.56%  (p=0.000 n=96+99)
Flate             136ms ± 2%        135ms ± 1%  -0.47%  (p=0.000 n=96+96)
GoParser          169ms ± 1%        168ms ± 1%  -0.33%  (p=0.000 n=96+93)
Reflect           456ms ± 3%        455ms ± 3%    ~     (p=0.261 n=95+94)
Tar               186ms ± 2%        185ms ± 2%  -0.48%  (p=0.000 n=94+95)
XML               251ms ± 1%        250ms ± 1%  -0.51%  (p=0.000 n=91+94)
[Geo mean]        424ms             421ms       -0.68%

name        old user-time/op  new user-time/op  delta
Template          275ms ± 1%        274ms ± 2%  -0.55%  (p=0.000 n=95+98)
Unicode           118ms ± 4%        118ms ± 4%    ~     (p=0.642 n=98+90)
GoTypes           983ms ± 1%        980ms ± 1%  -0.30%  (p=0.000 n=93+93)
Compiler          4.56s ± 6%        4.52s ± 6%  -0.72%  (p=0.003 n=100+100)
SSA               11.4s ± 1%        11.1s ± 1%  -2.50%  (p=0.000 n=96+97)
Flate             168ms ± 1%        167ms ± 1%  -0.49%  (p=0.000 n=92+92)
GoParser          204ms ± 1%        204ms ± 2%  -0.27%  (p=0.003 n=99+96)
Reflect           599ms ± 2%        598ms ± 2%    ~     (p=0.116 n=95+92)
Tar               227ms ± 2%        225ms ± 2%  -0.57%  (p=0.000 n=95+98)
XML               313ms ± 2%        312ms ± 1%  -0.37%  (p=0.000 n=89+95)
[Geo mean]        547ms             544ms       -0.61%

file    before    after     Δ       %
compile 21113112  21109016  -4096   -0.019%
total   131704940 131700844 -4096   -0.003%

Change-Id: Id6c39e0367e597c0c75b8a4b1eb14cc3cbd11956
Reviewed-on: https://go-review.googlesource.com/c/go/+/216218
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2020-02-21 00:55:58 +00:00
Daniel Martí
870080752d cmd/compile: reduce rulegen's output by 200 KiB
First, renove unnecessary "// cond:" lines from the generated files.
This shaves off about ~7k lines.

Second, join "if cond { break }" statements via "||", which allows us to
deduplicate a large number of them. This shaves off another ~25k lines.

This change is not for readability or simplicity; but rather, to avoid
unnecessary verbosity that makes the generated files larger. All in all,
git reports that the generated files overall weigh ~200KiB less, or
about 2.7% less.

While at it, add a -trace flag to rulegen.

Updates #33644.

Change-Id: I3fac0290a6066070cc62400bf970a4ae0929470a
Reviewed-on: https://go-review.googlesource.com/c/go/+/196498
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-09-23 16:36:10 +00:00
Daniel Martí
79dee788ec cmd/compile: teach rulegen to remove unused decls
First, add cpu and memory profiling flags, as these are useful to see
where rulegen is spending its time. It now takes many seconds to run on
a recent laptop, so we have to keep an eye on what it's doing.

Second, stop writing '_ = var' lines to keep imports and variables used
at all times. Now that rulegen removes all such unused names, they're
unnecessary.

To perform the removal, lean on go/types to first detect what names are
unused. We can configure it to give us all the type-checking errors in a
file, so we can collect all "declared but not used" errors in a single
pass.

We then use astutil.Apply to remove the relevant nodes based on the line
information from each unused error. This allows us to apply the changes
without having to do extra parser+printer roundtrips to plaintext, which
are far too expensive.

We need to do multiple such passes, as removing an unused variable
declaration might then make another declaration unused. Two passes are
enough to clean every file at the moment, so add a limit of three passes
for now to avoid eating cpu uncontrollably by accident.

The resulting performance of the changes above is a ~30% loss across the
table, since go/types is fairly expensive. The numbers were obtained
with 'benchcmd Rulegen go run *.go', which involves compiling rulegen
itself, but that seems reflective of how the program is used.

	name     old time/op         new time/op         delta
	Rulegen          5.61s ± 0%          7.36s ± 0%  +31.17%  (p=0.016 n=5+4)

	name     old user-time/op    new user-time/op    delta
	Rulegen          7.20s ± 1%          9.92s ± 1%  +37.76%  (p=0.016 n=5+4)

	name     old sys-time/op     new sys-time/op     delta
	Rulegen          135ms ±19%          169ms ±17%  +25.66%  (p=0.032 n=5+5)

	name     old peak-RSS-bytes  new peak-RSS-bytes  delta
	Rulegen         71.0MB ± 2%         85.6MB ± 2%  +20.56%  (p=0.008 n=5+5)

We can live with a bit more resource usage, but the time/op getting
close to 10s isn't good. To win that back, introduce concurrency in
main.go. This further increases resource usage a bit, but the real time
on this quad-core laptop is greatly reduced. The final benchstat is as
follows:

	name     old time/op         new time/op         delta
	Rulegen          5.61s ± 0%          3.97s ± 1%   -29.26%  (p=0.008 n=5+5)

	name     old user-time/op    new user-time/op    delta
	Rulegen          7.20s ± 1%         13.91s ± 1%   +93.09%  (p=0.008 n=5+5)

	name     old sys-time/op     new sys-time/op     delta
	Rulegen          135ms ±19%          269ms ± 9%   +99.17%  (p=0.008 n=5+5)

	name     old peak-RSS-bytes  new peak-RSS-bytes  delta
	Rulegen         71.0MB ± 2%        226.3MB ± 1%  +218.72%  (p=0.008 n=5+5)

It might be possible to reduce the cpu or memory usage in the future,
such as configuring go/types to do less work, or taking shortcuts to
avoid having to run it many times. For now, ~2x cpu and ~4x memory usage
seems like a fair trade for a faster and better rulegen.

Finally, we can remove the old code that tried to remove some unused
variables in a hacky and unmaintainable way.

Change-Id: Iff9e83e3f253babf5a1bd48cc993033b8550cee6
Reviewed-on: https://go-review.googlesource.com/c/go/+/189798
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-08-27 17:04:18 +00:00
Daniel Martí
0fbf681840 cmd/compile: reduce rulegen's for loop verbosity
A lot of the naked for loops begin like:

	for {
		v := b.Control
		if v.Op != OpConstBool {
			break
		}
		...
		return true
	}

Instead, write them out in a more compact and readable way:

	for v.Op == OpConstBool {
		...
		return true
	}

This requires the addition of two bytes.Buffer writers, as this helps us
make a decision based on future pieces of generated code. This probably
makes rulegen slightly slower, but that's not noticeable; the code
generation still takes ~3.5s on my laptop, excluding build time.

The "v := b.Control" declaration can be moved to the top of each
function. Even though the rules can modify b.Control when firing, they
also make the function return, so v can't be used again.

While at it, remove three unnecessary lines from the top of each
rewriteBlock func.

In total, this results in ~4k lines removed from the generated code, and
a slight improvement in readability.

Change-Id: I317e4c6a4842c64df506f4513375475fad2aeec5
Reviewed-on: https://go-review.googlesource.com/c/go/+/167399
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-03-22 15:41:51 +00:00
Josh Bleecher Snyder
700e969d5b cmd/compile: regenerate rewrite rules
Change-Id: I7e921b7b4665ff76dc8bae493b2a49318690770b
Reviewed-on: https://go-review.googlesource.com/c/go/+/168637
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Michael Munday <mike.munday@ibm.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2019-03-21 17:04:11 +00:00
Josh Bleecher Snyder
80b6812d7b cmd/compile: move flagalloc op splitting to rewrite rules
Flagalloc has the unenviable task of splitting
flag-generating ops that have been merged with loads
when the flags need to "spilled" (i.e. regenerated).
Since there weren't very many of them, there was a hard-coded list
of ops and bespoke code written to split them.

This change migrates load splitting into rewrite rules,
to make them easier to maintain.

Change-Id: I7750eafb888a802206c410f9c341b3133e7748b8
Reviewed-on: https://go-review.googlesource.com/c/go/+/166978
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-03-19 16:55:40 +00:00