Count Values with side effects but no use as live, and don't fuse
branches that contain such Values. (This can happen e.g. when it
is followed by an infinite loop.) Otherwise this may lead to
miscompilation (side effect fired at wrong condition) or ICE (two
stores live simultaneously).
Fixes#36005.
Change-Id: If202eae4b37cb7f0311d6ca120ffa46609925157
Reviewed-on: https://go-review.googlesource.com/c/go/+/210179
Reviewed-by: Keith Randall <khr@golang.org>
The code in #29218 resulted in an If block containing only its control.
That block was then converted by fuseIf into a plain block;
as a result, that control value was dead.
However, the control value was still present in b.Values.
This prevented further fusing of that block.
This change beefs up the check in fuseIf to allow fusing
blocks that contain only dead values (if any).
In the case of #29218, this enables enough extra
fusing that the control value could be eliminated,
allowing all values in turn to be eliminated.
This change also fuses 34 new blocks during make.bash.
It is not clear that this fixes every variant of #29218,
but it is a reasonable standalone change.
And code like #29218 is rare and fundamentally buggy,
so we can handle new instances if/when they actually occur.
Fixes#29218
Negligible toolspeed impact.
name old time/op new time/op delta
Template 213ms ± 3% 213ms ± 2% ~ (p=0.914 n=97+88)
Unicode 89.8ms ± 2% 89.6ms ± 2% -0.22% (p=0.045 n=93+95)
GoTypes 712ms ± 3% 709ms ± 2% -0.35% (p=0.023 n=95+95)
Compiler 3.24s ± 2% 3.23s ± 2% -0.30% (p=0.020 n=98+97)
SSA 10.0s ± 1% 10.0s ± 1% ~ (p=0.382 n=98+99)
Flate 135ms ± 3% 135ms ± 2% ~ (p=0.983 n=98+98)
GoParser 158ms ± 2% 158ms ± 2% ~ (p=0.170 n=99+99)
Reflect 447ms ± 3% 447ms ± 2% ~ (p=0.538 n=98+89)
Tar 189ms ± 2% 189ms ± 3% ~ (p=0.874 n=95+96)
XML 251ms ± 2% 251ms ± 2% ~ (p=0.434 n=94+96)
[Geo mean] 427ms 426ms -0.15%
name old user-time/op new user-time/op delta
Template 264ms ± 2% 265ms ± 2% ~ (p=0.075 n=96+90)
Unicode 119ms ± 6% 119ms ± 7% ~ (p=0.864 n=99+98)
GoTypes 926ms ± 2% 924ms ± 2% ~ (p=0.071 n=94+94)
Compiler 4.38s ± 2% 4.37s ± 2% -0.34% (p=0.001 n=98+97)
SSA 13.4s ± 1% 13.4s ± 1% ~ (p=0.693 n=90+93)
Flate 162ms ± 3% 161ms ± 2% ~ (p=0.163 n=99+99)
GoParser 186ms ± 2% 186ms ± 3% ~ (p=0.130 n=96+100)
Reflect 572ms ± 3% 572ms ± 2% ~ (p=0.608 n=97+97)
Tar 239ms ± 2% 239ms ± 3% ~ (p=0.999 n=93+91)
XML 302ms ± 2% 302ms ± 2% ~ (p=0.627 n=91+97)
[Geo mean] 540ms 540ms -0.08%
file before after Δ %
asm 4862704 4858608 -4096 -0.084%
compile 24001568 24001680 +112 +0.000%
total 132520780 132516796 -3984 -0.003%
file before after Δ %
cmd/compile/internal/gc.a 8887638 8887596 -42 -0.000%
cmd/compile/internal/ssa.a 29995056 29998986 +3930 +0.013%
cmd/internal/obj/wasm.a 209444 203652 -5792 -2.765%
total 129471798 129469894 -1904 -0.001%
Change-Id: I2d18f9278e68b9766058ae8ca621e844f9d89dd8
Reviewed-on: https://go-review.googlesource.com/c/go/+/177140
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Cuong Manh Le <cuong.manhle.vn@gmail.com>
The compiler appears to have a latent bug:
fusePlain calls invalidateCFG when it changes block structure,
but fuseIf does not.
Fix this by hoisting the call to invalidateCFG to the top level.
Change-Id: Ic960fb3ac963b15b4a225aad84863d58efa954e6
Reviewed-on: https://go-review.googlesource.com/c/go/+/177198
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
There's semantically-but-not-literally equivalent code in
two cases for joining blocks' value lists in ssa/fuse.go.
It can be made literally equivalent, then commoned up.
Updates #25426.
Change-Id: Id1819366c9d22e5126f9203dcd4c622423994110
Reviewed-on: https://go-review.googlesource.com/113719
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
A new pass run after ssa building (before any other
optimization) identifies the "first" ssa node for each
statement. Other "noise" nodes are tagged as being never
appropriate for a statement boundary (e.g., VarKill, VarDef,
Phi).
Rewrite, deadcode, cse, and nilcheck are modified to move
the statement boundaries forward whenever possible if a
boundary-tagged ssa value is removed; never-boundary nodes
are ignored in this search (some operations involving
constants are also tagged as never-boundary and also ignored
because they are likely to be moved or removed during
optimization).
Code generation treats all nodes except those explicitly
marked as statement boundaries as "not statement" nodes,
and floats statement boundaries to the beginning of each
same-line run of instructions found within a basic block.
Line number html conversion was modified to make statement
boundary nodes a bit more obvious by prepending a "+".
The code in fuse.go that glued together the value slices
of two blocks produced a result that depended on the
former capacities (not lengths) of the two slices. This
causes differences in the 386 bootstrap, and also can
sometimes put values into an order that does a worse job
of preserving statement boundaries when values are removed.
Portions of two delve tests that had caught problems were
incorporated into ssa/debug_test.go. There are some
opportunities to do better with optimized code, but the
next-ing is not lying or overly jumpy.
Over 4 CLs, compilebench geomean measured binary size
increase of 3.5% and compile user time increase of 3.8%
(this is after optimization to reuse a sparse map instead
of creating multiple maps.)
This CL worsens the optimized-debugging experience with
Delve; we need to work with the delve team so that
they can use the is_stmt marks that we're emitting now.
The reference output changes from time to time depending
on other changes in the compiler, sometimes better,
sometimes worse.
This CL now includes a test ensuring that 99+% of the lines
in the Go command itself (a handy optimized binary) include
is_stmt markers.
Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a
Reviewed-on: https://go-review.googlesource.com/102435
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
When we go from a branch block to a plain block, reset the
branch prediction bit. Downstream passes asssume that if the
branch prediction is set, then the block has 2 successors.
Fixes#23504
Change-Id: I2898ec002228b2e34fe80ce420c6939201c0a5aa
Reviewed-on: https://go-review.googlesource.com/88955
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
fuseBlockPlain was accidentally quadratic.
If you had plain blocks b1 -> b2 -> b3 -> b4,
each containing single values v1, v2, v3, and v4 respectively,
fuseBlockPlain would move v1 from b1 to b2 to b3 to b4,
then v2 from b2 to b3 to b4, etc.
There are two obvious fixes.
* Look for runs of blocks in fuseBlockPlain
and handle them in a single go.
* Fuse from end to beginning; any given value in a run
of blocks to fuse then moves only once.
The latter is much simpler, so that's what this CL does.
Somewhat surprisingly, this change does not pass toolstash-check.
The resulting set of blocks is the same,
and the values in them are the same,
but the order of values in them differ,
and that order of values (while arbitrary)
is enough to change the compiler's output.
This may be due to #20178; deadstore is the next pass after fuse.
Adding basic sorting to the beginning of deadstore
is enough to make this CL pass toolstash-check:
for _, b := range f.Blocks {
obj.SortSlice(b.Values, func(i, j int) bool { return b.Values[i].ID < b.Values[j].ID })
}
Happily, this CL appears to result in better code on average,
if only by accident. It cuts 4k off of cmd/go; go1 benchmarks
are noisy as always but don't regress (numbers below).
No impact on the standard compilebench benchmarks.
For the code in #13554, this speeds up compilation dramatically:
name old time/op new time/op delta
Pkg 53.1s ± 2% 12.8s ± 3% -75.92% (p=0.008 n=5+5)
name old user-time/op new user-time/op delta
Pkg 55.0s ± 2% 14.9s ± 3% -73.00% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
Pkg 2.04GB ± 0% 2.04GB ± 0% +0.18% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
Pkg 6.21M ± 0% 6.21M ± 0% ~ (p=0.222 n=5+5)
name old object-bytes new object-bytes delta
Pkg 28.4M ± 0% 28.4M ± 0% +0.00% (p=0.008 n=5+5)
name old export-bytes new export-bytes delta
Pkg 208 ± 0% 208 ± 0% ~ (all equal)
Updates #13554
go1 benchmarks:
name old time/op new time/op delta
BinaryTree17-8 2.29s ± 2% 2.26s ± 2% -1.43% (p=0.000 n=48+50)
Fannkuch11-8 2.74s ± 2% 2.79s ± 2% +1.63% (p=0.000 n=50+49)
FmtFprintfEmpty-8 36.6ns ± 3% 34.6ns ± 4% -5.29% (p=0.000 n=49+50)
FmtFprintfString-8 58.3ns ± 3% 59.1ns ± 3% +1.35% (p=0.000 n=50+49)
FmtFprintfInt-8 62.4ns ± 2% 63.2ns ± 3% +1.19% (p=0.000 n=49+49)
FmtFprintfIntInt-8 95.1ns ± 2% 96.7ns ± 3% +1.61% (p=0.000 n=49+50)
FmtFprintfPrefixedInt-8 118ns ± 3% 113ns ± 2% -4.00% (p=0.000 n=50+49)
FmtFprintfFloat-8 191ns ± 2% 192ns ± 2% +0.40% (p=0.034 n=50+50)
FmtManyArgs-8 419ns ± 2% 420ns ± 2% ~ (p=0.228 n=49+49)
GobDecode-8 5.26ms ± 3% 5.19ms ± 2% -1.33% (p=0.000 n=50+49)
GobEncode-8 4.12ms ± 2% 4.15ms ± 3% +0.68% (p=0.007 n=49+50)
Gzip-8 198ms ± 2% 197ms ± 2% -0.50% (p=0.018 n=48+48)
Gunzip-8 31.9ms ± 3% 31.8ms ± 3% -0.47% (p=0.024 n=50+50)
HTTPClientServer-8 64.4µs ± 0% 64.0µs ± 0% -0.55% (p=0.000 n=43+46)
JSONEncode-8 10.6ms ± 2% 10.6ms ± 3% ~ (p=0.543 n=49+49)
JSONDecode-8 43.3ms ± 3% 43.1ms ± 2% ~ (p=0.079 n=50+50)
Mandelbrot200-8 3.70ms ± 2% 3.70ms ± 2% ~ (p=0.553 n=47+50)
GoParse-8 2.70ms ± 2% 2.71ms ± 3% ~ (p=0.843 n=49+50)
RegexpMatchEasy0_32-8 70.5ns ± 4% 70.4ns ± 4% ~ (p=0.867 n=48+50)
RegexpMatchEasy0_1K-8 162ns ± 3% 162ns ± 2% ~ (p=0.739 n=48+48)
RegexpMatchEasy1_32-8 66.1ns ± 5% 66.2ns ± 4% ~ (p=0.970 n=50+50)
RegexpMatchEasy1_1K-8 297ns ± 7% 296ns ± 7% ~ (p=0.406 n=50+50)
RegexpMatchMedium_32-8 105ns ± 5% 105ns ± 5% ~ (p=0.702 n=50+50)
RegexpMatchMedium_1K-8 32.3µs ± 4% 32.2µs ± 3% ~ (p=0.614 n=49+49)
RegexpMatchHard_32-8 1.75µs ±18% 1.74µs ±12% ~ (p=0.738 n=50+48)
RegexpMatchHard_1K-8 52.2µs ±14% 51.3µs ±13% ~ (p=0.230 n=50+50)
Revcomp-8 366ms ± 3% 367ms ± 3% ~ (p=0.745 n=49+49)
Template-8 48.5ms ± 4% 48.5ms ± 4% ~ (p=0.824 n=50+48)
TimeParse-8 263ns ± 2% 256ns ± 2% -2.98% (p=0.000 n=48+49)
TimeFormat-8 265ns ± 3% 262ns ± 3% -1.35% (p=0.000 n=48+49)
[Geo mean] 41.1µs 40.9µs -0.48%
Change-Id: Ib35fa15b54282abb39c077d150beee27f610891a
Reviewed-on: https://go-review.googlesource.com/43570
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Instead of always appending to c.Values,
choose whichever slice is larger;
b.Values will be set to nil anyway.
Appending once instead of in a loop also
limits slice growth to once per function call
and is more efficient.
Reduces max rss for the program in #18602 by 6.5%,
and eliminates fuseBlockPlain from the alloc_space
pprof output. fuseBlockPlain previously accounted
for 16.74% of allocated memory.
Updates #18602.
Change-Id: I417b03722d011a59a679157da43dc91f4425210e
Reviewed-on: https://go-review.googlesource.com/35114
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Happens occasionally for boolean phis was used as a control.
Change-Id: Ie0f2483e9004c1706751d8dfb25ee2e5106d917e
Reviewed-on: https://go-review.googlesource.com/21310
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Keep track of how many uses each Value has. Each appearance in
Value.Args and in Block.Control counts once.
The number of uses of a value is generically useful to
constrain rewrite rules. For instance, we might want to
prevent merging index operations into loads if the same
index expression is used lots of times.
But I have one use in particular for which the use count is required.
We must make sure we don't combine ops with loads if the load has
more than one use. Otherwise, we may split a single load
into multiple loads and that breaks perceived behavior in
the presence of races. In particular, the load of m.state
in sync/mutex.go:Lock can't be done twice. (I have a separate
CL which triggers the mutex failure. This CL has a test which
demonstrates a similar failure.)
Change-Id: Icaafa479239f48632a069d0c3f624e6ebc6b1f0e
Reviewed-on: https://go-review.googlesource.com/20790
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
* In cases where we end up with empty branches like in
if a then jmp b else jmp b;
the flow can be replaced by a; jmp b.
The following functions is optimized as follows:
func f(a bool, x int) int {
v := 0
if a {
v = -1
} else {
v = -1
}
return x | v
}
Before this change:
02819 (arith_ssa.go:362) VARDEF "".~r2+16(FP)
02820 (arith_ssa.go:362) MOVQ $0, "".~r2+16(FP)
02821 (arith_ssa.go:362) MOVB "".a(FP), AX
02822 (arith_ssa.go:362) TESTB AX, AX
02823 (arith_ssa.go:364) JEQ 2824
02824 (arith_ssa.go:369) VARDEF "".~r2+16(FP)
02825 (arith_ssa.go:369) MOVQ $-1, "".~r2+16(FP)
02826 (arith_ssa.go:369) RET
After this change:
02819 (arith_ssa.go:362) VARDEF "".~r2+16(FP)
02820 (arith_ssa.go:369) VARDEF "".~r2+16(FP)
02821 (arith_ssa.go:369) MOVQ $-1, "".~r2+16(FP)
02822 (arith_ssa.go:369) RET
Updates #14277
Change-Id: Ibe7d284f43406c704903632a4fcf2a4a64059686
Reviewed-on: https://go-review.googlesource.com/19464
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Basic ops, no particular optimization in the pattern
matching yet (e.g. x!=x for Nan detection, x cmp constant,
etc.)
Change-Id: I0043564081d6dc0eede876c4a9eb3c33cbd1521c
Reviewed-on: https://go-review.googlesource.com/13704
Reviewed-by: Keith Randall <khr@golang.org>
Semi-regular merge of tip to dev.ssa.
Complicated a bit by the move of cmd/internal/* to cmd/compile/internal/*.
Change-Id: I1c66d3c29bb95cce4a53c5a3476373aa5245303d