When we go from a branch block to a plain block, reset the
branch prediction bit. Downstream passes asssume that if the
branch prediction is set, then the block has 2 successors.
Fixes#23504
Change-Id: I2898ec002228b2e34fe80ce420c6939201c0a5aa
Reviewed-on: https://go-review.googlesource.com/88955
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
fuseBlockPlain was accidentally quadratic.
If you had plain blocks b1 -> b2 -> b3 -> b4,
each containing single values v1, v2, v3, and v4 respectively,
fuseBlockPlain would move v1 from b1 to b2 to b3 to b4,
then v2 from b2 to b3 to b4, etc.
There are two obvious fixes.
* Look for runs of blocks in fuseBlockPlain
and handle them in a single go.
* Fuse from end to beginning; any given value in a run
of blocks to fuse then moves only once.
The latter is much simpler, so that's what this CL does.
Somewhat surprisingly, this change does not pass toolstash-check.
The resulting set of blocks is the same,
and the values in them are the same,
but the order of values in them differ,
and that order of values (while arbitrary)
is enough to change the compiler's output.
This may be due to #20178; deadstore is the next pass after fuse.
Adding basic sorting to the beginning of deadstore
is enough to make this CL pass toolstash-check:
for _, b := range f.Blocks {
obj.SortSlice(b.Values, func(i, j int) bool { return b.Values[i].ID < b.Values[j].ID })
}
Happily, this CL appears to result in better code on average,
if only by accident. It cuts 4k off of cmd/go; go1 benchmarks
are noisy as always but don't regress (numbers below).
No impact on the standard compilebench benchmarks.
For the code in #13554, this speeds up compilation dramatically:
name old time/op new time/op delta
Pkg 53.1s ± 2% 12.8s ± 3% -75.92% (p=0.008 n=5+5)
name old user-time/op new user-time/op delta
Pkg 55.0s ± 2% 14.9s ± 3% -73.00% (p=0.008 n=5+5)
name old alloc/op new alloc/op delta
Pkg 2.04GB ± 0% 2.04GB ± 0% +0.18% (p=0.008 n=5+5)
name old allocs/op new allocs/op delta
Pkg 6.21M ± 0% 6.21M ± 0% ~ (p=0.222 n=5+5)
name old object-bytes new object-bytes delta
Pkg 28.4M ± 0% 28.4M ± 0% +0.00% (p=0.008 n=5+5)
name old export-bytes new export-bytes delta
Pkg 208 ± 0% 208 ± 0% ~ (all equal)
Updates #13554
go1 benchmarks:
name old time/op new time/op delta
BinaryTree17-8 2.29s ± 2% 2.26s ± 2% -1.43% (p=0.000 n=48+50)
Fannkuch11-8 2.74s ± 2% 2.79s ± 2% +1.63% (p=0.000 n=50+49)
FmtFprintfEmpty-8 36.6ns ± 3% 34.6ns ± 4% -5.29% (p=0.000 n=49+50)
FmtFprintfString-8 58.3ns ± 3% 59.1ns ± 3% +1.35% (p=0.000 n=50+49)
FmtFprintfInt-8 62.4ns ± 2% 63.2ns ± 3% +1.19% (p=0.000 n=49+49)
FmtFprintfIntInt-8 95.1ns ± 2% 96.7ns ± 3% +1.61% (p=0.000 n=49+50)
FmtFprintfPrefixedInt-8 118ns ± 3% 113ns ± 2% -4.00% (p=0.000 n=50+49)
FmtFprintfFloat-8 191ns ± 2% 192ns ± 2% +0.40% (p=0.034 n=50+50)
FmtManyArgs-8 419ns ± 2% 420ns ± 2% ~ (p=0.228 n=49+49)
GobDecode-8 5.26ms ± 3% 5.19ms ± 2% -1.33% (p=0.000 n=50+49)
GobEncode-8 4.12ms ± 2% 4.15ms ± 3% +0.68% (p=0.007 n=49+50)
Gzip-8 198ms ± 2% 197ms ± 2% -0.50% (p=0.018 n=48+48)
Gunzip-8 31.9ms ± 3% 31.8ms ± 3% -0.47% (p=0.024 n=50+50)
HTTPClientServer-8 64.4µs ± 0% 64.0µs ± 0% -0.55% (p=0.000 n=43+46)
JSONEncode-8 10.6ms ± 2% 10.6ms ± 3% ~ (p=0.543 n=49+49)
JSONDecode-8 43.3ms ± 3% 43.1ms ± 2% ~ (p=0.079 n=50+50)
Mandelbrot200-8 3.70ms ± 2% 3.70ms ± 2% ~ (p=0.553 n=47+50)
GoParse-8 2.70ms ± 2% 2.71ms ± 3% ~ (p=0.843 n=49+50)
RegexpMatchEasy0_32-8 70.5ns ± 4% 70.4ns ± 4% ~ (p=0.867 n=48+50)
RegexpMatchEasy0_1K-8 162ns ± 3% 162ns ± 2% ~ (p=0.739 n=48+48)
RegexpMatchEasy1_32-8 66.1ns ± 5% 66.2ns ± 4% ~ (p=0.970 n=50+50)
RegexpMatchEasy1_1K-8 297ns ± 7% 296ns ± 7% ~ (p=0.406 n=50+50)
RegexpMatchMedium_32-8 105ns ± 5% 105ns ± 5% ~ (p=0.702 n=50+50)
RegexpMatchMedium_1K-8 32.3µs ± 4% 32.2µs ± 3% ~ (p=0.614 n=49+49)
RegexpMatchHard_32-8 1.75µs ±18% 1.74µs ±12% ~ (p=0.738 n=50+48)
RegexpMatchHard_1K-8 52.2µs ±14% 51.3µs ±13% ~ (p=0.230 n=50+50)
Revcomp-8 366ms ± 3% 367ms ± 3% ~ (p=0.745 n=49+49)
Template-8 48.5ms ± 4% 48.5ms ± 4% ~ (p=0.824 n=50+48)
TimeParse-8 263ns ± 2% 256ns ± 2% -2.98% (p=0.000 n=48+49)
TimeFormat-8 265ns ± 3% 262ns ± 3% -1.35% (p=0.000 n=48+49)
[Geo mean] 41.1µs 40.9µs -0.48%
Change-Id: Ib35fa15b54282abb39c077d150beee27f610891a
Reviewed-on: https://go-review.googlesource.com/43570
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Instead of always appending to c.Values,
choose whichever slice is larger;
b.Values will be set to nil anyway.
Appending once instead of in a loop also
limits slice growth to once per function call
and is more efficient.
Reduces max rss for the program in #18602 by 6.5%,
and eliminates fuseBlockPlain from the alloc_space
pprof output. fuseBlockPlain previously accounted
for 16.74% of allocated memory.
Updates #18602.
Change-Id: I417b03722d011a59a679157da43dc91f4425210e
Reviewed-on: https://go-review.googlesource.com/35114
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Happens occasionally for boolean phis was used as a control.
Change-Id: Ie0f2483e9004c1706751d8dfb25ee2e5106d917e
Reviewed-on: https://go-review.googlesource.com/21310
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
Keep track of how many uses each Value has. Each appearance in
Value.Args and in Block.Control counts once.
The number of uses of a value is generically useful to
constrain rewrite rules. For instance, we might want to
prevent merging index operations into loads if the same
index expression is used lots of times.
But I have one use in particular for which the use count is required.
We must make sure we don't combine ops with loads if the load has
more than one use. Otherwise, we may split a single load
into multiple loads and that breaks perceived behavior in
the presence of races. In particular, the load of m.state
in sync/mutex.go:Lock can't be done twice. (I have a separate
CL which triggers the mutex failure. This CL has a test which
demonstrates a similar failure.)
Change-Id: Icaafa479239f48632a069d0c3f624e6ebc6b1f0e
Reviewed-on: https://go-review.googlesource.com/20790
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
* In cases where we end up with empty branches like in
if a then jmp b else jmp b;
the flow can be replaced by a; jmp b.
The following functions is optimized as follows:
func f(a bool, x int) int {
v := 0
if a {
v = -1
} else {
v = -1
}
return x | v
}
Before this change:
02819 (arith_ssa.go:362) VARDEF "".~r2+16(FP)
02820 (arith_ssa.go:362) MOVQ $0, "".~r2+16(FP)
02821 (arith_ssa.go:362) MOVB "".a(FP), AX
02822 (arith_ssa.go:362) TESTB AX, AX
02823 (arith_ssa.go:364) JEQ 2824
02824 (arith_ssa.go:369) VARDEF "".~r2+16(FP)
02825 (arith_ssa.go:369) MOVQ $-1, "".~r2+16(FP)
02826 (arith_ssa.go:369) RET
After this change:
02819 (arith_ssa.go:362) VARDEF "".~r2+16(FP)
02820 (arith_ssa.go:369) VARDEF "".~r2+16(FP)
02821 (arith_ssa.go:369) MOVQ $-1, "".~r2+16(FP)
02822 (arith_ssa.go:369) RET
Updates #14277
Change-Id: Ibe7d284f43406c704903632a4fcf2a4a64059686
Reviewed-on: https://go-review.googlesource.com/19464
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Basic ops, no particular optimization in the pattern
matching yet (e.g. x!=x for Nan detection, x cmp constant,
etc.)
Change-Id: I0043564081d6dc0eede876c4a9eb3c33cbd1521c
Reviewed-on: https://go-review.googlesource.com/13704
Reviewed-by: Keith Randall <khr@golang.org>
Semi-regular merge of tip to dev.ssa.
Complicated a bit by the move of cmd/internal/* to cmd/compile/internal/*.
Change-Id: I1c66d3c29bb95cce4a53c5a3476373aa5245303d