go/src/cmd/compile/internal/ssa/fuse.go


// Copyright 2015 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
package ssa

import (
	"cmd/internal/src"
)

// fusePlain runs fuse(f, fuseTypePlain).
func fusePlain(f *Func) { fuse(f, fuseTypePlain) }
// fuseAll runs fuse(f, fuseTypeAll).
func fuseAll(f *Func) { fuse(f, fuseTypeAll) }
type fuseType uint8
const (
	fuseTypePlain fuseType = 1 << iota
	fuseTypeIf
	fuseTypeAll = fuseTypePlain | fuseTypeIf
)
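
// Worked example of the flag values above (an illustrative note, not part of
// the upstream source): iota starts at 0 in this const block, so
//
//	fuseTypePlain = 1 << 0 = 1
//	fuseTypeIf    = 1 << 1 = 2
//	fuseTypeAll   = 1 | 2  = 3
//
// A test such as typ&fuseTypeIf != 0 is therefore true for fuseTypeAll and
// false for fuseTypePlain.
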
// fuse simplifies control flow by joining basic blocks.
func fuse(f *Func, typ fuseType) {
	for changed := true; changed; {
		changed = false
		// Fuse from end to beginning, to avoid quadratic behavior in fuseBlockPlain. See issue 13554.
		for i := len(f.Blocks) - 1; i >= 0; i-- {
			b := f.Blocks[i]
			if typ&fuseTypeIf != 0 {
				changed = fuseBlockIf(b) || changed
			}
			if typ&fuseTypePlain != 0 {
				changed = fuseBlockPlain(b) || changed
			}
		}
		if changed {
			f.invalidateCFG()
		}
	}
}
// fuseBlockIf handles the following cases where s0 and s1 are empty blocks.
//
//   b        b        b      b
//  / \      | \      / |    | |
// s0  s1    |  s1   s0 |    | |
//  \ /      | /      \ |    | |
//   ss      ss        ss     ss
//
// If all Phi ops in ss have identical variables for slots corresponding to
// s0, s1 and b then the branch can be dropped.
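// For instance (a hypothetical fragment, for illustration only), suppose ss
// has exactly the two incoming edges from the first case and contains
//	v = Phi x x
// Both paths deliver the same value x, so b can become a plain block that
// goes straight to ss.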
// This optimization often comes up in switch statements with multiple
// expressions in a case clause:
//	switch n {
//	case 1, 2, 3:
//		return 4
//	}
// TODO: If ss doesn't contain any OpPhis, are s0 and s1 dead code anyway?
func fuseBlockIf(b *Block) bool {
	if b.Kind != BlockIf {
		return false
	}

	var ss0, ss1 *Block
	s0 := b.Succs[0].b
	i0 := b.Succs[0].i
	if s0.Kind != BlockPlain || len(s0.Preds) != 1 || !isEmpty(s0) {
		s0, ss0 = b, s0
	} else {
		ss0 = s0.Succs[0].b
		i0 = s0.Succs[0].i
	}
	s1 := b.Succs[1].b
	i1 := b.Succs[1].i
	if s1.Kind != BlockPlain || len(s1.Preds) != 1 || !isEmpty(s1) {
		s1, ss1 = b, s1
	} else {
		ss1 = s1.Succs[0].b
		i1 = s1.Succs[0].i
	}

	if ss0 != ss1 {
		return false
	}
	ss := ss0

	// s0 and s1 are equal to b if the corresponding block is missing
	// (2nd, 3rd and 4th case in the figure).

	for _, v := range ss.Values {
		if v.Op == OpPhi && v.Uses > 0 && v.Args[i0] != v.Args[i1] {
			return false
		}
	}

	// Now we have two of the following paths: b->ss, b->s0->ss and b->s1->ss,
	// with s0 and s1 empty where they exist.
	// We can replace them with a single edge b->ss because all used OpPhis
	// in ss have identical arguments in the two corresponding slots
	// (verified above).
	// No critical edge is introduced because b will have one successor.
	if s0 != b && s1 != b {
		// Replace edge b->s0->ss with b->ss.
		// We need to keep a slot for Phis corresponding to b.
		b.Succs[0] = Edge{ss, i0}
		ss.Preds[i0] = Edge{b, 0}
		b.removeEdge(1)
		s1.removeEdge(0)
	} else if s0 != b {
		b.removeEdge(0)
		s0.removeEdge(0)
	} else if s1 != b {
		b.removeEdge(1)
		s1.removeEdge(0)
	} else {
		b.removeEdge(1)
	}
	b.Kind = BlockPlain
	b.Likely = BranchUnknown
	b.ResetControls()
// Trash the empty blocks s0 and s1.
blocks := [...]*Block{s0, s1}
for _, s := range &blocks {
if s == b {
continue
}
// Move any (dead) values in s0 or s1 to b,
// where they will be eliminated by the next deadcode pass.
for _, v := range s.Values {
v.Block = b
}
b.Values = append(b.Values, s.Values...)
// Clear s.
s.Kind = BlockInvalid
s.Values = nil
s.Succs = nil
s.Preds = nil
}
return true
}
// isEmpty reports whether b contains no live values.
// The check is conservative: it may return false for a block
// whose values are in fact all dead.
func isEmpty(b *Block) bool {
for _, v := range b.Values {
if v.Uses > 0 || v.Op.IsCall() || v.Op.HasSideEffects() || v.Type.IsVoid() {
return false
}
}
return true
}
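// The conservative check in isEmpty can be sketched outside the compiler as
// follows. This is a minimal illustration only: the value type and the
// allDead helper are hypothetical stand-ins, not part of package ssa, and
// the four fields merely mirror the conditions tested above.

```go
package main

import "fmt"

// value is a hypothetical, simplified stand-in for ssa.Value.
type value struct {
	uses        int  // number of uses of this value
	isCall      bool // the operation is a call
	sideEffects bool // the operation has side effects
	isVoid      bool // the value has void type
}

// allDead reports whether every value in vs looks dead.
// A value counts as possibly live if it is used, is a call,
// has side effects, or has void type.
func allDead(vs []value) bool {
	for _, v := range vs {
		if v.uses > 0 || v.isCall || v.sideEffects || v.isVoid {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(allDead(nil))                     // no values at all
	fmt.Println(allDead([]value{{}, {}}))         // only unused, pure values
	fmt.Println(allDead([]value{{}, {uses: 1}}))  // one value is still used
	fmt.Println(allDead([]value{{isCall: true}})) // calls are always kept
}
```

// A value that is used only by other dead values still has a nonzero use
// count, which is why a check of this shape can report a block as
// non-empty even when a later deadcode pass would remove everything in it.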
func fuseBlockPlain(b *Block) bool {
if b.Kind != BlockPlain {
return false
}
c := b.Succs[0].b
if len(c.Preds) != 1 {
return false
}
// If a block happened to end in a statement marker,
// try to preserve it.
if b.Pos.IsStmt() == src.PosIsStmt {
l := b.Pos.Line()
for _, v := range c.Values {
if v.Pos.IsStmt() == src.PosNotStmt {
continue
}
if l == v.Pos.Line() {
v.Pos = v.Pos.WithIsStmt()
l = 0
break
}
}
if l != 0 && c.Pos.Line() == l {
c.Pos = c.Pos.WithIsStmt()
}
}
// move all of b's values to c.
for _, v := range b.Values {
v.Block = c
}
// Use whichever value slice is larger, in the hopes of avoiding growth.
// However, take care to avoid c.Values pointing to b.valstorage.
// See golang.org/issue/18602.
// It's important to keep the elements in the same order; maintenance of
// debugging information depends on the order of *Values in Blocks.
// An order that depends on slice capacity can also differ between 32-bit
// and 64-bit compilation platforms (word size affects allocation bucket
// size, which affects slice capacity); such differences may affect other
// optimizations and possibly compiler output.
if cap(c.Values) >= cap(b.Values) || len(b.Values) <= len(b.valstorage) {
bl := len(b.Values)
cl := len(c.Values)
var t []*Value // construct t = b.Values followed-by c.Values, but with attention to allocation.
if cap(c.Values) < bl+cl {
// reallocate
t = make([]*Value, bl+cl)
} else {
// in place.
t = c.Values[0 : bl+cl]
}
copy(t[bl:], c.Values) // possibly in-place
c.Values = t
copy(c.Values, b.Values)
} else {
c.Values = append(b.Values, c.Values...)
}
// replace b->c edge with preds(b) -> c
c.predstorage[0] = Edge{}
if len(b.Preds) > len(b.predstorage) {
c.Preds = b.Preds
} else {
c.Preds = append(c.predstorage[:0], b.Preds...)
}
for i, e := range c.Preds {
p := e.b
p.Succs[e.i] = Edge{c, i}
}
f := b.Func
if f.Entry == b {
f.Entry = c
}
// trash b, just in case
b.Kind = BlockInvalid
b.Values = nil
b.Preds = nil
b.Succs = nil
return true
}
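// The order-preserving merge of b.Values and c.Values in fuseBlockPlain can
// be sketched on plain int slices. This is an illustration under stated
// assumptions: concatFront is a hypothetical helper, not a compiler
// function, and it omits the valstorage aliasing concern from
// golang.org/issue/18602.

```go
package main

import "fmt"

// concatFront returns front followed by back, preserving element order.
// It reuses back's backing array when the capacity allows and allocates
// a new slice otherwise.
func concatFront(front, back []int) []int {
	fl, bl := len(front), len(back)
	var t []int
	if cap(back) < fl+bl {
		// Not enough room: reallocate.
		t = make([]int, fl+bl)
	} else {
		// Enough room: extend back in place.
		t = back[:fl+bl]
	}
	// Shift back's elements to the tail first. Source and destination
	// may overlap here; the built-in copy permits overlapping slices.
	copy(t[fl:], back)
	// Then place front's elements at the head.
	copy(t, front)
	return t
}

func main() {
	roomy := make([]int, 2, 8)
	roomy[0], roomy[1] = 3, 4
	fmt.Println(concatFront([]int{1, 2}, roomy))       // merged in place
	fmt.Println(concatFront([]int{1, 2}, []int{3, 4})) // merged into a new slice
}
```

// Both paths yield the same element order regardless of the capacities
// involved, which is the property the comments in fuseBlockPlain call out
// as important for debugging information.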