Commit graph

37 commits

Author SHA1 Message Date
Keith Randall
25e0a367da [dev.ssa] cmd/compile: clean up tuple types and selects
Make tuple types and their SelectX ops fully generic.
These ops no longer need to be lowered.
Regalloc understands them and their tuple-generating arguments.
We can now have opcodes returning arbitrary pairs of results.
(And it would be easy to move to >2 results if needed.)

Update arm implementation to the new standard.
Implement just enough in 386 port to do 64-bit add.

Change-Id: I370ed5aacce219c82e1954c61d1f63af76c16f79
Reviewed-on: https://go-review.googlesource.com/24976
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-07-18 16:11:36 +00:00
Cherry Zhang
8eadb89266 [dev.ssa] cmd/compile: move tuple selectors to generator's block in CSE
CSE may substitute a tuple generator with another one in a different
block. In this case, since we want tuple selectors to stay together
with the tuple generator, copy the selector to the new generator's
block and rewrite its use.

Op.isTupleGenerator and Op.isTupleSelector are introduced to assert
tuple ops. Use it in tighten as well.

Updates #15365.

Change-Id: Ia9e8c734b9cc3bc9fca4a2750041eef9cdfac5a5
Reviewed-on: https://go-review.googlesource.com/24137
Reviewed-by: David Chase <drchase@google.com>
2016-06-24 17:33:39 +00:00
Josh Bleecher Snyder
13a5b1faee cmd/compile: improve domorder documentation
domorder has some non-obvious useful properties
that we’re relying on in cse.
Document them and provide an argument that they hold.
While we’re here, do some minor renaming.

The argument is a re-working of a private email
exchange with Todd Neal and David Chase.

Change-Id: Ie154e0521bde642f5f11e67fc542c5eb938258be
Reviewed-on: https://go-review.googlesource.com/23449
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-05-26 20:01:24 +00:00
David Chase
6b99fb5bea cmd/compile: use sparse algorithm for phis in large program
This adds a sparse method for locating nearest ancestors
in a dominator tree, and checks blocks with more than one
predecessor for differences and inserts phi functions where
there are.

Uses reversed post order to cut number of passes, running
it from first def to last use ("last use" for paramout and
mem is end-of-program; last use for a phi input from a
backedge is the source of the back edge)

Includes a cutover from old algorithm to new to avoid paying
large constant factor for small programs.  This keeps normal
builds running at about the same time, while not running
over-long on large machine-generated inputs.

Add "phase" flags for ssa/build -- ssa/build/stats prints
number of blocks, values (before and after linking references
and inserting phis, so expansion can be measured), and their
product; the product governs the cutover, where a good value
seems to be somewhere between 1 and 5 million.

Among the files compiled by make.bash, this is the shape of
the tail of the distribution for #blocks, #vars, and their
product:

	 #blocks	#vars	    product
 max	6171	28180	173,898,780
99.9%	1641	 6548	 10,401,878
  99%	 463	 1909	    873,721
  95%	 152	  639	     95,235
  90%	  84	  359	     30,021

The old algorithm is indeed usually fastest, for 99%ile
values of usually.

The fix to LookupVarOutgoing
( https://go-review.googlesource.com/#/c/22790/ )
deals with some of the same problems addressed by this CL,
but on at least one bug ( #15537 ) this change is still
a significant help.

With this CL:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
   github.com/gogo/protobuf/test/theproto3/combos/...
...
real	4m35.200s
user	13m16.644s
sys	0m36.712s
and pprof reports 3.4GB allocated in one of the larger profiles

With tip:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
   github.com/gogo/protobuf/test/theproto3/combos/...
...
real	10m36.569s
user	25m52.286s
sys	4m3.696s
and pprof reports 8.3GB allocated in the same larger profile

With this CL, most of the compilation time on the benchmarked
input is spent in register/stack allocation (cumulative 53%)
and in the sparse lookup algorithm itself (cumulative 20%).

Fixes #15537.

Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00
Reviewed-on: https://go-review.googlesource.com/22342
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-16 21:08:05 +00:00
Keith Randall
b64c7fc683 cmd/compile: never CSE two memories
It never makes sense to CSE two ops that generate memory.
We might as well start those ops off in their own partition.

Fixes #15520

Change-Id: I0091ed51640f2c10cd0117f290b034dde7a86721
Reviewed-on: https://go-review.googlesource.com/22741
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-03 22:45:53 +00:00
Josh Bleecher Snyder
2563b6f9fe cmd/compile/internal/ssa: use Compare instead of Equal
They have different semantics.

Equal is stricter and is designed for the front-end.
Compare is looser and cheaper and is designed for the back-end.
To avoid possible regression, remove Equal from ssa.Type.

Updates #15043

Change-Id: Ie23ce75ff6b4d01b7982e0a89e6f81b5d099d8d6
Reviewed-on: https://go-review.googlesource.com/21483
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
2016-04-17 04:50:45 +00:00
Todd Neal
77d374940e cmd/compile: speed up dom checking in cse
Process a slice of equivalent values by setting replaced values to nil
instead of removing them from the slice to eliminate copying.  Also take
advantage of the entry number sort to break early once we reach a value
in a block that is not dominated.

For the code in issue #15112:

Before:
real    0m52.603s
user    0m56.957s
sys     0m1.213s

After:
real    0m22.048s
user    0m26.445s
sys     0m0.939s

Updates #15112

Change-Id: I06d9e1e1f1ad85d7fa196c5d51f0dc163907376d
Reviewed-on: https://go-review.googlesource.com/22068
Reviewed-by: David Chase <drchase@google.com>
2016-04-15 00:30:39 +00:00
Todd Neal
3ea7cfabbb cmd/compile: sort partitions by dom to speed up cse
We do two O(n) scans of all values in an eqclass when computing
substitutions for CSE.

In unfortunate cases, like those found in #15112, we can have a large
eqclass composed of values found in blocks none of whom dominate the
other.  This leads to O(n^2) behavior. The elements are removed one at a
time, with O(n) scans each time.

This CL removes the linear scan by sorting the eqclass so that dominant
values will be sorted first.  As long as we also ensure we don't disturb
the sort order, then we no longer need to scan for the maximally
dominant value.

For the code in issue #15112:

Before:
real    1m26.094s
user    1m30.776s
sys     0m1.125s

Aefter:
real    0m52.099s
user    0m56.829s
sys     0m1.092s

Updates #15112

Change-Id: Ic4f8680ed172e716232436d31963209c146ef850
Reviewed-on: https://go-review.googlesource.com/21981
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-13 19:55:15 +00:00
Alexandru Moșoi
e0611b1664 cmd/compile: use shared dom tree for cse, too
Missed this in the previous CL where the shared
dom tree was introduced.

Change-Id: If0bd85d4b4567d7e87814ed511603b1303ab3903
Reviewed-on: https://go-review.googlesource.com/21970
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-04-13 12:42:44 +00:00
Josh Bleecher Snyder
db5338f879 cmd/compile: teach CSE that new objects are bespoke
runtime.newobject never returns the same thing twice,
so the resulting value will never be a common subexpression.

This helps when compiling large static data structures
that include pointers, such as maps and slices.
No clear performance impact on other code. (See below.)

For the code in issue #15112:

Before:
  real	1m14.238s
  user	1m18.985s
  sys	0m0.787s

After:
  real	0m47.172s
  user	0m52.248s
  sys	0m0.767s

For the code in issue #15235, size 10k:

Before:
  real	0m44.916s
  user	0m46.577s
  sys	0m0.304s

After:
  real	0m7.703s
  user	0m9.041s
  sys	0m0.316s

Still more work to be done, particularly for #15112.

Updates #15112
Updates #15235


name       old time/op      new time/op      delta
Template        330ms ±11%       333ms ±13%    ~           (p=0.749 n=20+19)
Unicode         148ms ± 6%       152ms ± 8%    ~           (p=0.072 n=18+20)
GoTypes         1.01s ± 7%       1.01s ± 3%    ~           (p=0.583 n=20+20)
Compiler        5.04s ± 2%       5.06s ± 2%    ~           (p=0.314 n=20+20)

name       old user-ns/op   new user-ns/op   delta
Template   444user-ms ±11%  445user-ms ±10%    ~           (p=0.738 n=20+20)
Unicode    215user-ms ± 5%  218user-ms ± 5%    ~           (p=0.239 n=18+18)
GoTypes    1.45user-s ± 3%  1.45user-s ± 4%    ~           (p=0.620 n=20+20)
Compiler   7.23user-s ± 2%  7.22user-s ± 2%    ~           (p=0.901 n=20+19)

name       old alloc/op     new alloc/op     delta
Template       55.0MB ± 0%      55.0MB ± 0%    ~           (p=0.547 n=20+20)
Unicode        37.6MB ± 0%      37.6MB ± 0%    ~           (p=0.301 n=20+20)
GoTypes         177MB ± 0%       177MB ± 0%    ~           (p=0.065 n=20+19)
Compiler        798MB ± 0%       797MB ± 0%  -0.05%        (p=0.000 n=19+20)

name       old allocs/op    new allocs/op    delta
Template         492k ± 0%        493k ± 0%  +0.03%        (p=0.030 n=20+20)
Unicode          377k ± 0%        377k ± 0%    ~           (p=0.423 n=20+19)
GoTypes         1.40M ± 0%       1.40M ± 0%    ~           (p=0.102 n=20+20)
Compiler        5.53M ± 0%       5.53M ± 0%    ~           (p=0.094 n=17+18)

name       old text-bytes   new text-bytes   delta
HelloSize        561k ± 0%        561k ± 0%    ~     (all samples are equal)
CmdGoSize       6.13M ± 0%       6.13M ± 0%    ~     (all samples are equal)

name       old data-bytes   new data-bytes   delta
HelloSize        128k ± 0%        128k ± 0%    ~     (all samples are equal)
CmdGoSize        306k ± 0%        306k ± 0%    ~     (all samples are equal)

name       old exe-bytes    new exe-bytes    delta
HelloSize        905k ± 0%        905k ± 0%    ~     (all samples are equal)
CmdGoSize       9.64M ± 0%       9.64M ± 0%    ~     (all samples are equal)

Change-Id: Id774e2901d7701a3ec7a1c1d1cf1d9327a4107fc
Reviewed-on: https://go-review.googlesource.com/21937
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-04-13 02:11:56 +00:00
Keith Randall
56e0ecc5ea cmd/compile: keep value use counts in SSA
Keep track of how many uses each Value has.  Each appearance in
Value.Args and in Block.Control counts once.

The number of uses of a value is generically useful to
constrain rewrite rules.  For instance, we might want to
prevent merging index operations into loads if the same
index expression is used lots of times.

But I have one use in particular for which the use count is required.
We must make sure we don't combine ops with loads if the load has
more than one use.  Otherwise, we may split a single load
into multiple loads and that breaks perceived behavior in
the presence of races.  In particular, the load of m.state
in sync/mutex.go:Lock can't be done twice.  (I have a separate
CL which triggers the mutex failure.  This CL has a test which
demonstrates a similar failure.)

Change-Id: Icaafa479239f48632a069d0c3f624e6ebc6b1f0e
Reviewed-on: https://go-review.googlesource.com/20790
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-03-17 04:20:02 +00:00
Brad Fitzpatrick
5fea2ccc77 all: single space after period.
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.

This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:

$ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
$ go test go/doc -update

Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02 00:13:47 +00:00
David Chase
378a863682 [dev.ssa] cmd/compile: enhance command line option processing for SSA
The -d compiler flag can also specify ssa phase and flag,
for example -d=ssa/generic_cse/time,ssa/generic_cse/stats

Spaces in the phase names can be specified with an
underscore.  Flags currently parsed (not necessarily
recognized by the phases yet) are:

   on, off, mem, time, debug, stats, and test

On, off and time are handled in the harness,
debug, stats, and test are interpreted by the phase itself.

The pass is now attached to the Func being compiled, and a
new method logStats(key, ...value) on *Func to encourage a
semi-standardized format for that output.  Output fields
are separated by tabs to ease digestion by awk and
spreadsheets.  For example,
	if f.pass.stats > 0 {
		f.logStat("CSE REWRITES", rewrites)
	}

Change-Id: I16db2b5af64c50ca9a47efeb51d961147a903abc
Reviewed-on: https://go-review.googlesource.com/19885
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-25 20:32:15 +00:00
Todd Neal
b96189d1a0 [dev.ssa] cmd/compile: speed up cse
Construct better initial partitions by recursively comparing values and
their arguments.  This saves one second on compile of arithConst_ssa.go
(4.3s to 3.3s) and shows a 3-5% increase with compilebench.

name       old time/op     new time/op     delta
Template       266ms ± 3%      253ms ± 4%  -5.08%          (p=0.032 n=5+5)
GoTypes        927ms ± 3%      885ms ± 2%  -4.55%          (p=0.016 n=5+5)
Compiler       3.91s ± 3%      3.73s ± 2%  -4.49%          (p=0.008 n=5+5)
MakeBash       31.6s ± 1%      30.5s ± 3%  -3.51%          (p=0.016 n=5+5)

Change-Id: I6ede31ff459131ccfed69531acfbd06b19837700
Reviewed-on: https://go-review.googlesource.com/19838
Reviewed-by: David Chase <drchase@google.com>
2016-02-24 16:57:05 +00:00
David Chase
c3db6c95b6 [dev.ssa] cmd/compile: double speed of CSE phase
Replaced comparison based on (*Type).String() with an
allocation-free structural comparison.  Roughly doubles
speed of CSE, also reduces allocations.

Checked that roughly the same number of CSEs were detected
during make.bash (about a million) and that "new" CSEs
were caused by the effect described above.

Change-Id: Id205a9f6986efd518043e12d651f0b01206aeb1b
Reviewed-on: https://go-review.googlesource.com/19471
Reviewed-by: Keith Randall <khr@golang.org>
2016-02-22 17:15:41 +00:00
Alexandru Moșoi
88c1ef5b45 [dev.ssa] cmd/compile/internal/ssa: handle commutative operations in cse
* If a operation is commutative order the parameters
in a canonical way.

Size of pkg/tool/linux_amd64/* excluding compile:
before: 95882288
 after: 95868152
change: 14136 ~0.015%

I tried something similar with Leq and Geq, but the results were
not great because it confuses the 'lowered cse' pass too much
which can no longer remove redundant comparisons from IsInBounds.

Change-Id: I2f928663a11320bfc51c7fa47e384b7411c420ba
Reviewed-on: https://go-review.googlesource.com/19727
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-22 16:52:05 +00:00
Todd Neal
94f0245114 [dev.ssa] cmd/compile: add a zero arg cse pass
Add an initial cse pass that only operates on zero argument
values.  This removes the need for a special case in cse for removing
OpSB and speeds up arithConst_ssa.go compilation by 9% while slowing
"test -c net/http" by 1.5%.

Change-Id: Id1500482485426f66c6c2eba75eeaf4f19c8a889
Reviewed-on: https://go-review.googlesource.com/19454
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-02-22 13:06:18 +00:00
David Chase
58cfa40419 [dev.ssa] cmd/compile: fix for bug in cse speed improvements
Problem was caused by use of Args[].Aux differences
in early partitioning.  This artificially separated
two equivalent expressions because sort ignores the
Aux field, hence things can end with equal things
separated by unequal things and thus the equal things
are split into more than one partition.  For example:
SliceLen(a), SliceLen(b), SliceLen(a).

Fix: don't use Args[].Aux in initial partitioning.

Left in a debugging flag and some debugging Fprintf's;
not sure if that is house style or not.  We'll probably
want to be more systematic in our naming conventions,
e.g. ssa.cse, ssa.scc, etc.

Change-Id: Ib1412539cc30d91ea542c0ac7b2f9b504108ca7f
Reviewed-on: https://go-review.googlesource.com/19316
Reviewed-by: Keith Randall <khr@golang.org>
2016-02-08 18:44:03 +00:00
Todd Neal
bc07922843 [dev.ssa] cmd/compile: speed up cse
Examine both Aux and AuxInt to form more precise initial partitions.
Restructure loop to avoid repeated type.Equal() call.  Speeds up
compilation of testdata/gen/arithConst_ssa by 25%.

Change-Id: I3cfb1d254adf0601ee69239e1885b0cf2a23575b
Reviewed-on: https://go-review.googlesource.com/19313
Run-TryBot: Todd Neal <todd@tneal.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-02-08 02:43:19 +00:00
Todd Neal
955749c45f [dev.ssa] cmd/compile: remove dead code
Change-Id: I1738e3af7de0972c54d74325d80781059d0796d8
Reviewed-on: https://go-review.googlesource.com/19186
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-02-04 00:53:55 +00:00
Keith Randall
8a961aee28 [dev.ssa] cmd/compile: fix -N build
The OpSB hack didn't quite work.  We need to really
CSE these ops to make regalloc happy.

Change-Id: I9f4d7bfb0929407c84ee60c9e25ff0c0fbea84af
Reviewed-on: https://go-review.googlesource.com/19083
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-01-29 17:27:12 +00:00
Keith Randall
6a96a2fe5a [dev.ssa] cmd/compile: make cse faster
It is one of the slowest compiler phases right now, and we
run two of them.

Instead of using a map to make the initial partition, use a sort.
It is much less memory intensive.

Do a few optimizations to avoid work for size-1 equivalence classes.

Implement -N.

Change-Id: I1d2d85d3771abc918db4dd7cc30b0b2d854b15e1
Reviewed-on: https://go-review.googlesource.com/19024
Reviewed-by: David Chase <drchase@google.com>
2016-01-28 20:59:20 +00:00
Keith Randall
4989337192 [dev.ssa] cmd/compile: allow control values to be CSEd
With the separate flagalloc pass, it should be fine to
allow CSE of control values.  The worst that can happen
is that the comparison gets un-CSEd by flagalloc.

Fix bug in flagalloc where flag restores were getting
clobbered by rematerialization during register allocation.

Change-Id: If476cf98b69973e8f1a8eb29441136dd12fab8ad
Reviewed-on: https://go-review.googlesource.com/17760
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2015-12-12 06:41:05 +00:00
Josh Bleecher Snyder
9552295833 [dev.ssa] cmd/compile: minor CSE cleanup
Remove unnecessary local var split.

Change-Id: I907ef682b5fd9b3a67771edd1fe90c558f8937ea
Reviewed-on: https://go-review.googlesource.com/14523
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
2015-09-11 20:49:18 +00:00
David Chase
2a29576562 [dev.ssa] cmd/compile: fix N^2 dominator queries in CSE
Added tree numbering data structure.
Changed dominator query in CSE.
Removed skip-for-too-big patch in CSE.
Passes all.bash.

Change-Id: I98d7c61b6015c81f5edab553615db17bc7a58d68
Reviewed-on: https://go-review.googlesource.com/14326
Reviewed-by: Keith Randall <khr@golang.org>
2015-09-09 23:31:42 +00:00
Josh Bleecher Snyder
95bb89f6dd [dev.ssa] cmd/compile: fix build
CL 14337 made SSA support fixedbugs/issue9604b.go.
That test contains > 40k blocks.
This made the O(n^2) dom algorithm fail to terminate
in a reasonable length of time, breaking the build.

For the moment, cap the number of blocks
to fix the build.

This will be reverted when a more efficient
dom algorithm is put in place,
which will be soon.

Change-Id: Ia66c2629481d29d06655ec54d1deff076b0422c6
Reviewed-on: https://go-review.googlesource.com/14342
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-09-06 22:08:14 +00:00
Todd Neal
19447a66d6 [dev.ssa] cmd/compile: store floats in AuxInt
Store floats in AuxInt to reduce allocations.

Change-Id: I101e6322530b4a0b2ea3591593ad022c992e8df8
Reviewed-on: https://go-review.googlesource.com/14320
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2015-09-04 15:15:21 +00:00
Todd Neal
ce4317266c [dev.ssa] cmd/compile: cse should treat -0.0 and 0.0 as different
cse was incorrectly classifying -0.0 and 0.0 as equivalent. This lead
to invalid code as ssa uses PXOR -0.0, reg to negate a floating point.

Fixes math.

Change-Id: Id7eb10c71749eaed897f29b02c33891cf5820acf
Reviewed-on: https://go-review.googlesource.com/14205
Reviewed-by: Keith Randall <khr@golang.org>
2015-09-03 03:05:02 +00:00
Josh Bleecher Snyder
851ceebceb [dev.ssa] cmd/compile: don't alloc new CSE classes
This reduces the time to compile
test/slice3.go on my laptop from ~12s to ~3.8s.
It reduces the max memory use from ~4.8gb to
~450mb.

This is still considerably worse than tip,
at 1s and 300mb respectively, but it's
getting closer.

Hopefully this will fix the build at long last.

Change-Id: Iac26b52023f408438cba3ea1b81dcd82ca402b90
Reviewed-on: https://go-review.googlesource.com/12566
Reviewed-by: Keith Randall <khr@golang.org>
2015-07-23 18:13:32 +00:00
Josh Bleecher Snyder
317226e61c [dev.ssa] cmd/compile: use v.Args[x].Op in CSE key
Experimentally, the Ops of v.Args do a good job
of differentiating values that will end up in
different partitions.

Most values have at most two args, so use them.

This reduces the wall time to run test/slice3.go
on my laptop from ~20s to ~12s.

Credit to Todd Neal for the idea.

Change-Id: I55d08f09eb678bbe8366924ca2fabcd32526bf41
Reviewed-on: https://go-review.googlesource.com/12565
Reviewed-by: Keith Randall <khr@golang.org>
2015-07-23 18:12:44 +00:00
Josh Bleecher Snyder
8c954d5780 [dev.ssa] cmd/compile: speed up cse
By walking only the current set of partitions
at any given point, the cse pass ended up doing
lots of extraneous, effectively O(n^2) work.

Using a regular for loop allows each cse pass to
make as much progress as possible by processing
each new class as it is introduced.

This can and should be optimized further,
but it already reduces by 75% cse time on test/slice3.go.

The overall time to compile test/slice3.go is still
dominated by the O(n^2) work in the liveness pass.
However, Keith is rewriting regalloc anyway.

Change-Id: I8be020b2f69352234587eeadeba923481bf43fcc
Reviewed-on: https://go-review.googlesource.com/12244
Reviewed-by: Keith Randall <khr@golang.org>
2015-07-23 18:06:04 +00:00
Josh Bleecher Snyder
00437ebe73 [dev.ssa] cmd/compile: don't combine phi vars from different blocks in CSE
Here is a concrete case in which this goes wrong.

func f_ssa() int {
	var n int
Next:
	for j := 0; j < 3; j++ {
		for i := 0; i < 10; i++ {
			if i == 6 {
				continue Next
			}
			n = i
		}
		n += j + j + j + j + j + j + j + j + j + j // j * 10
	}
	return n
}

What follows is the function printout before and after CSE.

Note blocks b8 and b10 in the before case.

b8 is the inner loop's condition: i < 10.
b10 is the inner loop's increment: i++.
v82 is i. On entry to b8, it is either 0 (v19) the first time,
or the result of incrementing v82, by way of v29.

The CSE pass considered v82 and v49 to be common subexpressions,
and eliminated v82 in favor of v49.

In the after case, v82 is now dead and will shortly be eliminated.
As a result, v29 is also dead, and we have lost the increment.
The loop runs forever.

BEFORE CSE

f_ssa <nil>
  b1:
    v1 = Arg <mem>
    v2 = SP <uint64>
    v4 = Addr <*int> {~r0} v2
    v13 = Zero <mem> [8] v4 v1
    v14 = Const <int>
    v15 = Const <int>
    v17 = Const <int> [3]
    v19 = Const <int>
    v21 = Const <int> [10]
    v24 = Const <int> [6]
    v28 = Const <int> [1]
    v43 = Const <int> [1]
    Plain -> b3
  b2: <- b7
    Exit v47
  b3: <- b1
    Plain -> b4
  b4: <- b3 b6
    v49 = Phi <int> v15 v44
    v68 = Phi <int> v14 v67
    v81 = Phi <mem> v13 v81
    v18 = Less <bool> v49 v17
    If v18 -> b5 b7
  b5: <- b4
    Plain -> b8
  b6: <- b12 b11
    v67 = Phi <int> v66 v41
    v44 = Add <int> v49 v43
    Plain -> b4
  b7: <- b4
    v47 = Store <mem> v4 v68 v81
    Plain -> b2
  b8: <- b5 b10
    v66 = Phi <int> v68 v82
    v82 = Phi <int> v19 v29
    v22 = Less <bool> v82 v21
    If v22 -> b9 b11
  b9: <- b8
    v25 = Eq <bool> v82 v24
    If v25 -> b12 b13
  b10: <- b13
    v29 = Add <int> v82 v28
    Plain -> b8
  b11: <- b8
    v32 = Add <int> v49 v49
    v33 = Add <int> v32 v49
    v34 = Add <int> v33 v49
    v35 = Add <int> v34 v49
    v36 = Add <int> v35 v49
    v37 = Add <int> v36 v49
    v38 = Add <int> v37 v49
    v39 = Add <int> v38 v49
    v40 = Add <int> v39 v49
    v41 = Add <int> v66 v40
    Plain -> b6
  b12: <- b9
    Plain -> b6
  b13: <- b9
    Plain -> b10

AFTER CSE

f_ssa <nil>
  b1:
    v1 = Arg <mem>
    v2 = SP <uint64>
    v4 = Addr <*int> {~r0} v2
    v13 = Zero <mem> [8] v4 v1
    v14 = Const <int>
    v15 = Const <int>
    v17 = Const <int> [3]
    v19 = Const <int>
    v21 = Const <int> [10]
    v24 = Const <int> [6]
    v28 = Const <int> [1]
    v43 = Const <int> [1]
    Plain -> b3
  b2: <- b7
    Exit v47
  b3: <- b1
    Plain -> b4
  b4: <- b3 b6
    v49 = Phi <int> v19 v44
    v68 = Phi <int> v19 v67
    v81 = Phi <mem> v13 v81
    v18 = Less <bool> v49 v17
    If v18 -> b5 b7
  b5: <- b4
    Plain -> b8
  b6: <- b12 b11
    v67 = Phi <int> v66 v41
    v44 = Add <int> v49 v43
    Plain -> b4
  b7: <- b4
    v47 = Store <mem> v4 v68 v81
    Plain -> b2
  b8: <- b5 b10
    v66 = Phi <int> v68 v49
    v82 = Phi <int> v19 v29
    v22 = Less <bool> v49 v21
    If v22 -> b9 b11
  b9: <- b8
    v25 = Eq <bool> v49 v24
    If v25 -> b12 b13
  b10: <- b13
    v29 = Add <int> v49 v43
    Plain -> b8
  b11: <- b8
    v32 = Add <int> v49 v49
    v33 = Add <int> v32 v49
    v34 = Add <int> v33 v49
    v35 = Add <int> v34 v49
    v36 = Add <int> v35 v49
    v37 = Add <int> v36 v49
    v38 = Add <int> v37 v49
    v39 = Add <int> v38 v49
    v40 = Add <int> v39 v49
    v41 = Add <int> v66 v40
    Plain -> b6
  b12: <- b9
    Plain -> b6
  b13: <- b9
    Plain -> b10

Change-Id: I16fc4ec527ec63f24f7d0d79d1a4a59bf37269de
Reviewed-on: https://go-review.googlesource.com/12444
Reviewed-by: Keith Randall <khr@golang.org>
2015-07-23 18:05:33 +00:00
Josh Bleecher Snyder
d9a704cd40 [dev.ssa] cmd/compile/ssa: refine type equality in cse
The correct way to compare gc.Types is Eqtype,
rather than pointer equality.
Introduce an Equal method for ssa.Type to allow
us to use it.

In the cse pass, use a type's string to build
the coarse partition, and then use Type.Equal
during refinement.

This lets the cse pass do a better job.
In the ~20% of the standard library that SSA
can compile, the number of common subexpressions
recognized by the cse pass increases from
27,550 to 32,199 (+17%). The number of nil checks
eliminated increases from 75 to 115 (+50%).

Change-Id: I0bdbfcf613ca6bc2ec987eb19b6b1217b51f3008
Reviewed-on: https://go-review.googlesource.com/11451
Reviewed-by: Keith Randall <khr@golang.org>
2015-06-29 03:27:30 +00:00
Josh Bleecher Snyder
d779b20cd2 [dev.ssa] cmd/compile/ssa: improve comments, logging, and debug output
Change-Id: Id949db82ddaf802c1aa245a337081d4d46fd914f
Reviewed-on: https://go-review.googlesource.com/11380
Reviewed-by: Alan Donovan <adonovan@google.com>
2015-06-24 20:44:57 +00:00
Josh Bleecher Snyder
2a846d2bd3 [dev.ssa] cmd/compile/ssa: add nilcheckelim pass
The nilcheckelim pass eliminates unnecessary nil checks.
The initial implementation removes redundant nil checks.
See the comments in nilcheck.go for ideas for future
improvements.

The efficacy of the cse pass has a significant impact
on this efficacy of this pass.

There are 886 nil checks in the parts of the standard
library that SSA can currently compile (~20%).

This pass eliminates 75 (~8.5%) of them.

As a data point, with a more aggressive but unsound
cse pass that treats many more types as identical,
this pass eliminates 115 (~13%) of the nil checks.

Change-Id: I13e567a39f5f6909fc33434d55c17a7e3884a704
Reviewed-on: https://go-review.googlesource.com/11430
Reviewed-by: Alan Donovan <adonovan@google.com>
2015-06-24 19:56:51 +00:00
Daniel Morsing
3b817ef8f8 [dev.ssa] fix equivalence class after aux/auxint refactor.
This caused the following code snippet to be miscompiled

	var f int
	x := g(&f)
	f = 10

Moving the store of 10 above the function call.

Change-Id: Ic6951f5e7781b122cd881df324a38e519d6d66f0
Reviewed-on: https://go-review.googlesource.com/11073
Reviewed-by: Keith Randall <khr@golang.org>
2015-06-15 12:06:05 +00:00
Keith Randall
067e8dfd82 [dev.ssa] Merge remote-tracking branch 'origin/master' into mergebranch
Semi-regular merge of tip to dev.ssa.

Complicated a bit by the move of cmd/internal/* to cmd/compile/internal/*.

Change-Id: I1c66d3c29bb95cce4a53c5a3476373aa5245303d
2015-05-28 13:51:18 -07:00