Commit graph

49 commits

Author SHA1 Message Date
Alexandru Moșoi
8b20fd000d cmd/compile: transform some Phis into Or8.
func f(a, b bool) bool {
          return a || b
}

is now a single instructions (excluding loading and unloading the arguments):
      v10 = ORB <bool> v11 v12 : AX

Change-Id: Iff63399410cb46909f4318ea1c3f45a029f4aa5e
Reviewed-on: https://go-review.googlesource.com/21872
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-04-19 22:04:30 +00:00
Alexandru Moșoi
e0611b1664 cmd/compile: use shared dom tree for cse, too
Missed this in the previous CL where the shared
dom tree was introduced.

Change-Id: If0bd85d4b4567d7e87814ed511603b1303ab3903
Reviewed-on: https://go-review.googlesource.com/21970
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-04-13 12:42:44 +00:00
Alexandru Moșoi
9743e4b031 cmd/compile: share dominator tree among many passes
These passes do not modify the dominator tree too much.

% benchstat old.txt new.txt
name       old time/op     new time/op     delta
Template       335ms ± 3%      325ms ± 8%    ~             (p=0.074 n=8+9)
GoTypes        1.05s ± 1%      1.05s ± 3%    ~            (p=0.095 n=9+10)
Compiler       5.37s ± 4%      5.29s ± 1%  -1.42%         (p=0.022 n=9+10)
MakeBash       34.9s ± 3%      34.4s ± 2%    ~            (p=0.095 n=9+10)

name       old alloc/op    new alloc/op    delta
Template      55.4MB ± 0%     54.9MB ± 0%  -0.81%        (p=0.000 n=10+10)
GoTypes        179MB ± 0%      178MB ± 0%  -0.89%        (p=0.000 n=10+10)
Compiler       807MB ± 0%      798MB ± 0%  -1.10%        (p=0.000 n=10+10)

name       old allocs/op   new allocs/op   delta
Template        498k ± 0%       496k ± 0%  -0.29%          (p=0.000 n=9+9)
GoTypes        1.42M ± 0%      1.41M ± 0%  -0.24%        (p=0.000 n=10+10)
Compiler       5.61M ± 0%      5.60M ± 0%  -0.12%        (p=0.000 n=10+10)

Change-Id: I4cd20cfba3f132ebf371e16046ab14d7e42799ec
Reviewed-on: https://go-review.googlesource.com/21806
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-04-12 14:44:26 +00:00
Keith Randall
7f53391f6b cmd/compile: fix -N build
The decomposer of builtin types is confused by having structs
still around from the user-type decomposer.  They're all dead though,
so just enabling a deadcode pass fixes things.

Change-Id: I2df6bc7e829be03eabfd24c8dda1bff96f3d7091
Reviewed-on: https://go-review.googlesource.com/21839
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-04-11 19:43:47 +00:00
Alexandru Moșoi
1747788c56 cmd/compile: add a pass to print bound checks
Since BCE happens over several passes (opt, loopbce, prove)
it's easy to regress especially with rewriting.

The pass is only activated with special debug flag.

Change-Id: I46205982e7a2751156db8e875d69af6138068f59
Reviewed-on: https://go-review.googlesource.com/21510
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-04-05 12:46:59 +00:00
Alexandru Moșoi
b91cc53033 cmd/compile/internal/ssa: BCE for induction variables
There are 5293 loop in the main go repository.
A survey of the top most common for loops:

     18 for __k__ := 0; i < len(sa.Addr); i++ {
     19 for __k__ := 0; ; i++ {
     19 for __k__ := 0; i < 16; i++ {
     25 for __k__ := 0; i < length; i++ {
     30 for __k__ := 0; i < 8; i++ {
     49 for __k__ := 0; i < len(s); i++ {
     67 for __k__ := 0; i < n; i++ {
    376 for __k__ := range __slice__ {
    685 for __k__, __v__ := range __slice__ {
   2074 for __, __v__ := range __slice__ {

The algorithm to find induction variables handles all cases
with an upper limit. It currently doesn't find related induction
variables such as c * ind or c + ind.

842 out of 22954 bound checks are removed for src/make.bash.
1957 out of 42952 bounds checks are removed for src/all.bash.

Things to do in follow-up CLs:
* Find the associated pointer for `for _, v := range a {}`
* Drop the NilChecks on the pointer.
* Replace the implicit induction variable by a loop over the pointer

Generated garbage can be reduced if we share the sdom between passes.

% benchstat old.txt new.txt
name       old time/op     new time/op     delta
Template       337ms ± 3%      333ms ± 3%    ~             (p=0.258 n=9+9)
GoTypes        1.11s ± 2%      1.10s ± 2%    ~           (p=0.912 n=10+10)
Compiler       5.25s ± 1%      5.29s ± 2%    ~             (p=0.077 n=9+9)
MakeBash       33.5s ± 1%      34.1s ± 2%  +1.85%          (p=0.011 n=9+9)

name       old alloc/op    new alloc/op    delta
Template      63.6MB ± 0%     63.9MB ± 0%  +0.52%         (p=0.000 n=10+9)
GoTypes        218MB ± 0%      219MB ± 0%  +0.59%         (p=0.000 n=10+9)
Compiler       978MB ± 0%      985MB ± 0%  +0.69%        (p=0.000 n=10+10)

name       old allocs/op   new allocs/op   delta
Template        582k ± 0%       583k ± 0%  +0.10%        (p=0.000 n=10+10)
GoTypes        1.78M ± 0%      1.78M ± 0%  +0.12%        (p=0.000 n=10+10)
Compiler       7.68M ± 0%      7.69M ± 0%  +0.05%        (p=0.000 n=10+10)

name       old text-bytes  new text-bytes  delta
HelloSize       581k ± 0%       581k ± 0%  -0.08%        (p=0.000 n=10+10)
CmdGoSize      6.40M ± 0%      6.39M ± 0%  -0.08%        (p=0.000 n=10+10)

name       old data-bytes  new data-bytes  delta
HelloSize      3.66k ± 0%      3.66k ± 0%    ~     (all samples are equal)
CmdGoSize       134k ± 0%       134k ± 0%    ~     (all samples are equal)

name       old bss-bytes   new bss-bytes   delta
HelloSize       126k ± 0%       126k ± 0%    ~     (all samples are equal)
CmdGoSize       149k ± 0%       149k ± 0%    ~     (all samples are equal)

name       old exe-bytes   new exe-bytes   delta
HelloSize       947k ± 0%       946k ± 0%  -0.01%        (p=0.000 n=10+10)
CmdGoSize      9.92M ± 0%      9.91M ± 0%  -0.06%        (p=0.000 n=10+10)

Change-Id: Ie74bdff46fd602db41bb457333d3a762a0c3dc4d
Reviewed-on: https://go-review.googlesource.com/20517
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
2016-04-01 09:37:58 +00:00
David Chase
8eec2bbfbc cmd/compile: added some intrinsics to SSA back end
One intrinsic was needed to help get the very best
performance out of a future GC; as long as that one was
being added, I also added Bswap since that is sometimes
a handy thing to have.  I had intended to fill out the
bit-scan intrinsic family, but the mismatch between the
"scan forward" instruction and "count leading zeroes"
was large enough to cause me to leave it out -- it poses
a dilemma that I'd rather dodge right now.

These intrinsics are not exposed for general use.
That's a separate issue requiring an API proposal change
( https://github.com/golang/proposal )

All intrinsics are tested, both that they are substituted
on the appropriate architecture, and that they produce the
expected result.

Change-Id: I5848037cfd97de4f75bdc33bdd89bba00af4a8ee
Reviewed-on: https://go-review.googlesource.com/20564
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-28 16:29:59 +00:00
David Chase
09a9ce60c7 cmd/compile: get gcflags to bootstrap; ssa debug opts for "all"
This is intended to help debug compiler problems that pop
up in the bootstrap phase of make.bash.  GO_GCFLAGS does not
normally apply there.  Options-for-all phases is intended
to allow crude tracing (and full timing) by turning on timing
for all phases, not just one.

Phase names can also be specified using a regular expression,
for example
BOOT_GO_GCFLAGS=-d='ssa/~^.*scc$/off' \
GO_GCFLAGS='-d=ssa/~^.*scc$/off' ./make.bash

I just added this because it was the fastest way to get
me to a place where I could easily debug the compiler.

Change-Id: I0781f3e7c19651ae7452fa25c2d54c9a245ef62d
Reviewed-on: https://go-review.googlesource.com/20775
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-18 01:04:47 +00:00
David Chase
2d56dee61b cmd/compile: escape analysis explanations added to -m -m output
This should probably be considered "experimental" at this stage, but
what it needs is feedback from adventurous adopters.  I think the data
structure used for describing escape reasons might be extendable to
allow a cleanup of the underlying algorithms, which suffers from
insufficiently separated concerns (the graph does not deal well with
escape level adjustments, so it is augmented by a second custom-walk
portion of the "flood" phase. It would be better to put it all,
including level adjustments, in a single graph structure, and then
simply flood the graph.

Tweaked to avoid allocations in the no-logging case.

Modified run.go to ignore lines with leading "#" in the output (since
it can never match a line), and in -update_errors to ignore leading
tabs in output lines and to normalize embedded filenames.

Currently requires -m -m because otherwise the noise/update
burden for the other escape tests is considerable.

There is a partial test.  Existing escape analysis tests seem to
cover all except the panic case and what looks like it might be
unreachable code in escape analysis.

Fixes #10526.

Change-Id: I2524fdec54facae48b00b2548e25d9e46fcaf832
Reviewed-on: https://go-review.googlesource.com/18041
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-17 13:29:48 +00:00
Keith Randall
5305a329d8 cmd/compile: turn off SSA internal consistency checks
They've been on for a few weeks of general use and nothing
has tripped up on them yet.

Makes the compiler ~18% faster.

Change-Id: I42d7bbc0581597f9cf4fb28989847814c81b08a2
Reviewed-on: https://go-review.googlesource.com/20741
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-15 22:25:48 +00:00
Alexandru Moșoi
13f74db304 cmd/compile: fix no-opt build after moving decomposing user functions
decompose-builtin pass requires an opt pass, but -N disables
late-opt, the only opt pass (out of two) that happens
after decompose-builtin.  This CL enables both 'opt' and 'late opt'
passes. The extra compile time for 'late opt' in negligible
since most rewrites were already done in the first 'opt'
(also measured before). We should put some effort in splitting the
generic rules into required and optional.

Also update generic.rules comments about lowering
of StringMake and SliceMake.

Tested with GO_GCFLAGS=-N ./all.bash

Change-Id: I92999681aaa02587b6dc6e32ce997a91f1fc9499
Reviewed-on: https://go-review.googlesource.com/20682
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-14 18:44:10 +00:00
Alexandru Moșoi
8ec80176d4 cmd/compile: move decompose builtin closer to late opt
* Shaves about 10k from pkg/tools/linux_amd64.
* Was suggested by drchase before
* Found by looking at ssa output of #14758

Change-Id: If2c4ddf3b2603d4dfd8fb4d9199b9a3dcb05b17d
Reviewed-on: https://go-review.googlesource.com/20570
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-13 22:12:01 +00:00
Alexandru Moșoi
dfcb853d9d cmd/compile/internal/ssa: lower builtins much later
* Move lowering into a separate pass.
* SliceLen/SliceCap is now available to various intermediate passes
which use useful for bounds checking.
* Add a second opt pass to handle the new opportunities

Decreases the code size of binaries in pkg/tool/linux_amd64
by ~45K.

Updates #14564 #14606

Change-Id: I5b2bd6202181c50623a3585fbf15c0d6db6d4685
Reviewed-on: https://go-review.googlesource.com/20172
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-03-09 11:08:59 +00:00
Keith Randall
c0740fed37 cmd/compile: more ssa config flags
To turn ssa compilation on or off altogether, use
-ssa=1 or -ssa=0.  Default is on.

To turn on or off consistency checks, do
-d=ssa/check/on or -d=ssa/check/off.  Default is on for now.

Change-Id: I277e0311f538981c8b9c62e7b7382a0c8755ce4c
Reviewed-on: https://go-review.googlesource.com/20217
Reviewed-by: David Chase <drchase@google.com>
2016-03-04 19:03:12 +00:00
Brad Fitzpatrick
5fea2ccc77 all: single space after period.
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.

This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:

$ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
$ go test go/doc -update

Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02 00:13:47 +00:00
David Chase
6b3462c784 [dev.ssa] cmd/compile: adjust branch likeliness for calls/loops
Static branch predictions (which guide block ordering) are
adjusted based on:

loop/not-loop     (favor looping)
abnormal-exit/not (avoid panic)
call/not-call     (avoid call)
ret/default       (treat returns as rare)

This appears to make no difference in performance of real
code, meaning the compiler itself.  The earlier version of
this has been stripped down to help make the cost of this
only-aesthetic-on-Intel phase be as cheap as possible (we
probably want information about inner loops for improving
register allocation, but because register allocation follows
close behind this pass, conceivably the information could be
reused -- so we might do this anyway just to normalize
output).

For a ./make.bash that takes 200 user seconds, about .75
second is reported in likelyadjust (summing nanoseconds
reported with -d=ssa/likelyadjust/time ).

Upstream predictions are respected.
Includes test, limited to build on amd64 only.
Did several iterations on the debugging output to allow
some rough checks on behavior.
Debug=1 logging notes agree/disagree with earlier passes,
allowing analysis like the following:

Run on make.bash:
GO_GCFLAGS=-d=ssa/likelyadjust/debug \
   ./make.bash >& lkly5.log

grep 'ranch prediction' lkly5.log | wc -l
   78242 // 78k predictions

grep 'ranch predi' lkly5.log | egrep -v 'agrees with' | wc -l
   29633 // 29k NEW predictions

grep 'disagrees' lkly5.log | wc -l
     444 // contradicted 444 times

grep '< exit' lkly5.log | wc -l
   10212 // 10k exit predictions

grep '< exit' lkly5.log | egrep 'disagrees' | wc -l
       5 // 5 contradicted by previous prediction

grep '< exit' lkly5.log | egrep -v 'agrees' | wc -l
     702 // 702-5 redundant with previous prediction

grep '< call' lkly5.log | egrep -v 'agrees' | wc -l
   16699 // 16k new call predictions

grep 'stay in loop' lkly5.log | egrep -v 'agrees' | wc -l
    3951 // 4k new "remain in loop" predictions

Fixes #11451.

Change-Id: Iafb0504f7030d304ef4b6dc1aba9a5789151a593
Reviewed-on: https://go-review.googlesource.com/19995
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-01 20:09:41 +00:00
Alexandru Moșoi
e197f467d5 [dev.ssa] cmd/compile/internal/ssa: simplify boolean phis
* Decreases the generated code slightly.
* Similar to phiopt pass from gcc, except it only handles
booleans. Handling Eq/Neq had no impact on the generated code.

name       old time/op     new time/op     delta
Template       453ms ± 4%      451ms ± 4%    ~           (p=0.468 n=24+24)
GoTypes        1.55s ± 1%      1.55s ± 2%    ~           (p=0.287 n=24+25)
Compiler       6.53s ± 2%      6.56s ± 1%  +0.46%        (p=0.050 n=23+23)
MakeBash       45.8s ± 2%      45.7s ± 2%    ~           (p=0.866 n=24+25)

name       old text-bytes  new text-bytes  delta
HelloSize       676k ± 0%       676k ± 0%    ~     (all samples are equal)
CmdGoSize      8.07M ± 0%      8.07M ± 0%  -0.03%        (p=0.000 n=25+25)

Change-Id: Ia62477b7554127958a14cb27f85849b095d63663
Reviewed-on: https://go-review.googlesource.com/20090
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-01 17:56:13 +00:00
Alexandru Moșoi
bdea1d58cf [dev.ssa] cmd/compile/internal/ssa: remove proven redundant controls.
* It does very simple bounds checking elimination. E.g.
removes the second check in for i := range a { a[i]++; a[i++]; }
* Improves on the following redundant expression:
return a6 || (a6 || (a6 || a4)) || (a6 || (a4 || a6 || (false || a6)))
* Linear in the number of block edges.

I patched in CL 12960 that does bounds, nil and constant propagation
to make sure this CL is not just redundant. Size of pkg/tool/linux_amd64/*
(excluding compile which is affected by this change):

With IsInBounds and IsSliceInBounds
-this -12960 92285080
+this -12960 91947416
-this +12960 91978976
+this +12960 91923088

Gain is ~110% of 12960.

Without IsInBounds and IsSliceInBounds (older run)
-this -12960 95515512
+this -12960 95492536
-this +12960 95216920
+this +12960 95204440

Shaves 22k on its own.

* Can we handle IsInBounds better with this? In
for i := range a { a[i]++; } the bounds checking at a[i]
is not eliminated.

Change-Id: I98957427399145fb33693173fd4d5a8d71c7cc20
Reviewed-on: https://go-review.googlesource.com/19710
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-28 19:48:20 +00:00
David Chase
378a863682 [dev.ssa] cmd/compile: enhance command line option processing for SSA
The -d compiler flag can also specify ssa phase and flag,
for example -d=ssa/generic_cse/time,ssa/generic_cse/stats

Spaces in the phase names can be specified with an
underscore.  Flags currently parsed (not necessarily
recognized by the phases yet) are:

   on, off, mem, time, debug, stats, and test

On, off and time are handled in the harness,
debug, stats, and test are interpreted by the phase itself.

The pass is now attached to the Func being compiled, and a
new method logStats(key, ...value) on *Func to encourage a
semi-standardized format for that output.  Output fields
are separated by tabs to ease digestion by awk and
spreadsheets.  For example,
	if f.pass.stats > 0 {
		f.logStat("CSE REWRITES", rewrites)
	}

Change-Id: I16db2b5af64c50ca9a47efeb51d961147a903abc
Reviewed-on: https://go-review.googlesource.com/19885
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-25 20:32:15 +00:00
Todd Neal
94f0245114 [dev.ssa] cmd/compile: add a zero arg cse pass
Add an initial cse pass that only operates on zero argument
values.  This removes the need for a special case in cse for removing
OpSB and speeds up arithConst_ssa.go compilation by 9% while slowing
"test -c net/http" by 1.5%.

Change-Id: Id1500482485426f66c6c2eba75eeaf4f19c8a889
Reviewed-on: https://go-review.googlesource.com/19454
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-02-22 13:06:18 +00:00
Keith Randall
7f7f7cddec [dev.ssa] cmd/compile: split decompose pass in two
A first pass to decompose user types (structs, maybe
arrays someday), and a second pass to decompose builtin
types (strings, interfaces, slices, complex).  David wants
this for value range analysis so he can have structs decomposed
but slices and friends will still be intact and he can deduce
things like the length of a slice is >= 0.

Change-Id: Ia2300d07663329b51ed6270cfed21d31980daa7c
Reviewed-on: https://go-review.googlesource.com/19340
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: David Chase <drchase@google.com>
2016-02-09 02:18:31 +00:00
David Chase
58cfa40419 [dev.ssa] cmd/compile: fix for bug in cse speed improvements
Problem was caused by use of Args[].Aux differences
in early partitioning.  This artificially separated
two equivalent expressions because sort ignores the
Aux field, hence things can end with equal things
separated by unequal things and thus the equal things
are split into more than one partition.  For example:
SliceLen(a), SliceLen(b), SliceLen(a).

Fix: don't use Args[].Aux in initial partitioning.

Left in a debugging flag and some debugging Fprintf's;
not sure if that is house style or not.  We'll probably
want to be more systematic in our naming conventions,
e.g. ssa.cse, ssa.scc, etc.

Change-Id: Ib1412539cc30d91ea542c0ac7b2f9b504108ca7f
Reviewed-on: https://go-review.googlesource.com/19316
Reviewed-by: Keith Randall <khr@golang.org>
2016-02-08 18:44:03 +00:00
David Chase
c87a62f32b [dev.ssa] cmd/compile: reducing alloc footprint of dominator calc
Converted working slices of pointer into slices of pointer
index.  Half the size (on 64-bit machine) and no pointers
to trace if GC occurs while they're live.

TODO - could expose slice mapping ID->*Block; some dom
clients also construct these.

Minor optimization in regalloc that cuts allocation count.

Minor optimization in compile.go that cuts calls to Sprintf.

Change-Id: I28f0bfed422b7344af333dc52ea272441e28e463
Reviewed-on: https://go-review.googlesource.com/19104
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-02 02:20:25 +00:00
David Chase
88b230eaa6 [dev.ssa] cmd/compile: exposed do-log boolean to reduce allocations
From memory profiling, about 3% reduction in allocation count.

Change-Id: I4b662d55b8a94fe724759a2b22f05a08d0bf40f8
Reviewed-on: https://go-review.googlesource.com/19103
Reviewed-by: Keith Randall <khr@golang.org>
2016-01-29 21:30:29 +00:00
Keith Randall
d8a65672f8 [dev.ssa] cmd/compile: optimization for && and || expressions
Compiling && and || expressions often leads to control
flow of the following form:

p:
  If a goto b else c
b: <- p ...
  x = phi(a, ...)
  If x goto t else u

Note that if we take the edge p->b, then we are guaranteed
to take the edge b->t also.  So in this situation, we might
as well go directly from p to t.

Change-Id: I6974f1e6367119a2ddf2014f9741fdb490edcc12
Reviewed-on: https://go-review.googlesource.com/18910
Reviewed-by: David Chase <drchase@google.com>
2016-01-29 17:49:45 +00:00
Keith Randall
8a961aee28 [dev.ssa] cmd/compile: fix -N build
The OpSB hack didn't quite work.  We need to really
CSE these ops to make regalloc happy.

Change-Id: I9f4d7bfb0929407c84ee60c9e25ff0c0fbea84af
Reviewed-on: https://go-review.googlesource.com/19083
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-01-29 17:27:12 +00:00
Keith Randall
6a96a2fe5a [dev.ssa] cmd/compile: make cse faster
It is one of the slowest compiler phases right now, and we
run two of them.

Instead of using a map to make the initial partition, use a sort.
It is much less memory intensive.

Do a few optimizations to avoid work for size-1 equivalence classes.

Implement -N.

Change-Id: I1d2d85d3771abc918db4dd7cc30b0b2d854b15e1
Reviewed-on: https://go-review.googlesource.com/19024
Reviewed-by: David Chase <drchase@google.com>
2016-01-28 20:59:20 +00:00
Keith Randall
3c26c0db39 [dev.ssa] cmd/compile: short-circuit empty blocks
Empty blocks are introduced to remove critical edges.
After regalloc, we can remove any of the added blocks
that are still empty.

Change-Id: I0b40e95ac3a6cc1e632a479443479532b6c5ccd9
Reviewed-on: https://go-review.googlesource.com/18833
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-01-22 22:12:12 +00:00
Keith Randall
b5c5efd5de [dev.ssa] cmd/compile: optimize phi ops
Redo how we keep track of forward references when building SSA.
When the forward reference is resolved, update the Value node
in place.

Improve the phi elimination pass so it can simplify phis of phis.

Give SSA package access to decoded line numbers.  Fix line numbers
for constant booleans.

Change-Id: I3dc9896148d260be2f3dd14cbe5db639ec9fa6b7
Reviewed-on: https://go-review.googlesource.com/18674
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2016-01-20 21:45:37 +00:00
Keith Randall
7d9f1067d1 [dev.ssa] cmd/compile: better register allocator
Reorder how register & stack allocation is done.  We used to allocate
registers, then fix up merge edges, then allocate stack slots.  This
lead to lots of unnecessary copies on merge edges:

v2 = LoadReg v1
v3 = StoreReg v2

If v1 and v3 are allocated to the same stack slot, then this code is
unnecessary.  But at regalloc time we didn't know the homes of v1 and
v3.

To fix this problem, allocate all the stack slots before fixing up the
merge edges.  That way, we know what stack slots values use so we know
what copies are required.

Use a good technique for shuffling values around on merge edges.

Improves performance of the go1 TimeParse benchmark by ~12%

Change-Id: I731f43e4ff1a7e0dc4cd4aa428fcdb97812b86fa
Reviewed-on: https://go-review.googlesource.com/17915
Reviewed-by: David Chase <drchase@google.com>
2015-12-21 23:12:05 +00:00
Keith Randall
c140df0326 [dev.ssa] cmd/compile: allocate the flag register in a separate pass
Spilling/restoring flag values is a pain to do during regalloc.
Instead, allocate the flag register in a separate pass.  Regalloc then
operates normally on any flag recomputation instructions.

Change-Id: Ia1c3d9e6eff678861193093c0b48a00f90e4156b
Reviewed-on: https://go-review.googlesource.com/17694
Reviewed-by: David Chase <drchase@google.com>
2015-12-11 21:08:15 +00:00
Keith Randall
02f4d0a130 [dev.ssa] cmd/compile: start arguments as spilled
Declare a function's arguments as having already been
spilled so their use just requires a restore.

Allow spill locations to be portions of larger objects the stack.
Required to load portions of compound input arguments.

Rename the memory input to InputMem.  Use Arg for the
pre-spilled argument values.

Change-Id: I8fe2a03ffbba1022d98bfae2052b376b96d32dda
Reviewed-on: https://go-review.googlesource.com/16536
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2015-11-03 17:29:40 +00:00
Todd Neal
cd01c0be26 [dev.ssa] cmd/compile/internal/ssa: reorder fuse and dse
deadstore elimination currently works in a block, fusing before
performing dse eliminates ~1% more stores for make.bash

Change-Id: If5bbddac76bf42616938a8e8e84cb7441fa02f73
Reviewed-on: https://go-review.googlesource.com/16350
Reviewed-by: Keith Randall <khr@golang.org>
2015-10-27 22:16:12 +00:00
Josh Bleecher Snyder
a3f72956f1 [dev.ssa] cmd/compile: add allocs to pass stats
Also, improve HTML formatting.

Change-Id: I07e2482a30862e2091707f260a2c43d6e9a85d97
Reviewed-on: https://go-review.googlesource.com/14333
Reviewed-by: Todd Neal <todd@tneal.org>
2015-09-05 02:25:42 +00:00
Todd Neal
ec8a597cd2 [dev.ssa] cmd/compile: rewrite user nil check as OpIsNonNil
Rewite user nil checks as OpIsNonNil so our nil check elimination pass
can take advantage and remove redundant checks.

With make.bash this removes 10% more nilchecks (34110 vs 31088).

Change-Id: Ifb01d1b6d2d759f5e2a5aaa0470e1d5a2a680212
Reviewed-on: https://go-review.googlesource.com/14321
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-09-05 01:06:30 +00:00
Todd Neal
24dcede1c0 [dev.ssa] cmd/compile/ssa: add timing to compiler passes
Add timing/allocation information to each compiler pass for both the
console and html output.

Change-Id: I75833003b806a09b4fb1bbf63983258612cdb7b0
Reviewed-on: https://go-review.googlesource.com/14277
Reviewed-by: Keith Randall <khr@golang.org>
2015-09-04 15:19:12 +00:00
Keith Randall
9f954db170 [dev.ssa] cmd/compile: add decompose pass
Decompose breaks compound objects up into pieces that can be
operated on by the target architecture.  The decompose pass only
does phi ops, the rest is done by the rewrite rules in generic.rules.

Compound objects include strings,slices,interfaces,structs,arrays.

Arrays aren't decomposed because of indexing (we could support
constant indexes, but dynamic indexes can't be handled using SSA).
Structs will come in a subsequent CL.

TODO: after this pass we have lost the association between, e.g.,
a string's pointer and its size.  It would be nice if we could keep
that information around for debugging info somehow.

Change-Id: I6379ab962a7beef62297d0f68c421f22aa0a0901
Reviewed-on: https://go-review.googlesource.com/13683
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-08-20 21:09:24 +00:00
Josh Bleecher Snyder
35fb514596 [dev.ssa] cmd/compile: add HTML SSA printer
This is an initial implementation.
There are many rough edges and TODOs,
which will hopefully be polished out
with use.

Fixes #12071.

Change-Id: I1d6fd5a343063b5200623bceef2c2cfcc885794e
Reviewed-on: https://go-review.googlesource.com/13472
Reviewed-by: Keith Randall <khr@golang.org>
2015-08-13 21:56:06 +00:00
Josh Bleecher Snyder
f91ff1a509 [dev.ssa] cmd/compile: add SSA pass to move values closer to uses
Even this very simple, restricted initial implementation helps.

While running make.bash, it moves 84437 values
to new, closer homes.

As a concrete example:

func f_ssa(i, j int, b bool) int {
	if !b {
		return 0
	}
	return i + j
}

It cuts off one stack slot and two instructions:

Before:

"".f_ssa t=1 size=96 value=0 args=0x20 locals=0x18
	0x0000 00000 (x.go:3)	TEXT	"".f_ssa(SB), $24-32
	0x0000 00000 (x.go:3)	SUBQ	$24, SP
	0x0004 00004 (x.go:3)	FUNCDATA	$0, "".gcargs·0(SB)
	0x0004 00004 (x.go:3)	FUNCDATA	$1, "".gclocals·1(SB)
	0x0004 00004 (x.go:5)	MOVQ	$0, AX
	0x0006 00006 (x.go:3)	MOVQ	32(SP), CX
	0x000b 00011 (x.go:3)	MOVQ	40(SP), DX
	0x0010 00016 (x.go:3)	LEAQ	48(SP), BX
	0x0015 00021 (x.go:3)	MOVB	(BX), BPB
	0x0018 00024 (x.go:3)	MOVQ	$0, SI
	0x001a 00026 (x.go:3)	MOVQ	SI, 56(SP)
	0x001f 00031 (x.go:3)	TESTB	BPB, BPB
	0x0022 00034 (x.go:5)	MOVQ	AX, (SP)
	0x0026 00038 (x.go:3)	MOVQ	CX, 8(SP)
	0x002b 00043 (x.go:3)	MOVQ	DX, 16(SP)
	0x0030 00048 (x.go:4)	JEQ	74
	0x0032 00050 (x.go:3)	MOVQ	8(SP), AX
	0x0037 00055 (x.go:3)	MOVQ	16(SP), CX
	0x003c 00060 (x.go:7)	LEAQ	(AX)(CX*1), DX
	0x0040 00064 (x.go:7)	MOVQ	DX, 56(SP)
	0x0045 00069 (x.go:3)	ADDQ	$24, SP
	0x0049 00073 (x.go:3)	RET
	0x004a 00074 (x.go:5)	MOVQ	(SP), AX
	0x004e 00078 (x.go:5)	MOVQ	AX, 56(SP)
	0x0053 00083 (x.go:3)	JMP	69

After:

"".f_ssa t=1 size=80 value=0 args=0x20 locals=0x10
	0x0000 00000 (x.go:3)	TEXT	"".f_ssa(SB), $16-32
	0x0000 00000 (x.go:3)	SUBQ	$16, SP
	0x0004 00004 (x.go:3)	FUNCDATA	$0, "".gcargs·0(SB)
	0x0004 00004 (x.go:3)	FUNCDATA	$1, "".gclocals·1(SB)
	0x0004 00004 (x.go:3)	MOVQ	32(SP), AX
	0x0009 00009 (x.go:3)	MOVQ	24(SP), CX
	0x000e 00014 (x.go:3)	LEAQ	40(SP), DX
	0x0013 00019 (x.go:3)	MOVB	(DX), BL
	0x0015 00021 (x.go:3)	MOVQ	$0, BP
	0x0017 00023 (x.go:3)	MOVQ	BP, 48(SP)
	0x001c 00028 (x.go:3)	TESTB	BL, BL
	0x001e 00030 (x.go:3)	MOVQ	AX, (SP)
	0x0022 00034 (x.go:3)	MOVQ	CX, 8(SP)
	0x0027 00039 (x.go:4)	JEQ	64
	0x0029 00041 (x.go:3)	MOVQ	8(SP), AX
	0x002e 00046 (x.go:3)	MOVQ	(SP), CX
	0x0032 00050 (x.go:7)	LEAQ	(AX)(CX*1), DX
	0x0036 00054 (x.go:7)	MOVQ	DX, 48(SP)
	0x003b 00059 (x.go:3)	ADDQ	$16, SP
	0x003f 00063 (x.go:3)	RET
	0x0040 00064 (x.go:5)	MOVQ	$0, AX
	0x0042 00066 (x.go:5)	MOVQ	AX, 48(SP)
	0x0047 00071 (x.go:3)	JMP	59

Of course, the old backend is still well ahead:

"".f_ssa t=1 size=48 value=0 args=0x20 locals=0x0
	0x0000 00000 (x.go:3)	TEXT	"".f_ssa(SB), $0-32
	0x0000 00000 (x.go:3)	NOP
	0x0000 00000 (x.go:3)	NOP
	0x0000 00000 (x.go:3)	FUNCDATA	$0, gclocals·a8eabfc4a4514ed6b3b0c61e9680e440(SB)
	0x0000 00000 (x.go:3)	FUNCDATA	$1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
	0x0000 00000 (x.go:4)	CMPB	"".b+24(FP), $0
	0x0005 00005 (x.go:4)	JNE	17
	0x0007 00007 (x.go:5)	MOVQ	$0, "".~r3+32(FP)
	0x0010 00016 (x.go:5)	RET
	0x0011 00017 (x.go:7)	MOVQ	"".i+8(FP), BX
	0x0016 00022 (x.go:7)	MOVQ	"".j+16(FP), BP
	0x001b 00027 (x.go:7)	ADDQ	BP, BX
	0x001e 00030 (x.go:7)	MOVQ	BX, "".~r3+32(FP)
	0x0023 00035 (x.go:7)	RET

Some regalloc improvements should help considerably.

Change-Id: I95bb5dd83e56afd70ae4e983f1d32dffd0c3d46a
Reviewed-on: https://go-review.googlesource.com/13142
Reviewed-by: Keith Randall <khr@golang.org>
2015-08-05 02:37:24 +00:00
Keith Randall
d1c15a0e3e [dev.ssa] cmd/compile/internal/ssa: implement ITAB
Implement ITAB, selecting the itable field of an interface.

Soften the lowering check to allow lowerings that leave
generic but dead ops behind.  (The ITAB lowering does this.)

Change-Id: Icc84961dd4060d143602f001311aa1d8be0d7fc0
Reviewed-on: https://go-review.googlesource.com/13144
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-08-04 23:32:42 +00:00
Josh Bleecher Snyder
165c1c16d1 [dev.ssa] cmd/compile: provide stack trace for caught panics
Change-Id: I9cbb6d53a8c2302222b13d2f33b081b704208b8a
Reviewed-on: https://go-review.googlesource.com/12932
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2015-07-30 20:31:35 +00:00
Josh Bleecher Snyder
61aa0953e5 [dev.ssa] cmd/compile: implement control flow handling
Add label and goto checks and improve test coverage.

Implement OSWITCH and OSELECT.

Implement OBREAK and OCONTINUE.

Allow generation of code in dead blocks.

Change-Id: Ibebb7c98b4b2344f46d38db7c9dce058c56beaac
Reviewed-on: https://go-review.googlesource.com/12445
Reviewed-by: Keith Randall <khr@golang.org>
2015-07-23 00:45:26 +00:00
Josh Bleecher Snyder
9b048527db [dev.ssa] cmd/compile/ssa: handle nested dead blocks
removePredecessor can change which blocks are live.
However, it cannot remove dead blocks from the function's
slice of blocks because removePredecessor may have been
called from within a function doing a walk of the blocks.

CL 11879 did not handle this correctly and broke the build.

To fix this, mark the block as dead but leave its actual
removal for a deadcode pass. Blocks that are dead must have
no successors, predecessors, values, or control values,
so they will generally be ignored by other passes.
To be safe, we add a deadcode pass after the opt pass,
which is the only other pass that calls removePredecessor.

Two alternatives that I considered and discarded:

(1) Make all call sites aware of the fact that removePrecessor
might make arbitrary changes to the list of blocks. This
will needlessly complicate callers.

(2) Handle the things that can go wrong in practice when
we encounter a dead-but-not-removed block. CL 11930 takes
this approach (and the tests are stolen from that CL).
However, this is just patching over the problem.

Change-Id: Icf0687b0a8148ce5e96b2988b668804411b05bd8
Reviewed-on: https://go-review.googlesource.com/12004
Reviewed-by: Todd Neal <todd@tneal.org>
Reviewed-by: Michael Matloob <michaelmatloob@gmail.com>
2015-07-11 00:08:50 +00:00
Josh Bleecher Snyder
37ddc270ca [dev.ssa] cmd/compile/ssa: add -f suffix to logging methods
Requested in CL 11380.

Change-Id: Icf0d23fb8d383c76272401e363cc9b2169d11403
Reviewed-on: https://go-review.googlesource.com/11450
Reviewed-by: Alan Donovan <adonovan@google.com>
2015-06-24 21:48:26 +00:00
Josh Bleecher Snyder
2a846d2bd3 [dev.ssa] cmd/compile/ssa: add nilcheckelim pass
The nilcheckelim pass eliminates unnecessary nil checks.
The initial implementation removes redundant nil checks.
See the comments in nilcheck.go for ideas for future
improvements.

The efficacy of the cse pass has a significant impact
on this efficacy of this pass.

There are 886 nil checks in the parts of the standard
library that SSA can currently compile (~20%).

This pass eliminates 75 (~8.5%) of them.

As a data point, with a more aggressive but unsound
cse pass that treats many more types as identical,
this pass eliminates 115 (~13%) of the nil checks.

Change-Id: I13e567a39f5f6909fc33434d55c17a7e3884a704
Reviewed-on: https://go-review.googlesource.com/11430
Reviewed-by: Alan Donovan <adonovan@google.com>
2015-06-24 19:56:51 +00:00
Josh Bleecher Snyder
8c6abfeacb [dev.ssa] cmd/compile/ssa: separate logging, work in progress, and fatal errors
The SSA implementation logs for three purposes:

	* debug logging
	* fatal errors
	* unimplemented features

Separating these three uses lets us attempt an SSA
implementation for all functions, not just
_ssa functions. This turns the entire standard
library into a compilation test, and makes it
easy to figure out things like
"how much coverage does SSA have now" and
"what should we do next to get more coverage?".

Functions called _ssa are still special.
They log profusely by default and
the output of the SSA implementation
is used. For all other functions,
logging is off, and the implementation
is built and discarded, due to lack of
support for the runtime.

While we're here, fix a few minor bugs and
add some extra Unimplementeds to allow
all.bash to pass.

As of now, SSA handles 20.79% of the functions
in the standard library (689 of 3314).

The top missing features are:

 10.03%  2597 SSA unimplemented: zero for type error not implemented
  7.79%  2016 SSA unimplemented: addr: bad op DOTPTR
  7.33%  1898 SSA unimplemented: unhandled expr EQ
  6.10%  1579 SSA unimplemented: unhandled expr OROR
  4.91%  1271 SSA unimplemented: unhandled expr NE
  4.49%  1163 SSA unimplemented: unhandled expr LROT
  4.00%  1036 SSA unimplemented: unhandled expr LEN
  3.56%   923 SSA unimplemented: unhandled stmt CALLFUNC
  2.37%   615 SSA unimplemented: zero for type []byte not implemented
  1.90%   492 SSA unimplemented: unhandled stmt CALLMETH
  1.74%   450 SSA unimplemented: unhandled expr CALLINTER
  1.74%   450 SSA unimplemented: unhandled expr DOT
  1.71%   444 SSA unimplemented: unhandled expr ANDAND
  1.65%   426 SSA unimplemented: unhandled expr CLOSUREVAR
  1.54%   400 SSA unimplemented: unhandled expr CALLMETH
  1.51%   390 SSA unimplemented: unhandled stmt SWITCH
  1.47%   380 SSA unimplemented: unhandled expr CONV
  1.33%   345 SSA unimplemented: addr: bad op *
  1.30%   336 SSA unimplemented: unhandled OLITERAL 6

Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7
Reviewed-on: https://go-review.googlesource.com/11160
Reviewed-by: Keith Randall <khr@golang.org>
2015-06-21 02:56:36 +00:00
Keith Randall
8d32360bdd [dev.ssa] cmd/internal/ssa: add deadstore pass
Eliminate dead stores.  Dead stores are those which are
unconditionally followed by another store to the same location, with
no intervening load.

Just a simple intra-block implementation for now.

Change-Id: I2bf54e3a342608fc4e01edbe1b429e83f24764ab
Reviewed-on: https://go-review.googlesource.com/10386
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-06-04 23:03:41 +00:00
Michael Matloob
7bdecbf840 [dev.ssa] cmd/compile/internal/ssa: remove cgen pass
Code generation is now done in genssa.
Also remove the asm field in opInfo. It's no longer used.

Change-Id: I65fffac267e138fd424b2ef8aa7ed79f0ebb63d5
Reviewed-on: https://go-review.googlesource.com/10539
Reviewed-by: Keith Randall <khr@golang.org>
2015-05-29 18:54:49 +00:00
Keith Randall
067e8dfd82 [dev.ssa] Merge remote-tracking branch 'origin/master' into mergebranch
Semi-regular merge of tip to dev.ssa.

Complicated a bit by the move of cmd/internal/* to cmd/compile/internal/*.

Change-Id: I1c66d3c29bb95cce4a53c5a3476373aa5245303d
2015-05-28 13:51:18 -07:00