Commit graph

67 commits

Author SHA1 Message Date
Josh Bleecher Snyder
e5c9358fe2 cmd/compile: move writebarrier pass after dse
This avoids generating writeBarrier.enabled
blocks for dead stores.

Change-Id: Ib11d8e2ba952f3f1f01d16776e40a7200a7683cf
Reviewed-on: https://go-review.googlesource.com/42012
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-04-29 16:37:02 +00:00
Keith Randall
39ce5907ca cmd/compile: rotate loops so conditional branch is at the end
Old loops look like this:
   loop:
     CMPQ ...
     JGE exit
     ...
     JMP loop
   exit:

New loops look like this:
    JMP entry
  loop:
    ...
  entry:
    CMPQ ...
    JLT loop

This removes one instruction (the unconditional jump) from
the inner loop.
Kinda surprisingly, it matters.

This is a bit different than the peeling that the old obj
library did in that we don't duplicate the loop exit test.
We just jump to the test.  I'm not sure if it is better or
worse to do that (peeling gets rid of the JMP but means more
code duplication), but this CL is certainly a much simpler
compiler change, so I'll try this way first.

The obj library used to do peeling before
CL https://go-review.googlesource.com/c/36205 turned it off.

Fixes #15837 (remove obj instruction reordering)
The reordering is already removed, this CL implements the only
part of that reordering that we'd like to keep.

Fixes #14758 (append loop)
name    old time/op    new time/op    delta
Foo-12     817ns ± 4%     538ns ± 0%  -34.08%   (p=0.000 n=10+9)
Bar-12     850ns ±11%     570ns ±13%  -32.88%  (p=0.000 n=10+10)

Update #19595 (BLAS slowdown)
name                       old time/op  new time/op  delta
DgemvMedMedNoTransIncN-12  13.2µs ± 9%  10.2µs ± 1%  -22.26%  (p=0.000 n=9+9)

Fixes #19633 (append loop)
name    old time/op    new time/op    delta
Foo-12     810ns ± 1%     540ns ± 0%  -33.30%   (p=0.000 n=8+9)

Update #18977 (Fannkuch11 regression)
name         old time/op    new time/op    delta
Fannkuch11-8                2.80s ± 0%     3.01s ± 0%  +7.47%   (p=0.000 n=9+10)
This one makes no sense.  There's strictly 1 less instruction in the
inner loop (17 instead of 18).  They are exactly the same instructions
except for the JMP that has been elided.

go1 benchmarks generally don't look very impressive.  But the gains for the
specific issues above make this CL still probably worth it.
name                      old time/op    new time/op    delta
BinaryTree17-8              2.32s ± 0%     2.34s ± 0%  +1.14%    (p=0.000 n=9+7)
Fannkuch11-8                2.80s ± 0%     3.01s ± 0%  +7.47%   (p=0.000 n=9+10)
FmtFprintfEmpty-8          44.1ns ± 1%    46.1ns ± 1%  +4.53%  (p=0.000 n=10+10)
FmtFprintfString-8         67.8ns ± 0%    74.4ns ± 1%  +9.80%   (p=0.000 n=10+9)
FmtFprintfInt-8            74.9ns ± 0%    78.4ns ± 0%  +4.67%   (p=0.000 n=8+10)
FmtFprintfIntInt-8          117ns ± 1%     123ns ± 1%  +4.69%   (p=0.000 n=9+10)
FmtFprintfPrefixedInt-8     160ns ± 1%     146ns ± 0%  -8.22%   (p=0.000 n=8+10)
FmtFprintfFloat-8           214ns ± 0%     206ns ± 0%  -3.91%    (p=0.000 n=8+8)
FmtManyArgs-8               468ns ± 0%     497ns ± 1%  +6.09%   (p=0.000 n=8+10)
GobDecode-8                6.16ms ± 0%    6.21ms ± 1%  +0.76%   (p=0.000 n=9+10)
GobEncode-8                4.90ms ± 0%    4.92ms ± 1%  +0.37%   (p=0.028 n=9+10)
Gzip-8                      209ms ± 0%     212ms ± 0%  +1.33%  (p=0.000 n=10+10)
Gunzip-8                   36.6ms ± 0%    38.0ms ± 1%  +4.03%    (p=0.000 n=9+9)
HTTPClientServer-8         84.2µs ± 0%    86.0µs ± 1%  +2.14%    (p=0.000 n=9+9)
JSONEncode-8               13.6ms ± 3%    13.8ms ± 1%  +1.55%   (p=0.003 n=9+10)
JSONDecode-8               53.2ms ± 5%    52.9ms ± 0%    ~     (p=0.280 n=10+10)
Mandelbrot200-8            3.78ms ± 0%    3.78ms ± 1%    ~      (p=0.661 n=10+9)
GoParse-8                  2.89ms ± 0%    2.94ms ± 2%  +1.50%  (p=0.000 n=10+10)
RegexpMatchEasy0_32-8      68.5ns ± 2%    68.9ns ± 1%    ~     (p=0.136 n=10+10)
RegexpMatchEasy0_1K-8       220ns ± 1%     225ns ± 1%  +2.41%  (p=0.000 n=10+10)
RegexpMatchEasy1_32-8      64.7ns ± 0%    64.5ns ± 0%  -0.28%  (p=0.042 n=10+10)
RegexpMatchEasy1_1K-8       348ns ± 1%     355ns ± 0%  +1.90%  (p=0.000 n=10+10)
RegexpMatchMedium_32-8      102ns ± 1%     105ns ± 1%  +2.95%  (p=0.000 n=10+10)
RegexpMatchMedium_1K-8     33.1µs ± 3%    32.5µs ± 0%  -1.75%  (p=0.000 n=10+10)
RegexpMatchHard_32-8       1.71µs ± 1%    1.70µs ± 1%  -0.84%   (p=0.002 n=10+9)
RegexpMatchHard_1K-8       51.1µs ± 0%    50.8µs ± 1%  -0.48%  (p=0.004 n=10+10)
Revcomp-8                   411ms ± 1%     402ms ± 0%  -2.22%   (p=0.000 n=10+9)
Template-8                 61.8ms ± 1%    59.7ms ± 0%  -3.44%    (p=0.000 n=9+9)
TimeParse-8                 306ns ± 0%     318ns ± 0%  +3.83%  (p=0.000 n=10+10)
TimeFormat-8                320ns ± 0%     318ns ± 1%  -0.53%   (p=0.012 n=7+10)

Change-Id: Ifaf29abbe5874e437048e411ba8f7cfbc9e1c94b
Reviewed-on: https://go-review.googlesource.com/38431
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-04-24 22:13:36 +00:00
Matthew Dempsky
1e3570ac86 cmd/internal/objabi: extract shared functionality from obj
Now only cmd/asm and cmd/compile depend on cmd/internal/obj. Changing
the assembler backends no longer requires reinstalling cmd/link or
cmd/addr2line.

There's also now one canonical definition of the object file format in
cmd/internal/objabi/doc.go, with a warning to update all three
implementations.

objabi is still something of a grab bag of unrelated code (e.g., flag
and environment variable handling probably belong in a separate "tool"
package), but this is still progress.

Fixes #15165.
Fixes #20026.

Change-Id: Ic4b92fac7d0d35438e0d20c9579aad4085c5534c
Reviewed-on: https://go-review.googlesource.com/40972
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-04-19 00:00:09 +00:00
Josh Bleecher Snyder
2cdb7f118a cmd/compile: move Frontend field from ssa.Config to ssa.Func
Suggested by mdempsky in CL 38232.
This allows us to use the Frontend field
to associate frontend state and information
with a function.
See the following CL in the series for examples.

This is a giant CL, but it is almost entirely routine refactoring.

The ssa test API is starting to feel a bit unwieldy.
I will clean it up separately, once the dust has settled.

Passes toolstash -cmp.

Updates #15756

Change-Id: I71c573bd96ff7251935fce1391b06b1f133c3caf
Reviewed-on: https://go-review.googlesource.com/38327
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-17 23:18:57 +00:00
Josh Bleecher Snyder
a5e3cac895 cmd/compile: rearrange fields between ssa.Func, ssa.Cache, and ssa.Config
This makes ssa.Func, ssa.Cache, and ssa.Config fulfill
the roles laid out for them in CL 38160.

The only non-trivial change in this CL is how cached
values and blocks get IDs. Prior to this CL, their IDs were
assigned as part of resetting the cache, and only modified
IDs were reset. This required knowing how many values and
blocks were modified, which required a tight coupling between
ssa.Func and ssa.Config. To eliminate that coupling,
we now zero values and blocks during reset,
and assign their IDs when they are used.
Since unused values and blocks have ID == 0,
we can efficiently find the last used value/block,
to avoid zeroing everything.
Bulk zeroing is efficient, but not efficient enough
to obviate the need to avoid zeroing everything every time.
As a happy side-effect, ssa.Func.Free is no longer necessary.

DebugHashMatch and friends now belong in func.go.
They have been left in place for clarity and review.
I will move them in a subsequent CL.

Passes toolstash -cmp. No compiler performance impact.
No change in 'go test cmd/compile/internal/ssa' execution time.

Change-Id: I2eb7af58da067ef6a36e815a6f386cfe8634d098
Reviewed-on: https://go-review.googlesource.com/38167
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-17 05:21:42 +00:00
Austin Clements
8eb14e9de5 cmd/compile: accept string debug flags
The compiler's -d flag accepts string-valued flags, but currently only
for SSA debug flags. Extend it to support string values for other
flags. This also makes the syntax somewhat more sane so flag=value and
flag:value now both accept integers and strings.

Change-Id: Idd144d8479a430970cc1688f824bffe0a56ed2df
Reviewed-on: https://go-review.googlesource.com/37345
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-03-03 15:50:49 +00:00
Russ Cox
47ce87877b all: merge dev.inline into master
Change-Id: I7715581a04e513dcda9918e853fa6b1ddc703770
2017-02-01 09:47:23 -05:00
Robert Griesemer
472c792e0a [dev.inline] cmd/internal/src: introduce compact source position representation
XPos is a compact (8 instead of 16 bytes on a 64bit machine) source
position representation. There is a 1:1 correspondence between each
XPos and each regular Pos, translated via a global table.

In some sense this brings back the LineHist, though positions can
track line and column information; there is a O(1) translation
between the representations (no binary search), and the translation
is factored out.

The size increase with the prior change is brought down again and
the compiler speed is in line with the master repo (measured on
the same "quiet" machine as for prior change):

name       old time/op     new time/op     delta
Template       256ms ± 1%      262ms ± 2%    ~             (p=0.063 n=5+4)
Unicode        132ms ± 1%      135ms ± 2%    ~             (p=0.063 n=5+4)
GoTypes        891ms ± 1%      871ms ± 1%  -2.28%          (p=0.016 n=5+4)
Compiler       3.84s ± 2%      3.89s ± 2%    ~             (p=0.413 n=5+4)
MakeBash       47.1s ± 1%      46.2s ± 2%    ~             (p=0.095 n=5+5)

name       old user-ns/op  new user-ns/op  delta
Template        309M ± 1%       314M ± 2%    ~             (p=0.111 n=5+4)
Unicode         165M ± 1%       172M ± 9%    ~             (p=0.151 n=5+5)
GoTypes        1.14G ± 2%      1.12G ± 1%    ~             (p=0.063 n=5+4)
Compiler       5.00G ± 1%      4.96G ± 1%    ~             (p=0.286 n=5+4)

Change-Id: Icc570cc60ab014d8d9af6976f1f961ab8828cc47
Reviewed-on: https://go-review.googlesource.com/34506
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-01-09 22:43:22 +00:00
David Chase
7f1ff65c39 cmd/compile: insert scheduling checks on loop backedges
Loop breaking with a counter.  Benchmarked (see comments),
eyeball checked for sanity on popular loops.  This code
ought to handle loops in general, and properly inserts phi
functions in cases where the earlier version might not have.

Includes test, plus modifications to test/run.go to deal with
timeout and killing looping test.  Tests broken by the addition
of extra code (branch frequency and live vars) for added
checks turn the check insertion off.

If GOEXPERIMENT=preemptibleloops, the compiler inserts reschedule
checks on every backedge of every reducible loop.  Alternately,
specifying GO_GCFLAGS=-d=ssa/insert_resched_checks/on will
enable it for a single compilation, but because the core Go
libraries contain some loops that may run long, this is less
likely to have the desired effect.

This is intended as a tool to help in the study and diagnosis
of GC and other latency problems, now that goal STW GC latency
is on the order of 100 microseconds or less.

Updates #17831.
Updates #10958.

Change-Id: I6206c163a5b0248e3f21eb4fc65f73a179e1f639
Reviewed-on: https://go-review.googlesource.com/33910
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-01-09 21:01:29 +00:00
Robert Griesemer
eaca0e0529 [dev.inline] cmd/internal/src: introduce NoPos and use it instead Pos{}
Using a variable instead of a composite literal makes
the code independent of implementation changes of Pos.

Per David Lazar's suggestion.

Change-Id: I336967ac12a027c51a728a58ac6207cb5119af4a
Reviewed-on: https://go-review.googlesource.com/34148
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-12-09 00:35:07 +00:00
Robert Griesemer
82d0caea2c [dev.inline] cmd/internal/src: make Pos implementation abstract
Adjust cmd/compile accordingly.

This will make it easier to replace the underlying implementation.

Change-Id: I33645850bb18c839b24785b6222a9e028617addb
Reviewed-on: https://go-review.googlesource.com/34133
Reviewed-by: David Lazar <lazard@golang.org>
2016-12-08 21:31:28 +00:00
Cherry Zhang
f6aec889e1 cmd/compile: add a writebarrier phase in SSA
When the compiler insert write barriers, the frontend makes
conservative decisions at an early stage. This may have false
positives which result in write barriers for stack writes.

A new phase, writebarrier, is added to the SSA backend, to delay
the decision and eliminate false positives. The frontend still
makes conservative decisions. When building SSA, instead of
emitting runtime calls directly, it emits WB ops (StoreWB,
MoveWB, etc.), which will be expanded to branches and runtime
calls in writebarrier phase. Writes to static locations on stack
are detected and write barriers are removed.

All write barriers of stack writes found by the script from
issue #17330 are eliminated (except two false positives).

Fixes #17330.

Change-Id: I9bd66333da9d0ceb64dcaa3c6f33502798d1a0f8
Reviewed-on: https://go-review.googlesource.com/31131
Reviewed-by: Austin Clements <austin@google.com>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2016-10-25 21:53:40 +00:00
David Chase
a190f3c8a3 cmd/compile: enable flag-specified dump of specific phase+function
For very large input files, use of GOSSAFUNC to obtain a dump
after compilation steps can lead to both unwieldy large output
files and unwieldy larger processes (because the output is
buffered in a string).  This flag

  -d=ssa/<phase>/dump:<function name>

provides finer control of what is dumped, into a smaller
file, and with less memory overhead in the running compiler.
The special phase name "build" is added to allow printing
of the just-built ssa before any transformations are applied.

This was helpful in making sense of the gogo/protobuf
problems.

The output format was tweaked to remove gratuitous spaces,
and a crude -d=ssa/help help text was added.

Change-Id: If7516e22203420eb6ed3614f7cee44cb9260f43e
Reviewed-on: https://go-review.googlesource.com/23044
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-10-20 22:23:56 +00:00
Matthew Dempsky
8c24bff52b cmd/compile: layout stack frame during SSA
Identify live stack variables during SSA and compute the stack frame
layout earlier so that we can emit instructions with the correct
offsets upfront.

Passes toolstash/buildall.

Change-Id: I191100dba274f1e364a15bdcfdc1d1466cdd1db5
Reviewed-on: https://go-review.googlesource.com/30216
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-10-04 17:07:36 +00:00
Keith Randall
75ce89c20d cmd/compile: cache CFG-dependent computations
We compute a lot of stuff based off the CFG: postorder traversal,
dominators, dominator tree, loop nest.  Multiple phases use this
information and we end up recomputing some of it.  Add a cache
for this information so if the CFG hasn't changed, we can reuse
the previous computation.

Change-Id: I9b5b58af06830bd120afbee9cfab395a0a2f74b2
Reviewed-on: https://go-review.googlesource.com/29356
Reviewed-by: David Chase <drchase@google.com>
2016-09-19 16:00:13 +00:00
Keith Randall
3134ab3c2d cmd/compile: redo nil checks
Get rid of BlockCheck. Josh goaded me into it, and I went
down a rabbithole making it happen.

NilCheck now panics if the pointer is nil and returns void, as before.
BlockCheck is gone, and NilCheck is no longer a Control value for
any block. It just exists (and deadcode knows not to throw it away).

I rewrote the nilcheckelim pass to handle this case.  In particular,
there can now be multiple NilCheck ops per block.

I moved all of the arch-dependent nil check elimination done as
part of ssaGenValue into its own proper pass, so we don't have to
duplicate that code for every architecture.

Making the arch-dependent nil check its own pass means I needed
to add a bunch of flags to the opcode table so I could write
the code without arch-dependent ops everywhere.

Change-Id: I419f891ac9b0de313033ff09115c374163416a9f
Reviewed-on: https://go-review.googlesource.com/29120
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-09-15 02:42:13 +00:00
Keith Randall
305a0ac123 cmd/compile: move phi args which are constants closer to the phi
entry:
   x = MOVQconst [7]
   ...
b1:
   goto b2
b2:
   v = Phi(x, y, z)

Transform that program to:

entry:
   ...
b1:
   x = MOVQconst [7]
   goto b2
b2:
   v = Phi(x, y, z)

This CL moves constant-generating instructions used by a phi to the
appropriate immediate predecessor of the phi's block.

We used to put all constants in the entry block.  Unfortunately, in
large functions we have lots of constants at the start of the
function, all of which are used by lots of phis throughout the
function.  This leads to the constants being live through most of the
function (especially if there is an outer loop).  That's an O(n^2)
problem.

Note that most of the non-phi uses of constants have already been
folded into instructions (ADDQconst, MOVQstoreconst, etc.).

This CL may be generally useful for other instances of compiler
slowness, I'll have to check.  It may cause some programs to run
slower, but probably not by much, as rematerializeable values like
these constants are allocated late (not at their originally scheduled
location) anyway.

This CL is definitely a minimal change that can be considered for 1.7.
We probably want to do a better job in the tighten pass generally, not
just for phi args.  Leaving that for 1.8.

Update #16407

Change-Id: If112a8883b4ef172b2f37dea13e44bda9346c342
Reviewed-on: https://go-review.googlesource.com/25046
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-07-21 15:17:23 +00:00
David Chase
6b99fb5bea cmd/compile: use sparse algorithm for phis in large program
This adds a sparse method for locating nearest ancestors
in a dominator tree, and checks blocks with more than one
predecessor for differences and inserts phi functions where
there are.

Uses reversed post order to cut number of passes, running
it from first def to last use ("last use" for paramout and
mem is end-of-program; last use for a phi input from a
backedge is the source of the back edge)

Includes a cutover from old algorithm to new to avoid paying
large constant factor for small programs.  This keeps normal
builds running at about the same time, while not running
over-long on large machine-generated inputs.

Add "phase" flags for ssa/build -- ssa/build/stats prints
number of blocks, values (before and after linking references
and inserting phis, so expansion can be measured), and their
product; the product governs the cutover, where a good value
seems to be somewhere between 1 and 5 million.

Among the files compiled by make.bash, this is the shape of
the tail of the distribution for #blocks, #vars, and their
product:

	 #blocks	#vars	    product
 max	6171	28180	173,898,780
99.9%	1641	 6548	 10,401,878
  99%	 463	 1909	    873,721
  95%	 152	  639	     95,235
  90%	  84	  359	     30,021

The old algorithm is indeed usually fastest, for 99%ile
values of usually.

The fix to LookupVarOutgoing
( https://go-review.googlesource.com/#/c/22790/ )
deals with some of the same problems addressed by this CL,
but on at least one bug ( #15537 ) this change is still
a significant help.

With this CL:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
   github.com/gogo/protobuf/test/theproto3/combos/...
...
real	4m35.200s
user	13m16.644s
sys	0m36.712s
and pprof reports 3.4GB allocated in one of the larger profiles

With tip:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
   github.com/gogo/protobuf/test/theproto3/combos/...
...
real	10m36.569s
user	25m52.286s
sys	4m3.696s
and pprof reports 8.3GB allocated in the same larger profile

With this CL, most of the compilation time on the benchmarked
input is spent in register/stack allocation (cumulative 53%)
and in the sparse lookup algorithm itself (cumulative 20%).

Fixes #15537.

Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00
Reviewed-on: https://go-review.googlesource.com/22342
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-16 21:08:05 +00:00
Alexandru Moșoi
8b20fd000d cmd/compile: transform some Phis into Or8.
func f(a, b bool) bool {
          return a || b
}

is now a single instructions (excluding loading and unloading the arguments):
      v10 = ORB <bool> v11 v12 : AX

Change-Id: Iff63399410cb46909f4318ea1c3f45a029f4aa5e
Reviewed-on: https://go-review.googlesource.com/21872
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-04-19 22:04:30 +00:00
Alexandru Moșoi
e0611b1664 cmd/compile: use shared dom tree for cse, too
Missed this in the previous CL where the shared
dom tree was introduced.

Change-Id: If0bd85d4b4567d7e87814ed511603b1303ab3903
Reviewed-on: https://go-review.googlesource.com/21970
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-04-13 12:42:44 +00:00
Alexandru Moșoi
9743e4b031 cmd/compile: share dominator tree among many passes
These passes do not modify the dominator tree too much.

% benchstat old.txt new.txt
name       old time/op     new time/op     delta
Template       335ms ± 3%      325ms ± 8%    ~             (p=0.074 n=8+9)
GoTypes        1.05s ± 1%      1.05s ± 3%    ~            (p=0.095 n=9+10)
Compiler       5.37s ± 4%      5.29s ± 1%  -1.42%         (p=0.022 n=9+10)
MakeBash       34.9s ± 3%      34.4s ± 2%    ~            (p=0.095 n=9+10)

name       old alloc/op    new alloc/op    delta
Template      55.4MB ± 0%     54.9MB ± 0%  -0.81%        (p=0.000 n=10+10)
GoTypes        179MB ± 0%      178MB ± 0%  -0.89%        (p=0.000 n=10+10)
Compiler       807MB ± 0%      798MB ± 0%  -1.10%        (p=0.000 n=10+10)

name       old allocs/op   new allocs/op   delta
Template        498k ± 0%       496k ± 0%  -0.29%          (p=0.000 n=9+9)
GoTypes        1.42M ± 0%      1.41M ± 0%  -0.24%        (p=0.000 n=10+10)
Compiler       5.61M ± 0%      5.60M ± 0%  -0.12%        (p=0.000 n=10+10)

Change-Id: I4cd20cfba3f132ebf371e16046ab14d7e42799ec
Reviewed-on: https://go-review.googlesource.com/21806
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-04-12 14:44:26 +00:00
Keith Randall
7f53391f6b cmd/compile: fix -N build
The decomposer of builtin types is confused by having structs
still around from the user-type decomposer.  They're all dead though,
so just enabling a deadcode pass fixes things.

Change-Id: I2df6bc7e829be03eabfd24c8dda1bff96f3d7091
Reviewed-on: https://go-review.googlesource.com/21839
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-04-11 19:43:47 +00:00
Alexandru Moșoi
1747788c56 cmd/compile: add a pass to print bound checks
Since BCE happens over several passes (opt, loopbce, prove)
it's easy to regress especially with rewriting.

The pass is only activated with special debug flag.

Change-Id: I46205982e7a2751156db8e875d69af6138068f59
Reviewed-on: https://go-review.googlesource.com/21510
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-04-05 12:46:59 +00:00
Alexandru Moșoi
b91cc53033 cmd/compile/internal/ssa: BCE for induction variables
There are 5293 loop in the main go repository.
A survey of the top most common for loops:

     18 for __k__ := 0; i < len(sa.Addr); i++ {
     19 for __k__ := 0; ; i++ {
     19 for __k__ := 0; i < 16; i++ {
     25 for __k__ := 0; i < length; i++ {
     30 for __k__ := 0; i < 8; i++ {
     49 for __k__ := 0; i < len(s); i++ {
     67 for __k__ := 0; i < n; i++ {
    376 for __k__ := range __slice__ {
    685 for __k__, __v__ := range __slice__ {
   2074 for __, __v__ := range __slice__ {

The algorithm to find induction variables handles all cases
with an upper limit. It currently doesn't find related induction
variables such as c * ind or c + ind.

842 out of 22954 bound checks are removed for src/make.bash.
1957 out of 42952 bounds checks are removed for src/all.bash.

Things to do in follow-up CLs:
* Find the associated pointer for `for _, v := range a {}`
* Drop the NilChecks on the pointer.
* Replace the implicit induction variable by a loop over the pointer

Generated garbage can be reduced if we share the sdom between passes.

% benchstat old.txt new.txt
name       old time/op     new time/op     delta
Template       337ms ± 3%      333ms ± 3%    ~             (p=0.258 n=9+9)
GoTypes        1.11s ± 2%      1.10s ± 2%    ~           (p=0.912 n=10+10)
Compiler       5.25s ± 1%      5.29s ± 2%    ~             (p=0.077 n=9+9)
MakeBash       33.5s ± 1%      34.1s ± 2%  +1.85%          (p=0.011 n=9+9)

name       old alloc/op    new alloc/op    delta
Template      63.6MB ± 0%     63.9MB ± 0%  +0.52%         (p=0.000 n=10+9)
GoTypes        218MB ± 0%      219MB ± 0%  +0.59%         (p=0.000 n=10+9)
Compiler       978MB ± 0%      985MB ± 0%  +0.69%        (p=0.000 n=10+10)

name       old allocs/op   new allocs/op   delta
Template        582k ± 0%       583k ± 0%  +0.10%        (p=0.000 n=10+10)
GoTypes        1.78M ± 0%      1.78M ± 0%  +0.12%        (p=0.000 n=10+10)
Compiler       7.68M ± 0%      7.69M ± 0%  +0.05%        (p=0.000 n=10+10)

name       old text-bytes  new text-bytes  delta
HelloSize       581k ± 0%       581k ± 0%  -0.08%        (p=0.000 n=10+10)
CmdGoSize      6.40M ± 0%      6.39M ± 0%  -0.08%        (p=0.000 n=10+10)

name       old data-bytes  new data-bytes  delta
HelloSize      3.66k ± 0%      3.66k ± 0%    ~     (all samples are equal)
CmdGoSize       134k ± 0%       134k ± 0%    ~     (all samples are equal)

name       old bss-bytes   new bss-bytes   delta
HelloSize       126k ± 0%       126k ± 0%    ~     (all samples are equal)
CmdGoSize       149k ± 0%       149k ± 0%    ~     (all samples are equal)

name       old exe-bytes   new exe-bytes   delta
HelloSize       947k ± 0%       946k ± 0%  -0.01%        (p=0.000 n=10+10)
CmdGoSize      9.92M ± 0%      9.91M ± 0%  -0.06%        (p=0.000 n=10+10)

Change-Id: Ie74bdff46fd602db41bb457333d3a762a0c3dc4d
Reviewed-on: https://go-review.googlesource.com/20517
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
2016-04-01 09:37:58 +00:00
David Chase
8eec2bbfbc cmd/compile: added some intrinsics to SSA back end
One intrinsic was needed to help get the very best
performance out of a future GC; as long as that one was
being added, I also added Bswap since that is sometimes
a handy thing to have.  I had intended to fill out the
bit-scan intrinsic family, but the mismatch between the
"scan forward" instruction and "count leading zeroes"
was large enough to cause me to leave it out -- it poses
a dilemma that I'd rather dodge right now.

These intrinsics are not exposed for general use.
That's a separate issue requiring an API proposal change
( https://github.com/golang/proposal )

All intrinsics are tested, both that they are substituted
on the appropriate architecture, and that they produce the
expected result.

Change-Id: I5848037cfd97de4f75bdc33bdd89bba00af4a8ee
Reviewed-on: https://go-review.googlesource.com/20564
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-28 16:29:59 +00:00
David Chase
09a9ce60c7 cmd/compile: get gcflags to bootstrap; ssa debug opts for "all"
This is intended to help debug compiler problems that pop
up in the bootstrap phase of make.bash.  GO_GCFLAGS does not
normally apply there.  Options-for-all phases is intended
to allow crude tracing (and full timing) by turning on timing
for all phases, not just one.

Phase names can also be specified using a regular expression,
for example
BOOT_GO_GCFLAGS=-d='ssa/~^.*scc$/off' \
GO_GCFLAGS='-d=ssa/~^.*scc$/off' ./make.bash

I just added this because it was the fastest way to get
me to a place where I could easily debug the compiler.

Change-Id: I0781f3e7c19651ae7452fa25c2d54c9a245ef62d
Reviewed-on: https://go-review.googlesource.com/20775
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-18 01:04:47 +00:00
David Chase
2d56dee61b cmd/compile: escape analysis explanations added to -m -m output
This should probably be considered "experimental" at this stage, but
what it needs is feedback from adventurous adopters.  I think the data
structure used for describing escape reasons might be extendable to
allow a cleanup of the underlying algorithms, which suffers from
insufficiently separated concerns (the graph does not deal well with
escape level adjustments, so it is augmented by a second custom-walk
portion of the "flood" phase. It would be better to put it all,
including level adjustments, in a single graph structure, and then
simply flood the graph.

Tweaked to avoid allocations in the no-logging case.

Modified run.go to ignore lines with leading "#" in the output (since
it can never match a line), and in -update_errors to ignore leading
tabs in output lines and to normalize embedded filenames.

Currently requires -m -m because otherwise the noise/update
burden for the other escape tests is considerable.

There is a partial test.  Existing escape analysis tests seem to
cover all except the panic case and what looks like it might be
unreachable code in escape analysis.

Fixes #10526.

Change-Id: I2524fdec54facae48b00b2548e25d9e46fcaf832
Reviewed-on: https://go-review.googlesource.com/18041
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-17 13:29:48 +00:00
Keith Randall
5305a329d8 cmd/compile: turn off SSA internal consistency checks
They've been on for a few weeks of general use and nothing
has tripped up on them yet.

Makes the compiler ~18% faster.

Change-Id: I42d7bbc0581597f9cf4fb28989847814c81b08a2
Reviewed-on: https://go-review.googlesource.com/20741
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-15 22:25:48 +00:00
Alexandru Moșoi
13f74db304 cmd/compile: fix no-opt build after moving decomposing user functions
decompose-builtin pass requires an opt pass, but -N disables
late-opt, the only opt pass (out of two) that happens
after decompose-builtin.  This CL enables both 'opt' and 'late opt'
passes. The extra compile time for 'late opt' in negligible
since most rewrites were already done in the first 'opt'
(also measured before). We should put some effort in splitting the
generic rules into required and optional.

Also update generic.rules comments about lowering
of StringMake and SliceMake.

Tested with GO_GCFLAGS=-N ./all.bash

Change-Id: I92999681aaa02587b6dc6e32ce997a91f1fc9499
Reviewed-on: https://go-review.googlesource.com/20682
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-14 18:44:10 +00:00
Alexandru Moșoi
8ec80176d4 cmd/compile: move decompose builtin closer to late opt
* Shaves about 10k from pkg/tools/linux_amd64.
* Was suggested by drchase before
* Found by looking at ssa output of #14758

Change-Id: If2c4ddf3b2603d4dfd8fb4d9199b9a3dcb05b17d
Reviewed-on: https://go-review.googlesource.com/20570
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-13 22:12:01 +00:00
Alexandru Moșoi
dfcb853d9d cmd/compile/internal/ssa: lower builtins much later
* Move lowering into a separate pass.
* SliceLen/SliceCap is now available to various intermediate passes
which use useful for bounds checking.
* Add a second opt pass to handle the new opportunities

Decreases the code size of binaries in pkg/tool/linux_amd64
by ~45K.

Updates #14564 #14606

Change-Id: I5b2bd6202181c50623a3585fbf15c0d6db6d4685
Reviewed-on: https://go-review.googlesource.com/20172
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-03-09 11:08:59 +00:00
Keith Randall
c0740fed37 cmd/compile: more ssa config flags
To turn ssa compilation on or off altogether, use
-ssa=1 or -ssa=0.  Default is on.

To turn on or off consistency checks, do
-d=ssa/check/on or -d=ssa/check/off.  Default is on for now.

Change-Id: I277e0311f538981c8b9c62e7b7382a0c8755ce4c
Reviewed-on: https://go-review.googlesource.com/20217
Reviewed-by: David Chase <drchase@google.com>
2016-03-04 19:03:12 +00:00
Brad Fitzpatrick
5fea2ccc77 all: single space after period.
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.

This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:

$ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
$ go test go/doc -update

Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02 00:13:47 +00:00
David Chase
6b3462c784 [dev.ssa] cmd/compile: adjust branch likeliness for calls/loops
Static branch predictions (which guide block ordering) are
adjusted based on:

loop/not-loop     (favor looping)
abnormal-exit/not (avoid panic)
call/not-call     (avoid call)
ret/default       (treat returns as rare)

This appears to make no difference in performance of real
code, meaning the compiler itself.  The earlier version of
this has been stripped down to help make the cost of this
only-aesthetic-on-Intel phase be as cheap as possible (we
probably want information about inner loops for improving
register allocation, but because register allocation follows
close behind this pass, conceivably the information could be
reused -- so we might do this anyway just to normalize
output).

For a ./make.bash that takes 200 user seconds, about .75
second is reported in likelyadjust (summing nanoseconds
reported with -d=ssa/likelyadjust/time ).

Upstream predictions are respected.
Includes test, limited to build on amd64 only.
Did several iterations on the debugging output to allow
some rough checks on behavior.
Debug=1 logging notes agree/disagree with earlier passes,
allowing analysis like the following:

Run on make.bash:
GO_GCFLAGS=-d=ssa/likelyadjust/debug \
   ./make.bash >& lkly5.log

grep 'ranch prediction' lkly5.log | wc -l
   78242 // 78k predictions

grep 'ranch predi' lkly5.log | egrep -v 'agrees with' | wc -l
   29633 // 29k NEW predictions

grep 'disagrees' lkly5.log | wc -l
     444 // contradicted 444 times

grep '< exit' lkly5.log | wc -l
   10212 // 10k exit predictions

grep '< exit' lkly5.log | egrep 'disagrees' | wc -l
       5 // 5 contradicted by previous prediction

grep '< exit' lkly5.log | egrep -v 'agrees' | wc -l
     702 // 702-5 redundant with previous prediction

grep '< call' lkly5.log | egrep -v 'agrees' | wc -l
   16699 // 16k new call predictions

grep 'stay in loop' lkly5.log | egrep -v 'agrees' | wc -l
    3951 // 4k new "remain in loop" predictions

Fixes #11451.

Change-Id: Iafb0504f7030d304ef4b6dc1aba9a5789151a593
Reviewed-on: https://go-review.googlesource.com/19995
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-01 20:09:41 +00:00
Alexandru Moșoi
e197f467d5 [dev.ssa] cmd/compile/internal/ssa: simplify boolean phis
* Decreases the generated code slightly.
* Similar to phiopt pass from gcc, except it only handles
booleans. Handling Eq/Neq had no impact on the generated code.

name       old time/op     new time/op     delta
Template       453ms ± 4%      451ms ± 4%    ~           (p=0.468 n=24+24)
GoTypes        1.55s ± 1%      1.55s ± 2%    ~           (p=0.287 n=24+25)
Compiler       6.53s ± 2%      6.56s ± 1%  +0.46%        (p=0.050 n=23+23)
MakeBash       45.8s ± 2%      45.7s ± 2%    ~           (p=0.866 n=24+25)

name       old text-bytes  new text-bytes  delta
HelloSize       676k ± 0%       676k ± 0%    ~     (all samples are equal)
CmdGoSize      8.07M ± 0%      8.07M ± 0%  -0.03%        (p=0.000 n=25+25)

Change-Id: Ia62477b7554127958a14cb27f85849b095d63663
Reviewed-on: https://go-review.googlesource.com/20090
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-01 17:56:13 +00:00
Alexandru Moșoi
bdea1d58cf [dev.ssa] cmd/compile/internal/ssa: remove proven redundant controls.
* It does very simple bounds checking elimination. E.g.
removes the second check in for i := range a { a[i]++; a[i++]; }
* Improves on the following redundant expression:
return a6 || (a6 || (a6 || a4)) || (a6 || (a4 || a6 || (false || a6)))
* Linear in the number of block edges.

I patched in CL 12960 that does bounds, nil and constant propagation
to make sure this CL is not just redundant. Size of pkg/tool/linux_amd64/*
(excluding compile which is affected by this change):

With IsInBounds and IsSliceInBounds
-this -12960 92285080
+this -12960 91947416
-this +12960 91978976
+this +12960 91923088

Gain is ~110% of 12960.

Without IsInBounds and IsSliceInBounds (older run)
-this -12960 95515512
+this -12960 95492536
-this +12960 95216920
+this +12960 95204440

Shaves 22k on its own.

* Can we handle IsInBounds better with this? In
for i := range a { a[i]++; } the bounds checking at a[i]
is not eliminated.

Change-Id: I98957427399145fb33693173fd4d5a8d71c7cc20
Reviewed-on: https://go-review.googlesource.com/19710
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-28 19:48:20 +00:00
David Chase
378a863682 [dev.ssa] cmd/compile: enhance command line option processing for SSA
The -d compiler flag can also specify ssa phase and flag,
for example -d=ssa/generic_cse/time,ssa/generic_cse/stats

Spaces in the phase names can be specified with an
underscore.  Flags currently parsed (not necessarily
recognized by the phases yet) are:

   on, off, mem, time, debug, stats, and test

On, off and time are handled in the harness,
debug, stats, and test are interpreted by the phase itself.

The pass is now attached to the Func being compiled, and a
new method logStats(key, ...value) on *Func to encourage a
semi-standardized format for that output.  Output fields
are separated by tabs to ease digestion by awk and
spreadsheets.  For example,
	if f.pass.stats > 0 {
		f.logStat("CSE REWRITES", rewrites)
	}

Change-Id: I16db2b5af64c50ca9a47efeb51d961147a903abc
Reviewed-on: https://go-review.googlesource.com/19885
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-25 20:32:15 +00:00
Todd Neal
94f0245114 [dev.ssa] cmd/compile: add a zero arg cse pass
Add an initial cse pass that only operates on zero argument
values.  This removes the need for a special case in cse for removing
OpSB and speeds up arithConst_ssa.go compilation by 9% while slowing
"test -c net/http" by 1.5%.

Change-Id: Id1500482485426f66c6c2eba75eeaf4f19c8a889
Reviewed-on: https://go-review.googlesource.com/19454
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-02-22 13:06:18 +00:00
Keith Randall
7f7f7cddec [dev.ssa] cmd/compile: split decompose pass in two
A first pass to decompose user types (structs, maybe
arrays someday), and a second pass to decompose builtin
types (strings, interfaces, slices, complex).  David wants
this for value range analysis so he can have structs decomposed
but slices and friends will still be intact and he can deduce
things like the length of a slice is >= 0.

Change-Id: Ia2300d07663329b51ed6270cfed21d31980daa7c
Reviewed-on: https://go-review.googlesource.com/19340
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: David Chase <drchase@google.com>
2016-02-09 02:18:31 +00:00
David Chase
58cfa40419 [dev.ssa] cmd/compile: fix for bug in cse speed improvements
Problem was caused by use of Args[].Aux differences
in early partitioning.  This artificially separated
two equivalent expressions because sort ignores the
Aux field, hence things can end with equal things
separated by unequal things and thus the equal things
are split into more than one partition.  For example:
SliceLen(a), SliceLen(b), SliceLen(a).

Fix: don't use Args[].Aux in initial partitioning.

Left in a debugging flag and some debugging Fprintf's;
not sure if that is house style or not.  We'll probably
want to be more systematic in our naming conventions,
e.g. ssa.cse, ssa.scc, etc.

Change-Id: Ib1412539cc30d91ea542c0ac7b2f9b504108ca7f
Reviewed-on: https://go-review.googlesource.com/19316
Reviewed-by: Keith Randall <khr@golang.org>
2016-02-08 18:44:03 +00:00
David Chase
c87a62f32b [dev.ssa] cmd/compile: reducing alloc footprint of dominator calc
Converted working slices of pointer into slices of pointer
index.  Half the size (on 64-bit machine) and no pointers
to trace if GC occurs while they're live.

TODO - could expose slice mapping ID->*Block; some dom
clients also construct these.

Minor optimization in regalloc that cuts allocation count.

Minor optimization in compile.go that cuts calls to Sprintf.

Change-Id: I28f0bfed422b7344af333dc52ea272441e28e463
Reviewed-on: https://go-review.googlesource.com/19104
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-02 02:20:25 +00:00
David Chase
88b230eaa6 [dev.ssa] cmd/compile: exposed do-log boolean to reduce allocations
From memory profiling, about 3% reduction in allocation count.

Change-Id: I4b662d55b8a94fe724759a2b22f05a08d0bf40f8
Reviewed-on: https://go-review.googlesource.com/19103
Reviewed-by: Keith Randall <khr@golang.org>
2016-01-29 21:30:29 +00:00
Keith Randall
d8a65672f8 [dev.ssa] cmd/compile: optimization for && and || expressions
Compiling && and || expressions often leads to control
flow of the following form:

p:
  If a goto b else c
b: <- p ...
  x = phi(a, ...)
  If x goto t else u

Note that if we take the edge p->b, then we are guaranteed
to take the edge b->t also.  So in this situation, we might
as well go directly from p to t.

Change-Id: I6974f1e6367119a2ddf2014f9741fdb490edcc12
Reviewed-on: https://go-review.googlesource.com/18910
Reviewed-by: David Chase <drchase@google.com>
2016-01-29 17:49:45 +00:00
Keith Randall
8a961aee28 [dev.ssa] cmd/compile: fix -N build
The OpSB hack didn't quite work.  We need to really
CSE these ops to make regalloc happy.

Change-Id: I9f4d7bfb0929407c84ee60c9e25ff0c0fbea84af
Reviewed-on: https://go-review.googlesource.com/19083
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-01-29 17:27:12 +00:00
Keith Randall
6a96a2fe5a [dev.ssa] cmd/compile: make cse faster
It is one of the slowest compiler phases right now, and we
run two of them.

Instead of using a map to make the initial partition, use a sort.
It is much less memory intensive.

Do a few optimizations to avoid work for size-1 equivalence classes.

Implement -N.

Change-Id: I1d2d85d3771abc918db4dd7cc30b0b2d854b15e1
Reviewed-on: https://go-review.googlesource.com/19024
Reviewed-by: David Chase <drchase@google.com>
2016-01-28 20:59:20 +00:00
Keith Randall
3c26c0db39 [dev.ssa] cmd/compile: short-circuit empty blocks
Empty blocks are introduced to remove critical edges.
After regalloc, we can remove any of the added blocks
that are still empty.

Change-Id: I0b40e95ac3a6cc1e632a479443479532b6c5ccd9
Reviewed-on: https://go-review.googlesource.com/18833
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-01-22 22:12:12 +00:00
Keith Randall
b5c5efd5de [dev.ssa] cmd/compile: optimize phi ops
Redo how we keep track of forward references when building SSA.
When the forward reference is resolved, update the Value node
in place.

Improve the phi elimination pass so it can simplify phis of phis.

Give SSA package access to decoded line numbers.  Fix line numbers
for constant booleans.

Change-Id: I3dc9896148d260be2f3dd14cbe5db639ec9fa6b7
Reviewed-on: https://go-review.googlesource.com/18674
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
2016-01-20 21:45:37 +00:00
Keith Randall
7d9f1067d1 [dev.ssa] cmd/compile: better register allocator
Reorder how register & stack allocation is done.  We used to allocate
registers, then fix up merge edges, then allocate stack slots.  This
lead to lots of unnecessary copies on merge edges:

v2 = LoadReg v1
v3 = StoreReg v2

If v1 and v3 are allocated to the same stack slot, then this code is
unnecessary.  But at regalloc time we didn't know the homes of v1 and
v3.

To fix this problem, allocate all the stack slots before fixing up the
merge edges.  That way, we know what stack slots values use so we know
what copies are required.

Use a good technique for shuffling values around on merge edges.

Improves performance of the go1 TimeParse benchmark by ~12%

Change-Id: I731f43e4ff1a7e0dc4cd4aa428fcdb97812b86fa
Reviewed-on: https://go-review.googlesource.com/17915
Reviewed-by: David Chase <drchase@google.com>
2015-12-21 23:12:05 +00:00
Keith Randall
c140df0326 [dev.ssa] cmd/compile: allocate the flag register in a separate pass
Spilling/restoring flag values is a pain to do during regalloc.
Instead, allocate the flag register in a separate pass.  Regalloc then
operates normally on any flag recomputation instructions.

Change-Id: Ia1c3d9e6eff678861193093c0b48a00f90e4156b
Reviewed-on: https://go-review.googlesource.com/17694
Reviewed-by: David Chase <drchase@google.com>
2015-12-11 21:08:15 +00:00
Keith Randall
02f4d0a130 [dev.ssa] cmd/compile: start arguments as spilled
Declare a function's arguments as having already been
spilled so their use just requires a restore.

Allow spill locations to be portions of larger objects the stack.
Required to load portions of compound input arguments.

Rename the memory input to InputMem.  Use Arg for the
pre-spilled argument values.

Change-Id: I8fe2a03ffbba1022d98bfae2052b376b96d32dda
Reviewed-on: https://go-review.googlesource.com/16536
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2015-11-03 17:29:40 +00:00