Commit graph

66 commits

Author SHA1 Message Date
Keith Randall
ade0eb2f06 cmd/compile: fix reslice
:= is the wrong thing here.  The new variable masks the old
variable so we allocate the slice afresh each time around the loop.

Change-Id: I759c30e1bfa88f40decca6dd7d1e051e14ca0844
Reviewed-on: https://go-review.googlesource.com/22679
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Minux Ma <minux@golang.org>
2016-05-01 22:11:45 +00:00
Keith Randall
8ad8d7d87e cmd/compile: Use pre-regalloc value ID in lateSpillUse
The cached copy's ID is sometimes outside the bounds of the orig array.

There's no reason to start at the cached copy and work backwards
to the original value. We already have the original value ID at
all the callsites.

Fixes noopt build

Change-Id: I313508a1917e838a87e8cc83b2ef3c2e4a8db304
Reviewed-on: https://go-review.googlesource.com/22355
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-21 21:25:50 +00:00
Keith Randall
b57ac33331 cmd/compile: forward-looking desired register biasing
Improve forward-looking desired register calculations.
It is now inter-block and handles a bunch more cases.

Fixes #14504
Fixes #14828
Fixes #15254

Change-Id: Ic240fa0ec6a779d80f577f55c8a6c4ac8c1a940a
Reviewed-on: https://go-review.googlesource.com/22160
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-04-20 15:31:42 +00:00
David Chase
6b0b3f86d6 cmd/compile: fix use of original spill name after sinking
This is a fix for the ssacheck builder
http://build.golang.org/log/baa00f70c34e41186051cfe90568de3d91f115d7
after CL 21307 for sinking spills down loop exits
https://go-review.googlesource.com/#/c/21037/

The fix is to reuse (move) the original spill, thus preserving
the definition of the variable and its use count. Original and
copy both use the same stack slot, but ssacheck needs to see
a definition for the variable itself.

Fixes #15279.

Change-Id: I286285490193dc211b312d64dbc5a54867730bd6
Reviewed-on: https://go-review.googlesource.com/21995
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-14 18:24:54 +00:00
David Chase
6b85a45edc cmd/compile: move spills to loop exits when easy.
For call-free inner loops.

Revised statistics:
  85 inner loop spills sunk
 341 inner loop spills remaining
1162 inner loop spills that were candidates for sinking
     ended up completely register allocated
 119 inner loop spills could have been sunk were used in
     "shuffling" at the bottom of the loop.
   1 inner loop spill not sunk because the register assigned
     changed between def and exit,

 Understanding how to make an inner loop definition not be
 a candidate for from-memory shuffling (to force the shuffle
 code to choose some other value) should pick up some of the
 119 other spills disqualified for this reason.

 Modified the stats printing based on feedback from Austin.

Change-Id: If3fb9b5d5a028f42ccc36c4e3d9e0da39db5ca60
Reviewed-on: https://go-review.googlesource.com/21037
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-04-13 15:59:42 +00:00
Keith Randall
0004f34cef cmd/compile: regalloc enforces 2-address instructions
Instead of being a hint, resultInArg0 is now enforced by regalloc.
This allows us to delete all the code from amd64/ssa.go which
deals with converting from a semantically three-address instruction
into some copies plus a two-address instruction.

Change-Id: Id4f39a80be4b678718bfd42a229f9094ab6ecd7c
Reviewed-on: https://go-review.googlesource.com/21816
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-04-10 23:20:38 +00:00
David Chase
c3b3e7b4ef cmd/compile: insert instrumentation more carefully in racewalk
Be more careful about inserting instrumentation in racewalk.
If the node being instrumented is an OAS, and it has a non-
empty Ninit, then append instrumentation to the Ninit list
rather than letting it be inserted before the OAS (and the
compilation of its init list).  This deals with the case that
the Ninit list defines a variable used in the RHS of the OAS.

Fixes #15091.

Change-Id: Iac91696d9104d07f0bf1bd3499bbf56b2e1ef073
Reviewed-on: https://go-review.googlesource.com/21771
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: David Chase <drchase@google.com>
2016-04-08 21:06:39 +00:00
Dave Cheney
b9feb91f32 cmd/compile: minor cleanups
Some minor scoping cleanups found by a very old version of grind.

Change-Id: I1d373817586445fc87e38305929097b652696fdd
Reviewed-on: https://go-review.googlesource.com/21064
Run-TryBot: Dave Cheney <dave@cheney.net>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-24 11:18:04 +00:00
Keith Randall
4c9a470d46 cmd/compile: start on ARM port
Start working on arm port.  Gets close to correct
code for fibonacci:
    func fib(n int) int {
        if n < 2 {
            return n
        }
        return fib(n-1) + fib(n-2)
    }

Still a lot to do, but this is a good starting point.

Cleaned up some arch-specific dependencies in regalloc.

Change-Id: I4301c6c31a8402168e50dcfee8bcf7aee73ea9d5
Reviewed-on: https://go-review.googlesource.com/21000
Reviewed-by: David Chase <drchase@google.com>
2016-03-23 17:46:05 +00:00
David Chase
815c9a7f28 cmd/compile: use loop information in regalloc
This seems to help the problem reported in #14606; this
change seems to produce about a 4% improvement (mostly
for the 128-8192 shards).

Fixes #14789.

Change-Id: I1bd52c82d4ca81d9d5e9ab371fdfc860d7e8af50
Reviewed-on: https://go-review.googlesource.com/20660
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-18 01:23:29 +00:00
Keith Randall
56e0ecc5ea cmd/compile: keep value use counts in SSA
Keep track of how many uses each Value has.  Each appearance in
Value.Args and in Block.Control counts once.

The number of uses of a value is generically useful to
constrain rewrite rules.  For instance, we might want to
prevent merging index operations into loads if the same
index expression is used lots of times.

But I have one use in particular for which the use count is required.
We must make sure we don't combine ops with loads if the load has
more than one use.  Otherwise, we may split a single load
into multiple loads and that breaks perceived behavior in
the presence of races.  In particular, the load of m.state
in sync/mutex.go:Lock can't be done twice.  (I have a separate
CL which triggers the mutex failure.  This CL has a test which
demonstrates a similar failure.)

Change-Id: Icaafa479239f48632a069d0c3f624e6ebc6b1f0e
Reviewed-on: https://go-review.googlesource.com/20790
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-03-17 04:20:02 +00:00
Todd Neal
763afe13b9 cmd/compile: change logging of spills for regalloc to Warnl format
Change-Id: I01c000ff3f6dc6b0ed691e289eeef0fa61500337
Reviewed-on: https://go-review.googlesource.com/20744
Reviewed-by: Keith Randall <khr@golang.org>
2016-03-16 02:50:13 +00:00
Keith Randall
369f4f5de5 cmd/compile: regalloc of two address instructions
x86 has a lot of instructions that require the output to be in the same
register as one of the inputs.  When allocating the output register,
allocate the same register as the input if it is available.

Improves the performance of golang.org/x/crypto/sha3 by
10% (from 6% slower than 1.6 to 4% faster).

Fixes #14745

Change-Id: I4d81785240c9368e4dc75107b45c959d200df8e6
Reviewed-on: https://go-review.googlesource.com/20488
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-03-11 04:13:07 +00:00
Todd Neal
6b3d4a5353 cmd/compile: modify regalloc/stackalloc to use the cmd line debug args
Change the existing flags from compile time consts to be configurable
from the command line.

Change-Id: I4aab4bf3dfcbdd8e2b5a2ff51af95c2543967769
Reviewed-on: https://go-review.googlesource.com/20560
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-11 01:35:12 +00:00
Keith Randall
ddc6b64444 cmd/compile: fix defer/deferreturn
Make sure we do any just-before-return cleanup on all paths out of a
function, including when recovering.  Each exit path should include
deferreturn (if there are any defers) and then the exit
code (e.g. copying heap-escaping return values back to the stack).

Introduce a Defer SSA block type which has two outgoing edges - one the
fallthrough edge (the defer was queued successfully) and one which
immediately returns (the defer had a successful recover() call and
normal execution should resume at the return point).

Fixes #14725

Change-Id: Iad035c9fd25ef8b7a74dafbd7461cf04833d981f
Reviewed-on: https://go-review.googlesource.com/20486
Reviewed-by: David Chase <drchase@google.com>
2016-03-10 22:33:49 +00:00
Keith Randall
686fbdb3b0 cmd/compile: make compilation deterministic, fixes toolstash
Make sure we don't depend on map iterator order.

Fixes #14600

Change-Id: Iac0e0c8689f3ace7a4dc8e2127e2fd3c8545bd29
Reviewed-on: https://go-review.googlesource.com/20158
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-03-03 18:03:45 +00:00
Keith Randall
c03ed491fe cmd/compile: load some live values into registers before loop
If we're about to enter a loop, load values which are live
and will soon be used in the loop into registers.

name                     old time/op    new time/op    delta
BinaryTree17-8              2.80s ± 4%     2.62s ± 2%   -6.43%          (p=0.008 n=5+5)
Fannkuch11-8                2.45s ± 2%     2.14s ± 1%  -12.43%          (p=0.008 n=5+5)
FmtFprintfEmpty-8          49.0ns ± 1%    48.4ns ± 1%   -1.35%          (p=0.032 n=5+5)
FmtFprintfString-8          160ns ± 1%     153ns ± 0%   -4.63%          (p=0.008 n=5+5)
FmtFprintfInt-8             152ns ± 0%     150ns ± 0%   -1.57%          (p=0.000 n=5+4)
FmtFprintfIntInt-8          252ns ± 2%     244ns ± 1%   -3.02%          (p=0.008 n=5+5)
FmtFprintfPrefixedInt-8     223ns ± 0%     223ns ± 0%     ~     (all samples are equal)
FmtFprintfFloat-8           293ns ± 2%     291ns ± 2%     ~             (p=0.389 n=5+5)
FmtManyArgs-8               956ns ± 0%     936ns ± 0%   -2.05%          (p=0.008 n=5+5)
GobDecode-8                7.18ms ± 0%    7.11ms ± 0%   -1.02%          (p=0.008 n=5+5)
GobEncode-8                6.12ms ± 3%    6.07ms ± 1%     ~             (p=0.690 n=5+5)
Gzip-8                      284ms ± 1%     284ms ± 0%     ~             (p=1.000 n=5+5)
Gunzip-8                   40.8ms ± 1%    40.6ms ± 1%     ~             (p=0.310 n=5+5)
HTTPClientServer-8         69.8µs ± 1%    72.2µs ± 4%     ~             (p=0.056 n=5+5)
JSONEncode-8               16.1ms ± 2%    16.2ms ± 1%     ~             (p=0.151 n=5+5)
JSONDecode-8               54.9ms ± 0%    57.0ms ± 1%   +3.79%          (p=0.008 n=5+5)
Mandelbrot200-8            4.35ms ± 0%    4.39ms ± 0%   +0.85%          (p=0.008 n=5+5)
GoParse-8                  3.56ms ± 1%    3.42ms ± 1%   -4.03%          (p=0.008 n=5+5)
RegexpMatchEasy0_32-8      75.6ns ± 1%    75.0ns ± 0%   -0.83%          (p=0.016 n=5+4)
RegexpMatchEasy0_1K-8       250ns ± 0%     252ns ± 1%   +0.80%          (p=0.016 n=4+5)
RegexpMatchEasy1_32-8      75.0ns ± 0%    75.4ns ± 2%     ~             (p=0.206 n=5+5)
RegexpMatchEasy1_1K-8       401ns ± 0%     398ns ± 1%     ~             (p=0.056 n=5+5)
RegexpMatchMedium_32-8      119ns ± 0%     118ns ± 0%   -0.84%          (p=0.008 n=5+5)
RegexpMatchMedium_1K-8     36.6µs ± 0%    36.9µs ± 0%   +0.91%          (p=0.008 n=5+5)
RegexpMatchHard_32-8       1.95µs ± 1%    1.92µs ± 0%   -1.23%          (p=0.032 n=5+5)
RegexpMatchHard_1K-8       58.3µs ± 1%    58.1µs ± 1%     ~             (p=0.548 n=5+5)
Revcomp-8                   425ms ± 1%     389ms ± 1%   -8.39%          (p=0.008 n=5+5)
Template-8                 65.5ms ± 1%    63.6ms ± 1%   -2.86%          (p=0.008 n=5+5)
TimeParse-8                 363ns ± 0%     354ns ± 1%   -2.59%          (p=0.008 n=5+5)
TimeFormat-8                363ns ± 0%     364ns ± 1%     ~             (p=0.159 n=5+5)

Fixes #14511

Change-Id: I1b79d2545271fa90d5b04712cc25573bdc94f2ce
Reviewed-on: https://go-review.googlesource.com/20151
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-03-03 17:39:43 +00:00
Brad Fitzpatrick
5fea2ccc77 all: single space after period.
The tree's pretty inconsistent about single space vs double space
after a period in documentation. Make it consistently a single space,
per earlier decisions. This means contributors won't be confused by
misleading precedence.

This CL doesn't use go/doc to parse. It only addresses // comments.
It was generated with:

$ perl -i -npe 's,^(\s*// .+[a-z]\.)  +([A-Z]),$1 $2,' $(git grep -l -E '^\s*//(.+\.)  +([A-Z])')
$ go test go/doc -update

Change-Id: Iccdb99c37c797ef1f804a94b22ba5ee4b500c4f7
Reviewed-on: https://go-review.googlesource.com/20022
Reviewed-by: Rob Pike <r@golang.org>
Reviewed-by: Dave Day <djd@golang.org>
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-03-02 00:13:47 +00:00
David Chase
288817b05a [dev.ssa] cmd/compile: reduce line number churn in generated code
In regalloc, make LoadReg instructions use the line number
of their *use*, not their *source*.  This reduces the
tendency of debugger stepping to "jump around" the program.

Change-Id: I59e2eeac4dca9168d8af3a93effbc5bdacac2881
Reviewed-on: https://go-review.googlesource.com/19836
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-02-24 16:57:36 +00:00
David Chase
b86cafc7dc [dev.ssa] cmd/compile: memory allocation tweaks to regalloc and dom
Spotted a minor source of excess allocation in the register
allocator.  Rearranged the dominator tree code to pull its
scratch memory from a reused buffer attached to Config.

Change-Id: I6da6e7b112f7d3eb1fd00c58faa8214cdea44e38
Reviewed-on: https://go-review.googlesource.com/19450
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-02-22 15:32:58 +00:00
Keith Randall
16b1fce921 [dev.ssa] cmd/compile: add aux typing, flags to ops
Add the aux type to opcodes.
Add rematerializeable as a flag.

Change-Id: I906e19281498f3ee51bb136299bf26e13a54b2ec
Reviewed-on: https://go-review.googlesource.com/19088
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-02 02:55:13 +00:00
David Chase
c87a62f32b [dev.ssa] cmd/compile: reducing alloc footprint of dominator calc
Converted working slices of pointer into slices of pointer
index.  Half the size (on 64-bit machine) and no pointers
to trace if GC occurs while they're live.

TODO - could expose slice mapping ID->*Block; some dom
clients also construct these.

Minor optimization in regalloc that cuts allocation count.

Minor optimization in compile.go that cuts calls to Sprintf.

Change-Id: I28f0bfed422b7344af333dc52ea272441e28e463
Reviewed-on: https://go-review.googlesource.com/19104
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2016-02-02 02:20:25 +00:00
Todd Neal
f962f33035 [dev.ssa] cmd/compile: reuse sparse sets across compiler passes
Cache sparse sets in the function so they can be reused by subsequent
compiler passes.

benchmark                        old ns/op     new ns/op     delta
BenchmarkDSEPass-8               206945        180022        -13.01%
BenchmarkDSEPassBlock-8          5286103       2614054       -50.55%
BenchmarkCSEPass-8               1790277       1790655       +0.02%
BenchmarkCSEPassBlock-8          18083588      18112771      +0.16%
BenchmarkDeadcodePass-8          59837         41375         -30.85%
BenchmarkDeadcodePassBlock-8     1651575       511169        -69.05%
BenchmarkMultiPass-8             531529        427506        -19.57%
BenchmarkMultiPassBlock-8        7033496       4487814       -36.19%

benchmark                        old allocs     new allocs     delta
BenchmarkDSEPass-8               11             4              -63.64%
BenchmarkDSEPassBlock-8          599            120            -79.97%
BenchmarkCSEPass-8               18             18             +0.00%
BenchmarkCSEPassBlock-8          2700           2700           +0.00%
BenchmarkDeadcodePass-8          4              3              -25.00%
BenchmarkDeadcodePassBlock-8     30             9              -70.00%
BenchmarkMultiPass-8             24             20             -16.67%
BenchmarkMultiPassBlock-8        1800           1000           -44.44%

benchmark                        old bytes     new bytes     delta
BenchmarkDSEPass-8               221367        142           -99.94%
BenchmarkDSEPassBlock-8          3695207       3846          -99.90%
BenchmarkCSEPass-8               303328        303328        +0.00%
BenchmarkCSEPassBlock-8          5006400       5006400       +0.00%
BenchmarkDeadcodePass-8          84232         10506         -87.53%
BenchmarkDeadcodePassBlock-8     1274940       163680        -87.16%
BenchmarkMultiPass-8             608674        313834        -48.44%
BenchmarkMultiPassBlock-8        9906001       5003450       -49.49%

Change-Id: Ib1fa58c7f494b374d1a4bb9cffbc2c48377b59d3
Reviewed-on: https://go-review.googlesource.com/19100
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2016-01-30 13:57:39 +00:00
Keith Randall
9d6e605cf7 [dev.ssa] cmd/compile: simple forward-looking register allocation tweak
For each value that needs to be in a fixed register at the end of the
block, and try to pick that fixed register when the instruction
generating that value is scheduled (or restored from a spill).

Just used for end-of-block register requirements for now.
Fixed-register instruction requirements (e.g. shift in ecx) can be
added later.  Also two-instruction constraints (input reg == output
reg) might be recorded in a similar manner.

Change-Id: I59916e2e7f73657bb4fc3e3b65389749d7a23fa8
Reviewed-on: https://go-review.googlesource.com/18774
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-01-29 17:32:38 +00:00
Keith Randall
8a961aee28 [dev.ssa] cmd/compile: fix -N build
The OpSB hack didn't quite work.  We need to really
CSE these ops to make regalloc happy.

Change-Id: I9f4d7bfb0929407c84ee60c9e25ff0c0fbea84af
Reviewed-on: https://go-review.googlesource.com/19083
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-01-29 17:27:12 +00:00
Keith Randall
2f57d0fe02 [dev.ssa] cmd/compile: preallocate small-numbered values and blocks
Speeds up the compiler ~5%.

Change-Id: Ia5cf0bcd58701fd14018ec77d01f03d5c7d6385b
Reviewed-on: https://go-review.googlesource.com/19060
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2016-01-28 22:52:42 +00:00
Keith Randall
6a96a2fe5a [dev.ssa] cmd/compile: make cse faster
It is one of the slowest compiler phases right now, and we
run two of them.

Instead of using a map to make the initial partition, use a sort.
It is much less memory intensive.

Do a few optimizations to avoid work for size-1 equivalence classes.

Implement -N.

Change-Id: I1d2d85d3771abc918db4dd7cc30b0b2d854b15e1
Reviewed-on: https://go-review.googlesource.com/19024
Reviewed-by: David Chase <drchase@google.com>
2016-01-28 20:59:20 +00:00
Keith Randall
7b773946c0 [dev.ssa] cmd/compile: disable xor clearing when flags must be preserved
The x86 backend automatically rewrites MOV $0, AX to
XOR AX, AX.  That rewrite isn't ok when the flags register
is live across the MOV.  Keep track of which moves care
about preserving flags, then disable this rewrite for them.

On x86, Prog.Mark was being used to hold the length of the
instruction.  We already store that in Prog.Isize, so no
need to store it in Prog.Mark also.  This frees up Prog.Mark
to hold a bitmask on x86 just like all the other architectures.

Update #12405

Change-Id: Ibad8a8f41fc6222bec1e4904221887d3cc3ca029
Reviewed-on: https://go-review.googlesource.com/18861
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Russ Cox <rsc@golang.org>
2016-01-26 17:40:22 +00:00
Keith Randall
9094e3ada2 [dev.ssa] cmd/compile: fix spill sizes
In code that does:

    var x, z int32
    var y int64
    z = phi(x, int32(y))

We silently drop the int32 cast because truncation is a no-op.
The phi operation needs to make sure it uses the size of the
phi, not the size of its arguments, when generating spills.

Change-Id: I1f7baf44f019256977a46fdd3dad1972be209042
Reviewed-on: https://go-review.googlesource.com/18390
Reviewed-by: David Chase <drchase@google.com>
2016-01-13 18:39:15 +00:00
Keith Randall
d7ad7b9efe [dev.ssa] cmd/compile: zero register masks for each edge
Forgot to reset these masks before each merge edge is processed.

Change-Id: I2f593189b63f50a1cd12b2dd4645ca7b9614f1f3
Reviewed-on: https://go-review.googlesource.com/18223
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
2016-01-05 20:20:18 +00:00
Keith Randall
7d9f1067d1 [dev.ssa] cmd/compile: better register allocator
Reorder how register & stack allocation is done.  We used to allocate
registers, then fix up merge edges, then allocate stack slots.  This
lead to lots of unnecessary copies on merge edges:

v2 = LoadReg v1
v3 = StoreReg v2

If v1 and v3 are allocated to the same stack slot, then this code is
unnecessary.  But at regalloc time we didn't know the homes of v1 and
v3.

To fix this problem, allocate all the stack slots before fixing up the
merge edges.  That way, we know what stack slots values use so we know
what copies are required.

Use a good technique for shuffling values around on merge edges.

Improves performance of the go1 TimeParse benchmark by ~12%

Change-Id: I731f43e4ff1a7e0dc4cd4aa428fcdb97812b86fa
Reviewed-on: https://go-review.googlesource.com/17915
Reviewed-by: David Chase <drchase@google.com>
2015-12-21 23:12:05 +00:00
Keith Randall
c140df0326 [dev.ssa] cmd/compile: allocate the flag register in a separate pass
Spilling/restoring flag values is a pain to do during regalloc.
Instead, allocate the flag register in a separate pass.  Regalloc then
operates normally on any flag recomputation instructions.

Change-Id: Ia1c3d9e6eff678861193093c0b48a00f90e4156b
Reviewed-on: https://go-review.googlesource.com/17694
Reviewed-by: David Chase <drchase@google.com>
2015-12-11 21:08:15 +00:00
Keith Randall
75102afce7 [dev.ssa] cmd/compile: better register allocation
Use a more precise computation of next use.  It properly
detects lifetime holes and deallocates values during those holes.
It also uses a more precise version of distance to next use which
affects which values get spilled.

Change-Id: I49eb3ebe2d2cb64842ecdaa7fb4f3792f8afb90b
Reviewed-on: https://go-review.googlesource.com/16760
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2015-11-12 15:31:11 +00:00
Keith Randall
02f4d0a130 [dev.ssa] cmd/compile: start arguments as spilled
Declare a function's arguments as having already been
spilled so their use just requires a restore.

Allow spill locations to be portions of larger objects the stack.
Required to load portions of compound input arguments.

Rename the memory input to InputMem.  Use Arg for the
pre-spilled argument values.

Change-Id: I8fe2a03ffbba1022d98bfae2052b376b96d32dda
Reviewed-on: https://go-review.googlesource.com/16536
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2015-11-03 17:29:40 +00:00
Keith Randall
d04f38e3ee [dev.ssa] cmd/compile: flag recomputing: find original values correctly
We "spill" flag values by recomputing them from their original
inputs.  The "find original inputs" part of the algorithm was
a hack.  It was broken by rematerialization.  This change does
the real job of keeping track of original values for each
spill/restore/flagrecompute/rematerialization we issue.

Change-Id: I95088326a4ee4958c98148b063e518c80e863e4c
Reviewed-on: https://go-review.googlesource.com/16500
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2015-10-29 21:59:10 +00:00
Keith Randall
31115a5c98 [dev.ssa] cmd/compile: optimize nil checks
Use faulting loads instead of test/jeq to do nil checks.
Fold nil checks into a following load/store if possible.

Makes binaries about 2% smaller.

Change-Id: I54af0f0a93c853f37e34e0ce7e3f01dd2ac87f64
Reviewed-on: https://go-review.googlesource.com/16287
Reviewed-by: David Chase <drchase@google.com>
2015-10-25 20:34:28 +00:00
Keith Randall
7d61246972 [dev.ssa] cmd/compile: implement reserved registers
BP for framepointer experiment
R15 for dynamic linking

Change-Id: I28e48be461d04a4d5c9b013f48fce5c0e58d6a08
Reviewed-on: https://go-review.googlesource.com/16231
Run-TryBot: Todd Neal <todd@tneal.org>
Reviewed-by: Todd Neal <todd@tneal.org>
2015-10-23 04:14:06 +00:00
Keith Randall
f206b16ff7 [dev.ssa] cmd/compile: assign unused registers to phi ops
Register phis are better than stack phis.  If we have
unused registers available, use them for phis.

Change-Id: I3045711c65caa1b6d0be29131b87b57466320cc2
Reviewed-on: https://go-review.googlesource.com/16080
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2015-10-22 17:06:10 +00:00
Keith Randall
d694f83c21 [dev.ssa] cmd/compile: getg needs a memory arg
getg reads from memory, so it should really have a
memory arg.  It is critical in functions which call setg
to make sure getg gets ordered correctly with setg.

Change-Id: Ief4875421f741fc49c07b0e1f065ce2535232341
Reviewed-on: https://go-review.googlesource.com/16100
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: David Chase <drchase@google.com>
2015-10-20 03:41:03 +00:00
Keith Randall
2dc88eead8 [dev.ssa] cmd/compile: Don't rematerialize getg
It isn't safe in functions that also call setg.

Change-Id: I76a7bf0401b4b6c8a129c245b15a2d6f06080e94
Reviewed-on: https://go-review.googlesource.com/16095
Reviewed-by: Todd Neal <todd@tneal.org>
2015-10-20 01:41:50 +00:00
Keith Randall
c64a6f6362 [dev.ssa] cmd/compile: Rematerialize in regalloc
Rematerialize constants instead of spilling and loading them.
"Constants" includes constant offsets from SP and SB.

Should help somewhat with stack frame sizes.  I'm not sure
exactly how much yet.

Change-Id: I44dbad97aae870cf31cb6e89c92fe4f6a2b9586f
Reviewed-on: https://go-review.googlesource.com/16029
Run-TryBot: Keith Randall <khr@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2015-10-19 19:01:24 +00:00
David Chase
956f3199a3 [dev.ssa] cmd/compile: addressed vars and closures
Cleaned up first-block-in-function code.
Added cases for |PHEAP for PPARAM and PAUTO.
Made PPARAMOUT act more like PAUTO for purposes
of address generation and vardef placement.
Added cases for OCLOSUREVAR and Ops for getting closure
pointer.  Closure ops are scheduled at top of entry block
to capture DX.

Wrote test that seems to show proper behavior for addressed
parameters, locals, and returns.

Change-Id: Iee93ebf9e3d9f74cfb4d1c1da8038eb278d8a857
Reviewed-on: https://go-review.googlesource.com/14650
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
2015-09-22 18:07:03 +00:00
Keith Randall
cde977c23c [dev.ssa] cmd/compile/internal/ssa: fix sign extension + load combo
Load-and-sign-extend opcodes were being generated in the
wrong block, leading to having more than one memory variable
live at once.  Fix the rules + add a test.

Change-Id: Iadf80e55ea901549c15c628ae295c2d0f1f64525
Reviewed-on: https://go-review.googlesource.com/14591
Reviewed-by: Todd Neal <todd@tneal.org>
Run-TryBot: Todd Neal <todd@tneal.org>
2015-09-15 17:48:25 +00:00
Todd Neal
a0022d9b8c [dev.ssa] cmd/compile: add more specific regalloc logging
Change-Id: Ib0ea4b9c245f3d551e0f703826caa6b444b56a2d
Reviewed-on: https://go-review.googlesource.com/14136
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-09-01 02:37:42 +00:00
Todd Neal
3b7f0c9cba [dev.ssa] cmd/compile: fix typo in log
Change-Id: Ic7be8fa3a89e46a93df181df3163ec1bf7e96a23
Reviewed-on: https://go-review.googlesource.com/14076
Reviewed-by: Minux Ma <minux@golang.org>
2015-08-31 03:07:43 +00:00
Todd Neal
cff0c6ad0f [dev.ssa] cmd/compile: add instrumentation to regalloc
Change-Id: Ice206f7e94af4a148d9dd9a7570f5ed21722bedc
Reviewed-on: https://go-review.googlesource.com/14075
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2015-08-31 02:53:07 +00:00
Todd Neal
7cadf23afb [dev.ssa] cmd/compile: fix phi floats
The code previously always used AX causing errors.  For now, just
switch off the type in order to at least generate valid code.

Change-Id: Iaf13120a24b62456b9b33c04ab31f2d5104b381b
Reviewed-on: https://go-review.googlesource.com/13943
Reviewed-by: David Chase <drchase@google.com>
2015-08-26 14:40:34 +00:00
Josh Bleecher Snyder
9f8f8c27dc [dev.ssa] cmd/compile: support spilling and loading flags
This CL takes a simple approach to spilling and loading flags.
We never spill. When a load is needed, we recalculate,
loading the arguments as needed.

This is simple and architecture-independent.
It is not very efficient, but as of this CL,
there are fewer than 200 flag spills during make.bash.

This was tested by manually reverting CLs 13813 and 13843,
causing SETcc, MOV, and LEA instructions to clobber flags,
which dramatically increases the number of flags spills.
With that done, all stdlib tests that used to pass
still pass.

For future reference, here are some other, more efficient
amd64-only schemes that we could adapt in the future if needed.

(1) Spill exactly the flags needed.

For example, if we know that the flags will be needed
by a SETcc or Jcc op later, we could use SETcc to
extract just the relevant flag. When needed,
we could use TESTB and change the op to JNE/SETNE.
(Alternatively, we could leave the op unaltered
and prepare an appropriate CMPB instruction
to produce the desired flag.)

However, this requires separate handling for every
instruction that uses the flags register,
including (say) SBBQcarrymask.

We could enable this on an ad hoc basis for common cases
and fall back to recalculation for other cases.

(2) Spill all flags with PUSHF and POPF

This modifies SP, which the runtime won't like.
It also requires coordination with stackalloc to
make sure that we have a stack slot ready for use.

(3) Spill almost all flags with LAHF, SETO, and SAHF

See http://blog.freearrow.com/archives/396
for details. This would handle all the flags we currently
use. However, LAHF and SAHF are not universally available
and it requires arranging for AX to be free.

Change-Id: Ie36600fd8e807ef2bee83e2e2ae3685112a7f276
Reviewed-on: https://go-review.googlesource.com/13844
Reviewed-by: Keith Randall <khr@golang.org>
2015-08-24 22:13:42 +00:00
Keith Randall
0b46b42943 [dev.ssa] cmd/compile/internal/ssa: New register allocator
Implement a global (whole function) register allocator.
This replaces the local (per basic block) register allocator.

Clobbering of registers by instructions is handled properly.
A separate change will add the correct clobbers to all the instructions.

Change-Id: I38ce4dc7dccb8303c1c0e0295fe70247b0a3f2ea
Reviewed-on: https://go-review.googlesource.com/13622
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Todd Neal <todd@tneal.org>
2015-08-17 21:06:30 +00:00
Josh Bleecher Snyder
ddeee0eed3 [dev.ssa] cmd/compile: enforce that all phis are first during regalloc
Change-Id: I035708f5d0659b3deef00808d35e1cc8a80215e0
Reviewed-on: https://go-review.googlesource.com/13243
Reviewed-by: Keith Randall <khr@golang.org>
2015-08-06 17:26:43 +00:00