Commit graph

85 commits

Author SHA1 Message Date
Josh Bleecher Snyder
c9ccdf1f8c cmd/compile: make deadcode pass cheaper
The deadcode pass runs a lot.
I'd like it to run even more.

This change adds dedicated storage for deadcode to ssa.Cache.
In addition to being a nice win now, it makes
deadcode easier to add other places in the future.

name        old time/op       new time/op       delta
Template          210ms ± 3%        209ms ± 2%    ~     (p=0.951 n=93+95)
Unicode          92.2ms ± 3%       93.0ms ± 3%  +0.87%  (p=0.000 n=94+94)
GoTypes           739ms ± 2%        733ms ± 2%  -0.84%  (p=0.000 n=92+94)
Compiler          3.51s ± 2%        3.49s ± 2%  -0.57%  (p=0.000 n=94+91)
SSA               9.80s ± 2%        9.75s ± 2%  -0.57%  (p=0.000 n=95+92)
Flate             132ms ± 2%        132ms ± 3%    ~     (p=0.165 n=94+98)
GoParser          160ms ± 3%        159ms ± 3%  -0.42%  (p=0.005 n=96+94)
Reflect           446ms ± 4%        442ms ± 4%  -0.91%  (p=0.000 n=95+98)
Tar               186ms ± 3%        186ms ± 2%    ~     (p=0.221 n=94+97)
XML               252ms ± 2%        250ms ± 2%  -0.55%  (p=0.000 n=95+94)
[Geo mean]        430ms             429ms       -0.34%

name        old user-time/op  new user-time/op  delta
Template          256ms ± 3%        257ms ± 3%    ~     (p=0.521 n=94+98)
Unicode           120ms ± 9%        121ms ± 9%    ~     (p=0.074 n=99+100)
GoTypes           935ms ± 3%        935ms ± 2%    ~     (p=0.574 n=82+96)
Compiler          4.56s ± 1%        4.55s ± 2%    ~     (p=0.247 n=88+90)
SSA               13.6s ± 2%        13.6s ± 1%    ~     (p=0.277 n=94+95)
Flate             155ms ± 3%        156ms ± 3%    ~     (p=0.181 n=95+100)
GoParser          193ms ± 8%        184ms ± 6%  -4.39%  (p=0.000 n=100+89)
Reflect           549ms ± 3%        552ms ± 3%  +0.45%  (p=0.036 n=94+96)
Tar               230ms ± 4%        230ms ± 4%    ~     (p=0.670 n=97+99)
XML               315ms ± 5%        309ms ±12%  -2.05%  (p=0.000 n=99+99)
[Geo mean]        540ms             538ms       -0.47%

name        old alloc/op      new alloc/op      delta
Template         40.3MB ± 0%       38.9MB ± 0%  -3.36%  (p=0.008 n=5+5)
Unicode          28.6MB ± 0%       28.4MB ± 0%  -0.90%  (p=0.008 n=5+5)
GoTypes           137MB ± 0%        132MB ± 0%  -3.65%  (p=0.008 n=5+5)
Compiler          637MB ± 0%        609MB ± 0%  -4.40%  (p=0.008 n=5+5)
SSA              2.19GB ± 0%       2.07GB ± 0%  -5.63%  (p=0.008 n=5+5)
Flate            25.0MB ± 0%       24.1MB ± 0%  -3.80%  (p=0.008 n=5+5)
GoParser         30.0MB ± 0%       29.1MB ± 0%  -3.17%  (p=0.008 n=5+5)
Reflect          87.1MB ± 0%       84.4MB ± 0%  -3.05%  (p=0.008 n=5+5)
Tar              37.3MB ± 0%       36.0MB ± 0%  -3.31%  (p=0.008 n=5+5)
XML              49.8MB ± 0%       48.0MB ± 0%  -3.69%  (p=0.008 n=5+5)
[Geo mean]       87.6MB            84.6MB       -3.50%

name        old allocs/op     new allocs/op     delta
Template           387k ± 0%         380k ± 0%  -1.76%  (p=0.008 n=5+5)
Unicode            342k ± 0%         341k ± 0%  -0.31%  (p=0.008 n=5+5)
GoTypes           1.39M ± 0%        1.37M ± 0%  -1.64%  (p=0.008 n=5+5)
Compiler          5.68M ± 0%        5.60M ± 0%  -1.41%  (p=0.008 n=5+5)
SSA               17.1M ± 0%        16.8M ± 0%  -1.49%  (p=0.008 n=5+5)
Flate              240k ± 0%         236k ± 0%  -1.99%  (p=0.008 n=5+5)
GoParser           309k ± 0%         304k ± 0%  -1.57%  (p=0.008 n=5+5)
Reflect           1.01M ± 0%        0.99M ± 0%  -2.69%  (p=0.008 n=5+5)
Tar                360k ± 0%         353k ± 0%  -1.91%  (p=0.008 n=5+5)
XML                447k ± 0%         441k ± 0%  -1.26%  (p=0.008 n=5+5)
[Geo mean]         858k              844k       -1.60%

Fixes #15306

Change-Id: I9f558adb911efddead3865542fe2ca71f66fe1da
Reviewed-on: https://go-review.googlesource.com/c/go/+/166718
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2019-03-11 21:20:01 +00:00
Yury Smolsky
3068fcfa0d cmd/compile: add control flow graphs to ssa.html
This CL adds CFGs to ssa.html.
It execs dot to generate SVG,
which then gets inlined into the html.
Some standard naming and javascript hacks
enable integration with the rest of ssa.html.
Clicking on blocks highlights the relevant
part of the CFG, and vice versa.

Sample output and screenshots can be seen in #28177.

CFGs can be turned on with the suffix mask:
:*            - dump CFG for every phase
:lower        - just the lower phase
:lower-layout - lower through layout
:w,x-y        - phases w and x through y

Calling dot after every pass is noticeably slow,
instead use the range of phases.

Dead blocks are not displayed on CFG.

User can zoom and pan individual CFG
when the automatic adjustment has failed.

Dot-related errors are reported
without bringing down the process.

Fixes #28177

Change-Id: Id52c42d86c4559ca737288aa10561b67a119c63d
Reviewed-on: https://go-review.googlesource.com/c/142517
Run-TryBot: Yury Smolsky <yury@smolsky.by>
Reviewed-by: David Chase <drchase@google.com>
2018-11-21 10:22:43 +00:00
Brad Fitzpatrick
3813edf26e all: use "reports whether" consistently in the few places that didn't
Go documentation style for boolean funcs is to say:

    // Foo reports whether ...
    func Foo() bool

(rather than "returns true if")

This CL also replaces 4 uses of "iff" with the same "reports whether"
wording, which doesn't lose any meaning, and will prevent people from
sending typo fixes when they don't realize it's "if and only if". In
the past I think we've had the typo CLs updated to just say "reports
whether". So do them all at once.

(Inspired by the addition of another "returns true if" in CL 146938
in fd_plan9.go)

Created with:

$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true iff" | grep -v vendor)
$ perl -i -npe 's/returns true if/reports whether/' $(git grep -l "returns true if" | grep -v vendor)

Change-Id: Ided502237f5ab0d25cb625dbab12529c361a8b9f
Reviewed-on: https://go-review.googlesource.com/c/147037
Reviewed-by: Ian Lance Taylor <iant@golang.org>
2018-11-02 22:47:58 +00:00
David Chase
69c5830c2b cmd/compile: repair display of values & blocks in prog column
This restores the printing of vXX and bYY in the left-hand
edge of the last column of ssa.html, where the generated
progs appear.

Change-Id: I81ab9b2fa5ae28e6e5de1b77665cfbed8d14e000
Reviewed-on: https://go-review.googlesource.com/c/141277
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Yury Smolsky <yury@smolsky.by>
2018-10-11 15:29:00 +00:00
David Chase
0029cd479e cmd/compile: add LocalAddr that takes SP,mem operands
Lack of a well-defined order between VarDef and related
address operations sometimes causes problems with store order
and write barrier transformations; glitches in the order are
made irreparable (by later optimizations) if the two parts of
the glitch straddle a split in the original block caused by
insertion of a write barrier diamond.

Fix this by creating a LocalAddr for addresses of locals
(what VarDef matters for) that takes a memory input to
help make the order explicit.  Addr is modified to only
be legal for SB operand, so there is no overlap between
Addr and LocalAddr uses (there may be some downstream
cleanup from this).

Changes to generic.rules and rewrite.go ensure that codegen
tests continue to pass; CSE of LocalAddr is impaired, not
quite sure of the cost.

Fixes #26105.

Change-Id: Id4192b4440aa4e9d7ba54a465c456df9b530b515
Reviewed-on: https://go-review.googlesource.com/122483
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
2018-07-12 18:45:31 +00:00
Austin Clements
a367f44c18 cmd/compile: enable stack maps everywhere except unsafe points
This modifies issafepoint in liveness analysis to report almost every
operation as a safe point. There are four things we don't mark as
safe-points:

1. Runtime code (other than at calls).

2. go:nosplit functions (other than at calls).

3. Instructions between the load of the write barrier-enabled flag and
   the write.

4. Instructions leading up to a uintptr -> unsafe.Pointer conversion.

We'll optimize this in later CLs:

name        old time/op       new time/op       delta
Template          185ms ± 2%        190ms ± 2%   +2.95%  (p=0.000 n=10+10)
Unicode          96.3ms ± 3%       96.4ms ± 1%     ~     (p=0.905 n=10+9)
GoTypes           658ms ± 0%        669ms ± 1%   +1.72%  (p=0.000 n=10+9)
Compiler          3.14s ± 1%        3.18s ± 1%   +1.56%  (p=0.000 n=9+10)
SSA               7.41s ± 2%        7.59s ± 1%   +2.48%  (p=0.000 n=9+10)
Flate             126ms ± 1%        128ms ± 1%   +2.08%  (p=0.000 n=10+10)
GoParser          153ms ± 1%        157ms ± 2%   +2.38%  (p=0.000 n=10+10)
Reflect           437ms ± 1%        442ms ± 1%   +0.98%  (p=0.001 n=10+10)
Tar               178ms ± 1%        179ms ± 1%   +0.67%  (p=0.035 n=10+9)
XML               223ms ± 1%        229ms ± 1%   +2.58%  (p=0.000 n=10+10)
[Geo mean]        394ms             401ms        +1.75%

No effect on binary size because we're not yet emitting these extra
safe points.

For #24543.

Change-Id: I16a1eebb9183cad7cef9d53c0fd21a973cad6859
Reviewed-on: https://go-review.googlesource.com/109348
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-05-22 14:43:37 +00:00
Giovanni Bajo
3c8545c5f6 cmd/compile: reduce allocations in prove by reusing posets
In prove, reuse posets between different functions by storing them
in the per-worker cache.

Allocation count regression caused by prove improvements is down
from 5% to 3% after this CL.

Updates #25179

Change-Id: I6d14003109833d9b3ef5165fdea00aa9c9e952e8
Reviewed-on: https://go-review.googlesource.com/110455
Run-TryBot: Giovanni Bajo <rasky@develer.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2018-05-14 14:44:55 +00:00
David Chase
c2c1822b12 cmd/compile: assign and preserve statement boundaries.
A new pass run after ssa building (before any other
optimization) identifies the "first" ssa node for each
statement. Other "noise" nodes are tagged as being never
appropriate for a statement boundary (e.g., VarKill, VarDef,
Phi).

Rewrite, deadcode, cse, and nilcheck are modified to move
the statement boundaries forward whenever possible if a
boundary-tagged ssa value is removed; never-boundary nodes
are ignored in this search (some operations involving
constants are also tagged as never-boundary and also ignored
because they are likely to be moved or removed during
optimization).

Code generation treats all nodes except those explicitly
marked as statement boundaries as "not statement" nodes,
and floats statement boundaries to the beginning of each
same-line run of instructions found within a basic block.

Line number html conversion was modified to make statement
boundary nodes a bit more obvious by prepending a "+".

The code in fuse.go that glued together the value slices
of two blocks produced a result that depended on the
former capacities (not lengths) of the two slices.  This
causes differences in the 386 bootstrap, and also can
sometimes put values into an order that does a worse job
of preserving statement boundaries when values are removed.

Portions of two delve tests that had caught problems were
incorporated into ssa/debug_test.go.  There are some
opportunities to do better with optimized code, but the
next-ing is not lying or overly jumpy.

Over 4 CLs, compilebench geomean measured binary size
increase of 3.5% and compile user time increase of 3.8%
(this is after optimization to reuse a sparse map instead
of creating multiple maps.)

This CL worsens the optimized-debugging experience with
Delve; we need to work with the delve team so that
they can use the is_stmt marks that we're emitting now.

The reference output changes from time to time depending
on other changes in the compiler, sometimes better,
sometimes worse.

This CL now includes a test ensuring that 99+% of the lines
in the Go command itself (a handy optimized binary) include
is_stmt markers.

Change-Id: I359c94e06843f1eb41f9da437bd614885aa9644a
Reviewed-on: https://go-review.googlesource.com/102435
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Austin Clements <austin@google.com>
2018-05-14 14:09:49 +00:00
Daniel Martí
14393c5cd4 cmd: remove a few more unused parameters
ssa's pos parameter on the Const* funcs is unused, so remove it.

ld's alloc parameter on elfnote is always true, so remove the arguments
and simplify the code.

Finally, arm's addpltreloc never has its return parameter used, so
remove it.

Change-Id: I63387ecf6ab7b5f7c20df36be823322bb98427b8
Reviewed-on: https://go-review.googlesource.com/104456
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2018-04-09 17:10:25 +00:00
Daniel Martí
cd2cb6e3f5 cmd/compile: cache sparse maps across ssa passes
This is done for sparse sets already, but it was missing for sparse
maps. Only affects deadstore and regalloc, as they're the only ones that
use sparse maps.

name                 old time/op    new time/op    delta
DSEPass-4               247µs ± 0%     216µs ± 0%  -12.75%  (p=0.008 n=5+5)
DSEPassBlock-4         3.05ms ± 1%    2.87ms ± 1%   -6.02%  (p=0.002 n=6+6)
CSEPass-4              2.30ms ± 0%    2.32ms ± 0%   +0.53%  (p=0.026 n=6+6)
CSEPassBlock-4         23.8ms ± 0%    23.8ms ± 0%     ~     (p=0.931 n=6+5)
DeadcodePass-4         51.7µs ± 1%    51.5µs ± 2%     ~     (p=0.429 n=5+6)
DeadcodePassBlock-4     734µs ± 1%     742µs ± 3%     ~     (p=0.394 n=6+6)
MultiPass-4             152µs ± 0%     149µs ± 2%     ~     (p=0.082 n=5+6)
MultiPassBlock-4       2.67ms ± 1%    2.41ms ± 2%   -9.77%  (p=0.008 n=5+5)

name                 old alloc/op   new alloc/op   delta
DSEPass-4              41.2kB ± 0%     0.1kB ± 0%  -99.68%  (p=0.002 n=6+6)
DSEPassBlock-4          560kB ± 0%       4kB ± 0%  -99.34%  (p=0.026 n=5+6)
CSEPass-4               189kB ± 0%     189kB ± 0%     ~     (all equal)
CSEPassBlock-4         3.10MB ± 0%    3.10MB ± 0%     ~     (p=0.444 n=5+5)
DeadcodePass-4         10.5kB ± 0%    10.5kB ± 0%     ~     (all equal)
DeadcodePassBlock-4     164kB ± 0%     164kB ± 0%     ~     (all equal)
MultiPass-4             240kB ± 0%     199kB ± 0%  -17.06%  (p=0.002 n=6+6)
MultiPassBlock-4       3.60MB ± 0%    2.99MB ± 0%  -17.06%  (p=0.002 n=6+6)

name                 old allocs/op  new allocs/op  delta
DSEPass-4                8.00 ± 0%      4.00 ± 0%  -50.00%  (p=0.002 n=6+6)
DSEPassBlock-4            240 ± 0%       120 ± 0%  -50.00%  (p=0.002 n=6+6)
CSEPass-4                9.00 ± 0%      9.00 ± 0%     ~     (all equal)
CSEPassBlock-4          1.35k ± 0%     1.35k ± 0%     ~     (all equal)
DeadcodePass-4           3.00 ± 0%      3.00 ± 0%     ~     (all equal)
DeadcodePassBlock-4      9.00 ± 0%      9.00 ± 0%     ~     (all equal)
MultiPass-4              11.0 ± 0%      10.0 ± 0%   -9.09%  (p=0.002 n=6+6)
MultiPassBlock-4          165 ± 0%       150 ± 0%   -9.09%  (p=0.002 n=6+6)

Change-Id: I43860687c88f33605eb1415f36473c5cfe8fde4a
Reviewed-on: https://go-review.googlesource.com/98449
Run-TryBot: Daniel Martí <mvdan@mvdan.cc>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2018-03-15 17:24:39 +00:00
Austin Clements
491f409a32 cmd/compile: minor comment improvements/corrections
Change-Id: Ie0934f1528d58d4971cdef726d3e2d23cf3935d3
Reviewed-on: https://go-review.googlesource.com/87475
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Alexandru Moșoi <alexandru@mosoi.ro>
2018-03-08 22:25:21 +00:00
Keith Randall
4b00d3f4a2 cmd/compile: implement comparisons directly with memory
Allow the compiler to generate code like CMPQ 16(AX), $7

It's tricky because it's difficult to spill such a comparison during
flagalloc, because the same memory state might not be available at
the restore locations.

Solve this problem by decomposing the compare+load back into its parts
if it needs to be spilled.

The big win is that the write barrier test goes from:

MOVL	runtime.writeBarrier(SB), CX
TESTL	CX, CX
JNE	60

to

CMPL	runtime.writeBarrier(SB), $0
JNE	59

It's one instruction and one byte smaller.

Fixes #19485
Fixes #15245
Update #22460

Binaries are about 0.15% smaller.

Change-Id: I4fd8d1111b6b9924d52f9a0901ca1b2e5cce0836
Reviewed-on: https://go-review.googlesource.com/86035
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Reviewed-by: Ilya Tocar <ilya.tocar@intel.com>
2018-02-26 23:49:44 +00:00
David Chase
f22cf7131a cmd/compile: use src.NoXPos for entry-block constants
The ssa backend is aggressive about placing constants and
certain other values in the Entry block.  It's implausible
that the original line numbers for these constants makes
any sort of sense when it appears to a user stepping in a
debugger, and they're also not that useful in dumps since
entry-block instructions tend to be constants (i.e.,
unlikely to be the cause of a crash).

Therefore, use src.NoXPos for any values that are explicitly
inserted into a function's entry block.

Passes all tests, including ssa/debug_test.go with both
gdb and a fairly recent dlv.  Hand-verified that it solves
the reported problem; constructed a test that reproduced
a problem, and fixed it.

Modified test harness to allow injection of slightly more
interesting inputs.

Fixes #22558.

Change-Id: I4476927067846bc4366da7793d2375c111694c55
Reviewed-on: https://go-review.googlesource.com/81215
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-12-01 07:09:54 +00:00
Austin Clements
afbe646ab4 cmd/compile: report typedslicecopy write barriers
Most write barrier calls are inserted by SSA, but copy and append are
lowered to runtime.typedslicecopy during walk. Fix these to set
Func.WBPos and emit the "write barrier" warning, as done for the write
barriers inserted by SSA. As part of this, we refactor setting WBPos
and emitting this warning into the frontend so it can be shared by
both walk and SSA.

Change-Id: I5fe9997d9bdb55e03e01dd58aee28908c35f606b
Reviewed-on: https://go-review.googlesource.com/73411
Run-TryBot: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-10-29 20:21:43 +00:00
Keith Randall
770d8d8207 cmd/compile: free value earlier in nilcheck
When we remove a nil check, add it back to the free Value pool immediately.

Fixes #18732

Change-Id: I8d644faabbfb52157d3f2d071150ff0342ac28dc
Reviewed-on: https://go-review.googlesource.com/58810
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-08-25 06:01:26 +00:00
Josh Bleecher Snyder
46b88c9fbc cmd/compile: change ssa.Type into *types.Type
When package ssa was created, Type was in package gc.
To avoid circular dependencies, we used an interface (ssa.Type)
to represent type information in SSA.

In the Go 1.9 cycle, gri extricated the Type type from package gc.
As a result, we can now use it in package ssa.
Now, instead of package types depending on package ssa,
it is the other way.
This is a more sensible dependency tree,
and helps compiler performance a bit.

Though this is a big CL, most of the changes are
mechanical and uninteresting.

Interesting bits:

* Add new singleton globals to package types for the special
  SSA types Memory, Void, Invalid, Flags, and Int128.
* Add two new Types, TSSA for the special types,
  and TTUPLE, for SSA tuple types.
  ssa.MakeTuple is now types.NewTuple.
* Move type comparison result constants CMPlt, CMPeq, and CMPgt
  to package types.
* We had picked the name "types" in our rules for the handy
  list of types provided by ssa.Config. That conflicted with
  the types package name, so change it to "typ".
* Update the type comparison routine to handle tuples and special
  types inline.
* Teach gc/fmt.go how to print special types.
* We can now eliminate ElemTypes in favor of just Elem,
  and probably also some other duplicated Type methods
  designed to return ssa.Type instead of *types.Type.
* The ssa tests were using their own dummy types,
  and they were not particularly careful about types in general.
  Of necessity, this CL switches them to use *types.Type;
  it does not make them more type-accurate.
  Unfortunately, using types.Type means initializing a bit
  of the types universe.
  This is prime for refactoring and improvement.

This shrinks ssa.Value; it now fits in a smaller size class
on 64 bit systems. This doesn't have a giant impact,
though, since most Values are preallocated in a chunk.

name        old alloc/op      new alloc/op      delta
Template         37.9MB ± 0%       37.7MB ± 0%  -0.57%  (p=0.000 n=10+8)
Unicode          28.9MB ± 0%       28.7MB ± 0%  -0.52%  (p=0.000 n=10+10)
GoTypes           110MB ± 0%        109MB ± 0%  -0.88%  (p=0.000 n=10+10)
Flate            24.7MB ± 0%       24.6MB ± 0%  -0.66%  (p=0.000 n=10+10)
GoParser         31.1MB ± 0%       30.9MB ± 0%  -0.61%  (p=0.000 n=10+9)
Reflect          73.9MB ± 0%       73.4MB ± 0%  -0.62%  (p=0.000 n=10+8)
Tar              25.8MB ± 0%       25.6MB ± 0%  -0.77%  (p=0.000 n=9+10)
XML              41.2MB ± 0%       40.9MB ± 0%  -0.80%  (p=0.000 n=10+10)
[Geo mean]       40.5MB            40.3MB       -0.68%

name        old allocs/op     new allocs/op     delta
Template           385k ± 0%         386k ± 0%    ~     (p=0.356 n=10+9)
Unicode            343k ± 1%         344k ± 0%    ~     (p=0.481 n=10+10)
GoTypes           1.16M ± 0%        1.16M ± 0%  -0.16%  (p=0.004 n=10+10)
Flate              238k ± 1%         238k ± 1%    ~     (p=0.853 n=10+10)
GoParser           320k ± 0%         320k ± 0%    ~     (p=0.720 n=10+9)
Reflect            957k ± 0%         957k ± 0%    ~     (p=0.460 n=10+8)
Tar                252k ± 0%         252k ± 0%    ~     (p=0.133 n=9+10)
XML                400k ± 0%         400k ± 0%    ~     (p=0.796 n=10+10)
[Geo mean]         428k              428k       -0.01%


Removing all the interface calls helps non-trivially with CPU, though.

name        old time/op       new time/op       delta
Template          178ms ± 4%        173ms ± 3%  -2.90%  (p=0.000 n=94+96)
Unicode          85.0ms ± 4%       83.9ms ± 4%  -1.23%  (p=0.000 n=96+96)
GoTypes           543ms ± 3%        528ms ± 3%  -2.73%  (p=0.000 n=98+96)
Flate             116ms ± 3%        113ms ± 4%  -2.34%  (p=0.000 n=96+99)
GoParser          144ms ± 3%        140ms ± 4%  -2.80%  (p=0.000 n=99+97)
Reflect           344ms ± 3%        334ms ± 4%  -3.02%  (p=0.000 n=100+99)
Tar               106ms ± 5%        103ms ± 4%  -3.30%  (p=0.000 n=98+94)
XML               198ms ± 5%        192ms ± 4%  -2.88%  (p=0.000 n=92+95)
[Geo mean]        178ms             173ms       -2.65%

name        old user-time/op  new user-time/op  delta
Template          229ms ± 5%        224ms ± 5%  -2.36%  (p=0.000 n=95+99)
Unicode           107ms ± 6%        106ms ± 5%  -1.13%  (p=0.001 n=93+95)
GoTypes           696ms ± 4%        679ms ± 4%  -2.45%  (p=0.000 n=97+99)
Flate             137ms ± 4%        134ms ± 5%  -2.66%  (p=0.000 n=99+96)
GoParser          176ms ± 5%        172ms ± 8%  -2.27%  (p=0.000 n=98+100)
Reflect           430ms ± 6%        411ms ± 5%  -4.46%  (p=0.000 n=100+92)
Tar               128ms ±13%        123ms ±13%  -4.21%  (p=0.000 n=100+100)
XML               239ms ± 6%        233ms ± 6%  -2.50%  (p=0.000 n=95+97)
[Geo mean]        220ms             213ms       -2.76%


Change-Id: I15c7d6268347f8358e75066dfdbd77db24e8d0c1
Reviewed-on: https://go-review.googlesource.com/42145
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-05-09 23:01:51 +00:00
Josh Bleecher Snyder
4ee934ad27 cmd/compile: remove references to *os.File from ssa package
This reduces the size of the ssa export data
by 10%, from 76154 to 67886.

It doesn't appear that #20084, which would do this automatically,
is going to be fixed soon. Do it manually for now.

This speeds up compiling cmd/compile/internal/amd64
and presumably its comrades as well:

name          old time/op       new time/op       delta
CompileAMD64       89.6ms ± 6%       86.7ms ± 5%  -3.29%  (p=0.000 n=49+47)

name          old user-time/op  new user-time/op  delta
CompileAMD64        116ms ± 5%        112ms ± 5%  -3.51%  (p=0.000 n=45+42)

name          old alloc/op      new alloc/op      delta
CompileAMD64       26.7MB ± 0%       25.8MB ± 0%  -3.26%  (p=0.008 n=5+5)

name          old allocs/op     new allocs/op     delta
CompileAMD64         223k ± 0%         213k ± 0%  -4.46%  (p=0.008 n=5+5)

Updates #20084

Change-Id: I49e8951c5bfce63ad2b7f4fc3bfa0868c53114f9
Reviewed-on: https://go-review.googlesource.com/41493
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-04-24 23:58:14 +00:00
Josh Bleecher Snyder
0323895cc0 cmd/compile: catch and report nowritebarrier violations later
Prior to this CL, the SSA backend reported violations
of the //go:nowritebarrier annotation immediately.
This necessitated emitting errors during SSA compilation,
which is not compatible with a concurrent backend.

Instead, check for such violations later.
We already save the data required to do a late check
for violations of the //go:nowritebarrierrec annotation.
Use the same data, and check //go:nowritebarrier at the same time.

One downside to doing this is that now only a single
violation will be reported per function.
Given that this is for the runtime only,
and violations are rare, this seems an acceptable cost.

While we are here, remove several 'nerrors != 0' checks
that are rendered pointless.

Updates #15756
Fixes #19250 (as much as it ever can be)

Change-Id: Ia44c4ad5b6fd6f804d9f88d9571cec8d23665cb3
Reviewed-on: https://go-review.googlesource.com/38973
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-31 16:31:20 +00:00
Josh Bleecher Snyder
b3a8beb9d1 cmd/compile: minor cleanup in debug code
Change-Id: I9885606801b9c8fcb62c16d0856025c4e83e658b
Reviewed-on: https://go-review.googlesource.com/38650
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-03-24 22:21:55 +00:00
Matthew Dempsky
325904fe6a cmd/compile: port liveness analysis to SSA
Passes toolstash-check -all.

Change-Id: I92c3c25d6c053f971f346f4fa3bbc76419b58183
Reviewed-on: https://go-review.googlesource.com/38087
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-20 22:58:50 +00:00
Josh Bleecher Snyder
2cdb7f118a cmd/compile: move Frontend field from ssa.Config to ssa.Func
Suggested by mdempsky in CL 38232.
This allows us to use the Frontend field
to associate frontend state and information
with a function.
See the following CL in the series for examples.

This is a giant CL, but it is almost entirely routine refactoring.

The ssa test API is starting to feel a bit unwieldy.
I will clean it up separately, once the dust has settled.

Passes toolstash -cmp.

Updates #15756

Change-Id: I71c573bd96ff7251935fce1391b06b1f133c3caf
Reviewed-on: https://go-review.googlesource.com/38327
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2017-03-17 23:18:57 +00:00
Josh Bleecher Snyder
88e47187c1 cmd/compile: relocate code from config.go to func.go
This is a follow-up to CL 38167.
Pure code movement.

Change-Id: I13e58f7eac6718c77076d89e13fc721a5205ec57
Reviewed-on: https://go-review.googlesource.com/38322
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-17 05:21:53 +00:00
Josh Bleecher Snyder
a5e3cac895 cmd/compile: rearrange fields between ssa.Func, ssa.Cache, and ssa.Config
This makes ssa.Func, ssa.Cache, and ssa.Config fulfill
the roles laid out for them in CL 38160.

The only non-trivial change in this CL is how cached
values and blocks get IDs. Prior to this CL, their IDs were
assigned as part of resetting the cache, and only modified
IDs were reset. This required knowing how many values and
blocks were modified, which required a tight coupling between
ssa.Func and ssa.Config. To eliminate that coupling,
we now zero values and blocks during reset,
and assign their IDs when they are used.
Since unused values and blocks have ID == 0,
we can efficiently find the last used value/block,
to avoid zeroing everything.
Bulk zeroing is efficient, but not efficient enough
to obviate the need to avoid zeroing everything every time.
As a happy side-effect, ssa.Func.Free is no longer necessary.

DebugHashMatch and friends now belong in func.go.
They have been left in place for clarity and review.
I will move them in a subsequent CL.

Passes toolstash -cmp. No compiler performance impact.
No change in 'go test cmd/compile/internal/ssa' execution time.

Change-Id: I2eb7af58da067ef6a36e815a6f386cfe8634d098
Reviewed-on: https://go-review.googlesource.com/38167
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-17 05:21:42 +00:00
Cherry Zhang
c8f38b3398 cmd/compile: use type information in Aux for Store size
Remove size AuxInt in Store, and alignment in Move/Zero. We still
pass size AuxInt to Move/Zero, as it is used for partial Move/Zero
lowering (e.g. cmd/compile/internal/ssa/gen/386.rules:288).
SizeAndAlign is gone.

Passes "toolstash -cmp" on std.

Change-Id: I1ca34652b65dd30de886940e789fcf41d521475d
Reviewed-on: https://go-review.googlesource.com/38150
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-16 14:25:04 +00:00
Cherry Zhang
1b85300602 cmd/compile: clean up SSA-building code
Now that the write barrier insertion is moved to SSA, the SSA
building code can be simplified.

Updates #17583.

Change-Id: I5cacc034b11aa90b0abe6f8dd97e4e3994e2bc25
Reviewed-on: https://go-review.googlesource.com/36840
Run-TryBot: Cherry Zhang <cherryyz@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-16 14:24:40 +00:00
Josh Bleecher Snyder
43afcb5c96 cmd/compile: define roles for ssa.Func, ssa.Config, and ssa.Cache
The line between ssa.Func and ssa.Config has blurred.
Concurrent compilation in the backend will require more precision.
This CL lays out an (aspirational) organization.
The implementation will come in follow-up CLs,
once the organization is settled.

ssa.Config holds basic compiler configuration,
mostly arch-specific information.
It is configured once, early on, and is readonly,
so it is safe for concurrent use.

ssa.Func is a single-shot object used for
compiling a single Func. It is not concurrency-safe
and not re-usable.

ssa.Cache is a multi-use object used to avoid
expensive allocations during compilation.
Each ssa.Func is given an ssa.Cache to use.
ssa.Cache is not concurrency-safe.

Change-Id: Id02809b6f3541541cac6c27bbb598834888ce1cc
Reviewed-on: https://go-review.googlesource.com/38160
Reviewed-by: Keith Randall <khr@golang.org>
2017-03-15 04:27:49 +00:00
David Chase
886e9e6065 cmd/compile: put spills in better places
Previously we always issued a spill right after the op
that was being spilled.  This CL pushes spills father away
from the generator, hopefully pushing them into unlikely branches.
For example:

  x = ...
  if unlikely {
    call ...
  }
  ... use x ...

Used to compile to

  x = ...
  spill x
  if unlikely {
    call ...
    restore x
  }

It now compiles to

  x = ...
  if unlikely {
    spill x
    call ...
    restore x
  }

This is particularly useful for code which appends, as the only
call is an unlikely call to growslice.  It also helps for the
spills needed around write barrier calls.

The basic algorithm is walk down the dominator tree following a
path where the block still dominates all of the restores.  We're
looking for a block that:
 1) dominates all restores
 2) has the value being spilled in a register
 3) has a loop depth no deeper than the value being spilled

The walking-down code is iterative.  I was forced to limit it to
searching 100 blocks so it doesn't become O(n^2).  Maybe one day
we'll find a better way.

I had to delete most of David's code which pushed spills out of loops.
I suspect this CL subsumes most of the cases that his code handled.

Generally positive performance improvements, but hard to tell for sure
with all the noise.  (compilebench times are unchanged.)

name                      old time/op    new time/op    delta
BinaryTree17-12              2.91s ±15%     2.80s ±12%    ~     (p=0.063 n=10+10)
Fannkuch11-12                3.47s ± 0%     3.30s ± 4%  -4.91%   (p=0.000 n=9+10)
FmtFprintfEmpty-12          48.0ns ± 1%    47.4ns ± 1%  -1.32%    (p=0.002 n=9+9)
FmtFprintfString-12         85.6ns ±11%    79.4ns ± 3%  -7.27%  (p=0.005 n=10+10)
FmtFprintfInt-12            91.8ns ±10%    85.9ns ± 4%    ~      (p=0.203 n=10+9)
FmtFprintfIntInt-12          135ns ±13%     127ns ± 1%  -5.72%   (p=0.025 n=10+9)
FmtFprintfPrefixedInt-12     167ns ± 1%     168ns ± 2%    ~      (p=0.580 n=9+10)
FmtFprintfFloat-12           249ns ±11%     230ns ± 1%  -7.32%  (p=0.000 n=10+10)
FmtManyArgs-12               504ns ± 7%     506ns ± 1%    ~       (p=0.198 n=9+9)
GobDecode-12                6.95ms ± 1%    7.04ms ± 1%  +1.37%  (p=0.001 n=10+10)
GobEncode-12                6.32ms ±13%    6.04ms ± 1%    ~     (p=0.063 n=10+10)
Gzip-12                      233ms ± 1%     235ms ± 0%  +1.01%   (p=0.000 n=10+9)
Gunzip-12                   40.1ms ± 1%    39.6ms ± 0%  -1.12%   (p=0.000 n=10+8)
HTTPClientServer-12          227µs ± 9%     221µs ± 5%    ~       (p=0.114 n=9+8)
JSONEncode-12               16.1ms ± 2%    15.8ms ± 1%  -2.09%    (p=0.002 n=9+8)
JSONDecode-12               61.8ms ±11%    57.9ms ± 1%  -6.30%   (p=0.000 n=10+9)
Mandelbrot200-12            4.30ms ± 3%    4.28ms ± 1%    ~      (p=0.203 n=10+8)
GoParse-12                  3.18ms ± 2%    3.18ms ± 2%    ~     (p=0.579 n=10+10)
RegexpMatchEasy0_32-12      76.7ns ± 1%    77.5ns ± 1%  +0.92%    (p=0.002 n=9+8)
RegexpMatchEasy0_1K-12       239ns ± 3%     239ns ± 1%    ~     (p=0.204 n=10+10)
RegexpMatchEasy1_32-12      71.4ns ± 1%    70.6ns ± 0%  -1.15%   (p=0.000 n=10+9)
RegexpMatchEasy1_1K-12       383ns ± 2%     390ns ±10%    ~       (p=0.181 n=8+9)
RegexpMatchMedium_32-12      114ns ± 0%     113ns ± 1%  -0.88%    (p=0.000 n=9+8)
RegexpMatchMedium_1K-12     36.3µs ± 1%    36.8µs ± 1%  +1.59%   (p=0.000 n=10+8)
RegexpMatchHard_32-12       1.90µs ± 1%    1.90µs ± 1%    ~     (p=0.341 n=10+10)
RegexpMatchHard_1K-12       59.4µs ±11%    57.8µs ± 1%    ~      (p=0.968 n=10+9)
Revcomp-12                   461ms ± 1%     462ms ± 1%    ~       (p=1.000 n=9+9)
Template-12                 67.5ms ± 1%    66.3ms ± 1%  -1.77%   (p=0.000 n=10+8)
TimeParse-12                 314ns ± 3%     309ns ± 0%  -1.56%    (p=0.000 n=9+8)
TimeFormat-12                340ns ± 2%     331ns ± 1%  -2.79%  (p=0.000 n=10+10)

The go binary is 0.2% larger.  Not really sure why the size
would change.

Change-Id: Ia5116e53a3aeb025ef350ffc51c14ae5cc17871c
Reviewed-on: https://go-review.googlesource.com/34822
Reviewed-by: David Chase <drchase@google.com>
2017-03-15 02:09:25 +00:00
Josh Bleecher Snyder
3dcfce8d19 cmd/compile: add OpOffPtr [c] SP to constant cache
They accounted for almost 30% of all CSE'd values.

By never creating the duplicates in the first place,
we reduce the high water mark of Value IDs,
which in turn makes all SSA phases cheaper,
particularly regalloc.

name       old time/op     new time/op     delta
Template       200ms ± 3%      198ms ± 4%  -0.87%  (p=0.016 n=50+49)
Unicode       86.9ms ± 2%     85.5ms ± 3%  -1.56%  (p=0.000 n=49+50)
GoTypes        553ms ± 4%      551ms ± 4%    ~     (p=0.183 n=50+49)
SSA            3.97s ± 3%      3.93s ± 2%  -1.06%  (p=0.000 n=48+48)
Flate          124ms ± 4%      124ms ± 3%    ~     (p=0.545 n=48+50)
GoParser       146ms ± 4%      146ms ± 4%    ~     (p=0.810 n=49+49)
Reflect        357ms ± 3%      355ms ± 3%  -0.59%  (p=0.049 n=50+48)
Tar            106ms ± 4%      107ms ± 5%    ~     (p=0.454 n=49+50)
XML            203ms ± 4%      203ms ± 4%    ~     (p=0.726 n=48+50)

name       old user-ns/op  new user-ns/op  delta
Template        237M ± 3%       235M ± 4%    ~     (p=0.208 n=47+48)
Unicode         111M ± 4%       108M ± 9%  -2.50%  (p=0.000 n=47+50)
GoTypes         736M ± 5%       729M ± 4%  -0.95%  (p=0.017 n=50+46)
SSA            5.73G ± 4%      5.74G ± 4%    ~     (p=0.765 n=50+50)
Flate           150M ± 5%       148M ± 6%  -0.89%  (p=0.045 n=48+47)
GoParser        180M ± 5%       178M ± 7%  -1.34%  (p=0.012 n=50+50)
Reflect         450M ± 4%       444M ± 4%  -1.40%  (p=0.000 n=50+49)
Tar             124M ± 7%       123M ± 7%    ~     (p=0.092 n=50+50)
XML             248M ± 6%       245M ± 5%    ~     (p=0.057 n=50+50)

name       old alloc/op    new alloc/op    delta
Template      39.4MB ± 0%     39.3MB ± 0%  -0.37%  (p=0.000 n=50+50)
Unicode       30.9MB ± 0%     30.9MB ± 0%  -0.27%  (p=0.000 n=48+50)
GoTypes        114MB ± 0%      113MB ± 0%  -1.03%  (p=0.000 n=50+49)
SSA            882MB ± 0%      865MB ± 0%  -1.95%  (p=0.000 n=49+49)
Flate         25.8MB ± 0%     25.7MB ± 0%  -0.21%  (p=0.000 n=50+50)
GoParser      31.7MB ± 0%     31.6MB ± 0%  -0.33%  (p=0.000 n=50+50)
Reflect       79.7MB ± 0%     79.3MB ± 0%  -0.49%  (p=0.000 n=44+49)
Tar           27.2MB ± 0%     27.1MB ± 0%  -0.31%  (p=0.000 n=50+50)
XML           42.7MB ± 0%     42.3MB ± 0%  -1.05%  (p=0.000 n=48+49)

name       old allocs/op   new allocs/op   delta
Template        379k ± 1%       380k ± 1%  +0.26%  (p=0.000 n=50+50)
Unicode         324k ± 1%       324k ± 1%    ~     (p=0.964 n=49+50)
GoTypes        1.14M ± 0%      1.15M ± 0%  +0.14%  (p=0.000 n=50+49)
SSA            7.89M ± 0%      7.89M ± 0%  -0.05%  (p=0.000 n=49+49)
Flate           240k ± 1%       241k ± 1%  +0.27%  (p=0.001 n=50+50)
GoParser        310k ± 1%       311k ± 1%  +0.48%  (p=0.000 n=50+49)
Reflect        1.00M ± 0%      1.00M ± 0%  +0.17%  (p=0.000 n=48+50)
Tar             254k ± 1%       255k ± 1%  +0.23%  (p=0.005 n=50+50)
XML             395k ± 1%       395k ± 1%  +0.19%  (p=0.002 n=49+47)

Change-Id: Iaa8f5f37e23bd81983409f7359f9dcd4dfe2961f
Reviewed-on: https://go-review.googlesource.com/38003
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-03-10 16:50:58 +00:00
Josh Bleecher Snyder
d11a2184fb cmd/compile: allow earlier GC of freed constant value
Minor fix, because it's the right thing to do.
No significant impact.

Change-Id: I2138285d397494daa9a88c414149c2a7860edd7e
Reviewed-on: https://go-review.googlesource.com/38001
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-10 01:39:09 +00:00
Josh Bleecher Snyder
c63ad970f6 cmd/compile: rename Func.constVal arg for clarity
Values have an Aux and an AuxInt.
We're setting AuxInt, not Aux.
Say so.

Change-Id: I41aa783273bb7e1ba47c941aa4233f818e37dadd
Reviewed-on: https://go-review.googlesource.com/37997
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
2017-03-09 23:39:01 +00:00
Josh Bleecher Snyder
f791b288d1 cmd/compile: remove some allocs from CSE
Pick up a few pennies:

* CSE gets run twice for each function,
but the set of Aux values doesn't change.
Avoid populating it twice.

* Don't bother populating auxmap for values
that can't be CSE'd anyway.

name       old alloc/op     new alloc/op     delta
Template       41.0MB ± 0%      40.7MB ± 0%  -0.61%  (p=0.008 n=5+5)
Unicode        32.3MB ± 0%      32.3MB ± 0%  -0.22%  (p=0.008 n=5+5)
GoTypes         122MB ± 0%       121MB ± 0%  -0.55%  (p=0.008 n=5+5)
Compiler        482MB ± 0%       479MB ± 0%  -0.58%  (p=0.008 n=5+5)
SSA             865MB ± 0%       862MB ± 0%  -0.35%  (p=0.008 n=5+5)
Flate          26.5MB ± 0%      26.5MB ± 0%    ~     (p=0.056 n=5+5)
GoParser       32.6MB ± 0%      32.4MB ± 0%  -0.58%  (p=0.008 n=5+5)
Reflect        84.2MB ± 0%      83.8MB ± 0%  -0.57%  (p=0.008 n=5+5)
Tar            27.7MB ± 0%      27.6MB ± 0%  -0.37%  (p=0.008 n=5+5)
XML            44.7MB ± 0%      44.5MB ± 0%  -0.53%  (p=0.008 n=5+5)

name       old allocs/op    new allocs/op    delta
Template         373k ± 0%        373k ± 1%    ~     (p=1.000 n=5+5)
Unicode          326k ± 0%        325k ± 0%    ~     (p=0.548 n=5+5)
GoTypes         1.16M ± 0%       1.16M ± 0%    ~     (p=0.841 n=5+5)
Compiler        4.16M ± 0%       4.15M ± 0%    ~     (p=0.222 n=5+5)
SSA             7.57M ± 0%       7.56M ± 0%  -0.22%  (p=0.008 n=5+5)
Flate            238k ± 1%        239k ± 1%    ~     (p=0.690 n=5+5)
GoParser         304k ± 0%        304k ± 0%    ~     (p=1.000 n=5+5)
Reflect         1.01M ± 0%       1.00M ± 0%  -0.31%  (p=0.016 n=4+5)
Tar              245k ± 0%        245k ± 1%    ~     (p=0.548 n=5+5)
XML              393k ± 0%        391k ± 1%    ~     (p=0.095 n=5+5)

Change-Id: I78f1ffe129bd8fd590b7511717dd2bf9f5ecbd6d
Reviewed-on: https://go-review.googlesource.com/36690
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Daniel Martí <mvdan@mvdan.cc>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2017-02-09 20:42:46 +00:00
Matthew Dempsky
5c90e1cf8a cmd/compile/internal/ssa: remove Func.StaticData field
Rather than collecting static data nodes to be written out later, just
write them out immediately.

Change-Id: I51708b690e94bc3e288b4d6ba3307bf738a80f64
Reviewed-on: https://go-review.googlesource.com/36352
Run-TryBot: Matthew Dempsky <mdempsky@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-02-04 01:09:26 +00:00
Josh Bleecher Snyder
57546d67ec cmd/compile: add reusable []Location to ssa.Config
name       old time/op      new time/op      delta
Template        218ms ± 3%       214ms ± 3%  -1.70%  (p=0.000 n=30+30)
Unicode         100ms ± 3%       100ms ± 4%    ~     (p=0.614 n=29+30)
GoTypes         657ms ± 1%       660ms ± 3%  +0.46%  (p=0.046 n=29+30)
Compiler        2.80s ± 2%       2.80s ± 1%    ~     (p=0.451 n=28+29)
Flate           131ms ± 2%       132ms ± 4%    ~     (p=1.000 n=29+29)
GoParser        159ms ± 3%       160ms ± 5%    ~     (p=0.341 n=28+30)
Reflect         406ms ± 3%       408ms ± 4%    ~     (p=0.511 n=28+30)
Tar             118ms ± 4%       118ms ± 4%    ~     (p=0.827 n=29+30)
XML             222ms ± 6%       222ms ± 3%    ~     (p=0.532 n=30+30)

name       old user-ns/op   new user-ns/op   delta
Template   274user-ms ± 3%  272user-ms ± 3%  -0.87%  (p=0.015 n=29+30)
Unicode    140user-ms ± 4%  140user-ms ± 3%    ~     (p=0.735 n=29+30)
GoTypes    890user-ms ± 1%  897user-ms ± 2%  +0.88%  (p=0.002 n=29+30)
Compiler   3.88user-s ± 2%  3.89user-s ± 1%    ~     (p=0.132 n=30+29)
Flate      168user-ms ± 2%  157user-ms ± 4%  -6.21%  (p=0.000 n=25+28)
GoParser   211user-ms ± 2%  213user-ms ± 5%    ~     (p=0.086 n=28+30)
Reflect    539user-ms ± 2%  541user-ms ± 3%    ~     (p=0.267 n=27+29)
Tar        156user-ms ± 7%  155user-ms ± 5%    ~     (p=0.708 n=30+30)
XML        291user-ms ± 5%  294user-ms ± 3%  +0.83%  (p=0.029 n=29+30)

name       old alloc/op     new alloc/op     delta
Template       40.7MB ± 0%      39.4MB ± 0%  -3.26%  (p=0.000 n=29+26)
Unicode        30.8MB ± 0%      30.7MB ± 0%  -0.40%  (p=0.000 n=28+30)
GoTypes         123MB ± 0%       119MB ± 0%  -3.47%  (p=0.000 n=30+29)
Compiler        472MB ± 0%       455MB ± 0%  -3.60%  (p=0.000 n=30+30)
Flate          26.5MB ± 0%      25.6MB ± 0%  -3.21%  (p=0.000 n=28+30)
GoParser       32.3MB ± 0%      31.4MB ± 0%  -2.98%  (p=0.000 n=29+30)
Reflect        84.4MB ± 0%      82.1MB ± 0%  -2.83%  (p=0.000 n=30+30)
Tar            27.3MB ± 0%      26.5MB ± 0%  -2.70%  (p=0.000 n=29+29)
XML            44.6MB ± 0%      43.1MB ± 0%  -3.49%  (p=0.000 n=30+30)

name       old allocs/op    new allocs/op    delta
Template         401k ± 1%        399k ± 0%  -0.35%  (p=0.000 n=30+28)
Unicode          331k ± 0%        331k ± 1%    ~     (p=0.907 n=28+30)
GoTypes         1.24M ± 0%       1.23M ± 0%  -0.43%  (p=0.000 n=30+30)
Compiler        4.26M ± 0%       4.25M ± 0%  -0.34%  (p=0.000 n=29+30)
Flate            252k ± 1%        251k ± 1%  -0.41%  (p=0.000 n=30+30)
GoParser         325k ± 1%        324k ± 1%  -0.31%  (p=0.000 n=27+30)
Reflect         1.06M ± 0%       1.05M ± 0%  -0.69%  (p=0.000 n=30+30)
Tar              266k ± 1%        265k ± 1%  -0.51%  (p=0.000 n=29+30)
XML              416k ± 1%        415k ± 1%  -0.36%  (p=0.002 n=30+30)

Change-Id: I8f784001324df83b2764c44f0e83a540e5beab34
Reviewed-on: https://go-review.googlesource.com/36212
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
Reviewed-by: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-02-02 22:39:32 +00:00
Russ Cox
47ce87877b all: merge dev.inline into master
Change-Id: I7715581a04e513dcda9918e853fa6b1ddc703770
2017-02-01 09:47:23 -05:00
Robert Griesemer
472c792e0a [dev.inline] cmd/internal/src: introduce compact source position representation
XPos is a compact (8 instead of 16 bytes on a 64bit machine) source
position representation. There is a 1:1 correspondence between each
XPos and each regular Pos, translated via a global table.

In some sense this brings back the LineHist, though positions can
track line and column information; there is a O(1) translation
between the representations (no binary search), and the translation
is factored out.

The size increase with the prior change is brought down again and
the compiler speed is in line with the master repo (measured on
the same "quiet" machine as for prior change):

name       old time/op     new time/op     delta
Template       256ms ± 1%      262ms ± 2%    ~             (p=0.063 n=5+4)
Unicode        132ms ± 1%      135ms ± 2%    ~             (p=0.063 n=5+4)
GoTypes        891ms ± 1%      871ms ± 1%  -2.28%          (p=0.016 n=5+4)
Compiler       3.84s ± 2%      3.89s ± 2%    ~             (p=0.413 n=5+4)
MakeBash       47.1s ± 1%      46.2s ± 2%    ~             (p=0.095 n=5+5)

name       old user-ns/op  new user-ns/op  delta
Template        309M ± 1%       314M ± 2%    ~             (p=0.111 n=5+4)
Unicode         165M ± 1%       172M ± 9%    ~             (p=0.151 n=5+5)
GoTypes        1.14G ± 2%      1.12G ± 1%    ~             (p=0.063 n=5+4)
Compiler       5.00G ± 1%      4.96G ± 1%    ~             (p=0.286 n=5+4)

Change-Id: Icc570cc60ab014d8d9af6976f1f961ab8828cc47
Reviewed-on: https://go-review.googlesource.com/34506
Run-TryBot: Robert Griesemer <gri@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
Reviewed-by: Austin Clements <austin@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-01-09 22:43:22 +00:00
David Chase
7f1ff65c39 cmd/compile: insert scheduling checks on loop backedges
Loop breaking with a counter.  Benchmarked (see comments),
eyeball checked for sanity on popular loops.  This code
ought to handle loops in general, and properly inserts phi
functions in cases where the earlier version might not have.

Includes test, plus modifications to test/run.go to deal with
timeout and killing looping test.  Tests broken by the addition
of extra code (branch frequency and live vars) for added
checks turn the check insertion off.

If GOEXPERIMENT=preemptibleloops, the compiler inserts reschedule
checks on every backedge of every reducible loop.  Alternately,
specifying GO_GCFLAGS=-d=ssa/insert_resched_checks/on will
enable it for a single compilation, but because the core Go
libraries contain some loops that may run long, this is less
likely to have the desired effect.

This is intended as a tool to help in the study and diagnosis
of GC and other latency problems, now that goal STW GC latency
is on the order of 100 microseconds or less.

Updates #17831.
Updates #10958.

Change-Id: I6206c163a5b0248e3f21eb4fc65f73a179e1f639
Reviewed-on: https://go-review.googlesource.com/33910
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2017-01-09 21:01:29 +00:00
Robert Griesemer
c10499b539 [dev.inline] cmd/compile/internal/ssa: another round of renames from line -> pos (cleanup)
Mostly mechanical renames. Make variable names consistent with use.

Change-Id: Iaa89d31deab11eca6e784595b58e779ad525c8a3
Reviewed-on: https://go-review.googlesource.com/34146
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-12-08 23:10:30 +00:00
Robert Griesemer
cfd17f51c8 [dev.inline] cmd/compile/internal/ssa: rename various fields from Line to Pos
This is a mostly mechanical rename followed by manual fixes where necessary.

Change-Id: Ie5c670b133db978f15dc03e50dc2da0c80fc8842
Reviewed-on: https://go-review.googlesource.com/34137
Reviewed-by: David Lazar <lazard@golang.org>
2016-12-08 21:36:52 +00:00
Robert Griesemer
24597c080b [dev.inline] cmd/compile: introduce cmd/internal/src.Pos type for line numbers
This is a step toward chosing a different position representation.
By introducing an explicit type, it will be easier to make the
transition step-wise while ensuring everything keeps running.

This has been reviewed via https://go-review.googlesource.com/#/c/34025/.

Change-Id: Ibceddcd62d8f346321ac3250e3940e9c436ed684
Reviewed-on: https://go-review.googlesource.com/34132
Run-TryBot: Robert Griesemer <gri@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Lazar <lazard@golang.org>
2016-12-08 21:26:25 +00:00
David Chase
a190f3c8a3 cmd/compile: enable flag-specified dump of specific phase+function
For very large input files, use of GOSSAFUNC to obtain a dump
after compilation steps can lead to both unwieldy large output
files and unwieldy larger processes (because the output is
buffered in a string).  This flag

  -d=ssa/<phase>/dump:<function name>

provides finer control of what is dumped, into a smaller
file, and with less memory overhead in the running compiler.
The special phase name "build" is added to allow printing
of the just-built ssa before any transformations are applied.

This was helpful in making sense of the gogo/protobuf
problems.

The output format was tweaked to remove gratuitous spaces,
and a crude -d=ssa/help help text was added.

Change-Id: If7516e22203420eb6ed3614f7cee44cb9260f43e
Reviewed-on: https://go-review.googlesource.com/23044
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
2016-10-20 22:23:56 +00:00
Hajime Hoshi
c5368123fe cmd/compile: remove redundant function idom
Change-Id: Ib14b5421bb5e407bbd4d3cbfc68c92d3dd257cb1
Reviewed-on: https://go-review.googlesource.com/30732
Run-TryBot: Brad Fitzpatrick <bradfitz@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Matthew Dempsky <mdempsky@google.com>
2016-10-11 16:43:12 +00:00
Keith Randall
5a6e511c61 cmd/compile: Use Sreedhar+Gao phi building algorithm
Should be more asymptotically happy.

We process each variable in turn to find all the
locations where it needs a phi (the dominance frontier
of all of its definitions).  Then we add all those phis.
This takes O(n * #variables), although hopefully much less.

Then we do a single tree walk to match all the
FwdRefs with the nearest definition or phi.
This takes O(n) time.

The one remaining inefficiency is that we might end up
introducing a bunch of dead phis in the first step.
A TODO is to introduce phis only where they might be
used by a read.

The old algorithm is still faster on small functions,
so there's a cutover size (currently 500 blocks).

This algorithm supercedes the David's sparse phi
placement algorithm for large functions.

Lowers compile time of example from #14934 from
~10 sec to ~4 sec.
Lowers compile time of example from #16361 from
~4.5 sec to ~3 sec.
Lowers #16407 from ~20 min to ~30 sec.

Update #14934
Update #16361
Fixes #16407

Change-Id: I1cff6364e1623c143190b6a924d7599e309db58f
Reviewed-on: https://go-review.googlesource.com/30163
Reviewed-by: David Chase <drchase@google.com>
2016-10-03 20:30:08 +00:00
Keith Randall
75ce89c20d cmd/compile: cache CFG-dependent computations
We compute a lot of stuff based off the CFG: postorder traversal,
dominators, dominator tree, loop nest.  Multiple phases use this
information and we end up recomputing some of it.  Add a cache
for this information so if the CFG hasn't changed, we can reuse
the previous computation.

Change-Id: I9b5b58af06830bd120afbee9cfab395a0a2f74b2
Reviewed-on: https://go-review.googlesource.com/29356
Reviewed-by: David Chase <drchase@google.com>
2016-09-19 16:00:13 +00:00
Keith Randall
167e381f40 cmd/compile: make ssa compilation unconditional
Rip out the code that allows SSA to be used conditionally.

No longer exists:
 ssa=0 flag
 GOSSAHASH
 GOSSAPKG
 SSATEST

GOSSAFUNC now only controls the printing of the IR/html.

Still need to rip out all of the old backend.  It should no longer be
callable after this CL.

Update #16357

Change-Id: Ib30cc18fba6ca52232c41689ba610b0a94aa74f5
Reviewed-on: https://go-review.googlesource.com/29155
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-09-14 17:38:04 +00:00
Keith Randall
84aac622a4 cmd/compile: intrinsify the rest of runtime/internal/atomic for amd64
Atomic swap, add/and/or, compare and swap.

Also works on amd64p32.

Change-Id: Idf2d8f3e1255f71deba759e6e75e293afe4ab2ba
Reviewed-on: https://go-review.googlesource.com/27813
Reviewed-by: Cherry Zhang <cherryyz@google.com>
2016-08-28 16:31:08 +00:00
David Chase
6b99fb5bea cmd/compile: use sparse algorithm for phis in large program
This adds a sparse method for locating nearest ancestors
in a dominator tree, and checks blocks with more than one
predecessor for differences and inserts phi functions where
there are.

Uses reversed post order to cut number of passes, running
it from first def to last use ("last use" for paramout and
mem is end-of-program; last use for a phi input from a
backedge is the source of the back edge)

Includes a cutover from old algorithm to new to avoid paying
large constant factor for small programs.  This keeps normal
builds running at about the same time, while not running
over-long on large machine-generated inputs.

Add "phase" flags for ssa/build -- ssa/build/stats prints
number of blocks, values (before and after linking references
and inserting phis, so expansion can be measured), and their
product; the product governs the cutover, where a good value
seems to be somewhere between 1 and 5 million.

Among the files compiled by make.bash, this is the shape of
the tail of the distribution for #blocks, #vars, and their
product:

	 #blocks	#vars	    product
 max	6171	28180	173,898,780
99.9%	1641	 6548	 10,401,878
  99%	 463	 1909	    873,721
  95%	 152	  639	     95,235
  90%	  84	  359	     30,021

The old algorithm is indeed usually fastest, for 99%ile
values of usually.

The fix to LookupVarOutgoing
( https://go-review.googlesource.com/#/c/22790/ )
deals with some of the same problems addressed by this CL,
but on at least one bug ( #15537 ) this change is still
a significant help.

With this CL:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
   github.com/gogo/protobuf/test/theproto3/combos/...
...
real	4m35.200s
user	13m16.644s
sys	0m36.712s
and pprof reports 3.4GB allocated in one of the larger profiles

With tip:
/tmp/gopath$ rm -rf pkg bin
/tmp/gopath$ time go get -v -gcflags -memprofile=y.mprof \
   github.com/gogo/protobuf/test/theproto3/combos/...
...
real	10m36.569s
user	25m52.286s
sys	4m3.696s
and pprof reports 8.3GB allocated in the same larger profile

With this CL, most of the compilation time on the benchmarked
input is spent in register/stack allocation (cumulative 53%)
and in the sparse lookup algorithm itself (cumulative 20%).

Fixes #15537.

Change-Id: Ia0299dda6a291534d8b08e5f9883216ded677a00
Reviewed-on: https://go-review.googlesource.com/22342
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: David Chase <drchase@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-16 21:08:05 +00:00
Keith Randall
4fa050024f cmd/compile: enable constant-time CFG editing
Provide indexes along with block pointers for Preds
and Succs arrays.  This allows us to splice edges in
and out of those arrays in constant time.

Fixes worst-case O(n^2) behavior in deadcode and fuse.

benchmark                     old ns/op      new ns/op     delta
BenchmarkFuse1-8              2065           2057          -0.39%
BenchmarkFuse10-8             9408           9073          -3.56%
BenchmarkFuse100-8            105238         76277         -27.52%
BenchmarkFuse1000-8           3982562        1026750       -74.22%
BenchmarkFuse10000-8          301220329      12824005      -95.74%
BenchmarkDeadCode1-8          1588           1566          -1.39%
BenchmarkDeadCode10-8         4333           4250          -1.92%
BenchmarkDeadCode100-8        32031          32574         +1.70%
BenchmarkDeadCode1000-8       590407         468275        -20.69%
BenchmarkDeadCode10000-8      17822890       5000818       -71.94%
BenchmarkDeadCode100000-8     1388706640     78021127      -94.38%
BenchmarkDeadCode200000-8     5372518479     168598762     -96.86%

Change-Id: Iccabdbb9343fd1c921ba07bbf673330a1c36ee17
Reviewed-on: https://go-review.googlesource.com/22589
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2016-05-05 15:58:59 +00:00
Josh Bleecher Snyder
2563b6f9fe cmd/compile/internal/ssa: use Compare instead of Equal
They have different semantics.

Equal is stricter and is designed for the front-end.
Compare is looser and cheaper and is designed for the back-end.
To avoid possible regression, remove Equal from ssa.Type.

Updates #15043

Change-Id: Ie23ce75ff6b4d01b7982e0a89e6f81b5d099d8d6
Reviewed-on: https://go-review.googlesource.com/21483
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
2016-04-17 04:50:45 +00:00
Alexandru Moșoi
9743e4b031 cmd/compile: share dominator tree among many passes
These passes do not modify the dominator tree too much.

% benchstat old.txt new.txt
name       old time/op     new time/op     delta
Template       335ms ± 3%      325ms ± 8%    ~             (p=0.074 n=8+9)
GoTypes        1.05s ± 1%      1.05s ± 3%    ~            (p=0.095 n=9+10)
Compiler       5.37s ± 4%      5.29s ± 1%  -1.42%         (p=0.022 n=9+10)
MakeBash       34.9s ± 3%      34.4s ± 2%    ~            (p=0.095 n=9+10)

name       old alloc/op    new alloc/op    delta
Template      55.4MB ± 0%     54.9MB ± 0%  -0.81%        (p=0.000 n=10+10)
GoTypes        179MB ± 0%      178MB ± 0%  -0.89%        (p=0.000 n=10+10)
Compiler       807MB ± 0%      798MB ± 0%  -1.10%        (p=0.000 n=10+10)

name       old allocs/op   new allocs/op   delta
Template        498k ± 0%       496k ± 0%  -0.29%          (p=0.000 n=9+9)
GoTypes        1.42M ± 0%      1.41M ± 0%  -0.24%        (p=0.000 n=10+10)
Compiler       5.61M ± 0%      5.60M ± 0%  -0.12%        (p=0.000 n=10+10)

Change-Id: I4cd20cfba3f132ebf371e16046ab14d7e42799ec
Reviewed-on: https://go-review.googlesource.com/21806
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2016-04-12 14:44:26 +00:00
Keith Randall
7e40627a0e cmd/compile: zero all three argstorage slots
These changes were missed when going from 2 to 3 argstorage slots.
https://go-review.googlesource.com/20296/

Change-Id: I930a307bb0b695bf1ae088030c9bbb6d14ca31d2
Reviewed-on: https://go-review.googlesource.com/21841
Reviewed-by: Brad Fitzpatrick <bradfitz@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2016-04-11 20:49:22 +00:00