* It does very simple bounds checking elimination. E.g.
removes the second check in for i := range a { a[i]++; a[i++]; }
* Improves on the following redundant expression:
return a6 || (a6 || (a6 || a4)) || (a6 || (a4 || a6 || (false || a6)))
* Linear in the number of block edges.
I patched in CL 12960 that does bounds, nil and constant propagation
to make sure this CL is not just redundant. Size of pkg/tool/linux_amd64/*
(excluding compile which is affected by this change):
With IsInBounds and IsSliceInBounds
-this -12960 92285080
+this -12960 91947416
-this +12960 91978976
+this +12960 91923088
Gain is ~110% of 12960.
Without IsInBounds and IsSliceInBounds (older run)
-this -12960 95515512
+this -12960 95492536
-this +12960 95216920
+this +12960 95204440
Shaves 22k on its own.
* Can we handle IsInBounds better with this? In
for i := range a { a[i]++; } the bounds checking at a[i]
is not eliminated.
Change-Id: I98957427399145fb33693173fd4d5a8d71c7cc20
Reviewed-on: https://go-review.googlesource.com/19710
Reviewed-by: David Chase <drchase@google.com>
Reviewed-by: Keith Randall <khr@golang.org>
Run-TryBot: Alexandru Moșoi <alexandru@mosoi.ro>
TryBot-Result: Gobot Gobot <gobot@golang.org>
The -d compiler flag can also specify ssa phase and flag,
for example -d=ssa/generic_cse/time,ssa/generic_cse/stats
Spaces in the phase names can be specified with an
underscore. Flags currently parsed (not necessarily
recognized by the phases yet) are:
on, off, mem, time, debug, stats, and test
On, off and time are handled in the harness,
debug, stats, and test are interpreted by the phase itself.
The pass is now attached to the Func being compiled, and a
new method logStats(key, ...value) on *Func to encourage a
semi-standardized format for that output. Output fields
are separated by tabs to ease digestion by awk and
spreadsheets. For example,
if f.pass.stats > 0 {
f.logStat("CSE REWRITES", rewrites)
}
Change-Id: I16db2b5af64c50ca9a47efeb51d961147a903abc
Reviewed-on: https://go-review.googlesource.com/19885
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
Add an initial cse pass that only operates on zero argument
values. This removes the need for a special case in cse for removing
OpSB and speeds up arithConst_ssa.go compilation by 9% while slowing
"test -c net/http" by 1.5%.
Change-Id: Id1500482485426f66c6c2eba75eeaf4f19c8a889
Reviewed-on: https://go-review.googlesource.com/19454
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
A first pass to decompose user types (structs, maybe
arrays someday), and a second pass to decompose builtin
types (strings, interfaces, slices, complex). David wants
this for value range analysis so he can have structs decomposed
but slices and friends will still be intact and he can deduce
things like the length of a slice is >= 0.
Change-Id: Ia2300d07663329b51ed6270cfed21d31980daa7c
Reviewed-on: https://go-review.googlesource.com/19340
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: David Chase <drchase@google.com>
Problem was caused by use of Args[].Aux differences
in early partitioning. This artificially separated
two equivalent expressions because sort ignores the
Aux field, hence things can end with equal things
separated by unequal things and thus the equal things
are split into more than one partition. For example:
SliceLen(a), SliceLen(b), SliceLen(a).
Fix: don't use Args[].Aux in initial partitioning.
Left in a debugging flag and some debugging Fprintf's;
not sure if that is house style or not. We'll probably
want to be more systematic in our naming conventions,
e.g. ssa.cse, ssa.scc, etc.
Change-Id: Ib1412539cc30d91ea542c0ac7b2f9b504108ca7f
Reviewed-on: https://go-review.googlesource.com/19316
Reviewed-by: Keith Randall <khr@golang.org>
Converted working slices of pointer into slices of pointer
index. Half the size (on 64-bit machine) and no pointers
to trace if GC occurs while they're live.
TODO - could expose slice mapping ID->*Block; some dom
clients also construct these.
Minor optimization in regalloc that cuts allocation count.
Minor optimization in compile.go that cuts calls to Sprintf.
Change-Id: I28f0bfed422b7344af333dc52ea272441e28e463
Reviewed-on: https://go-review.googlesource.com/19104
Run-TryBot: Todd Neal <todd@tneal.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Todd Neal <todd@tneal.org>
From memory profiling, about 3% reduction in allocation count.
Change-Id: I4b662d55b8a94fe724759a2b22f05a08d0bf40f8
Reviewed-on: https://go-review.googlesource.com/19103
Reviewed-by: Keith Randall <khr@golang.org>
Compiling && and || expressions often leads to control
flow of the following form:
p:
If a goto b else c
b: <- p ...
x = phi(a, ...)
If x goto t else u
Note that if we take the edge p->b, then we are guaranteed
to take the edge b->t also. So in this situation, we might
as well go directly from p to t.
Change-Id: I6974f1e6367119a2ddf2014f9741fdb490edcc12
Reviewed-on: https://go-review.googlesource.com/18910
Reviewed-by: David Chase <drchase@google.com>
The OpSB hack didn't quite work. We need to really
CSE these ops to make regalloc happy.
Change-Id: I9f4d7bfb0929407c84ee60c9e25ff0c0fbea84af
Reviewed-on: https://go-review.googlesource.com/19083
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
It is one of the slowest compiler phases right now, and we
run two of them.
Instead of using a map to make the initial partition, use a sort.
It is much less memory intensive.
Do a few optimizations to avoid work for size-1 equivalence classes.
Implement -N.
Change-Id: I1d2d85d3771abc918db4dd7cc30b0b2d854b15e1
Reviewed-on: https://go-review.googlesource.com/19024
Reviewed-by: David Chase <drchase@google.com>
Empty blocks are introduced to remove critical edges.
After regalloc, we can remove any of the added blocks
that are still empty.
Change-Id: I0b40e95ac3a6cc1e632a479443479532b6c5ccd9
Reviewed-on: https://go-review.googlesource.com/18833
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Redo how we keep track of forward references when building SSA.
When the forward reference is resolved, update the Value node
in place.
Improve the phi elimination pass so it can simplify phis of phis.
Give SSA package access to decoded line numbers. Fix line numbers
for constant booleans.
Change-Id: I3dc9896148d260be2f3dd14cbe5db639ec9fa6b7
Reviewed-on: https://go-review.googlesource.com/18674
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Keith Randall <khr@golang.org>
Reorder how register & stack allocation is done. We used to allocate
registers, then fix up merge edges, then allocate stack slots. This
lead to lots of unnecessary copies on merge edges:
v2 = LoadReg v1
v3 = StoreReg v2
If v1 and v3 are allocated to the same stack slot, then this code is
unnecessary. But at regalloc time we didn't know the homes of v1 and
v3.
To fix this problem, allocate all the stack slots before fixing up the
merge edges. That way, we know what stack slots values use so we know
what copies are required.
Use a good technique for shuffling values around on merge edges.
Improves performance of the go1 TimeParse benchmark by ~12%
Change-Id: I731f43e4ff1a7e0dc4cd4aa428fcdb97812b86fa
Reviewed-on: https://go-review.googlesource.com/17915
Reviewed-by: David Chase <drchase@google.com>
Spilling/restoring flag values is a pain to do during regalloc.
Instead, allocate the flag register in a separate pass. Regalloc then
operates normally on any flag recomputation instructions.
Change-Id: Ia1c3d9e6eff678861193093c0b48a00f90e4156b
Reviewed-on: https://go-review.googlesource.com/17694
Reviewed-by: David Chase <drchase@google.com>
Declare a function's arguments as having already been
spilled so their use just requires a restore.
Allow spill locations to be portions of larger objects the stack.
Required to load portions of compound input arguments.
Rename the memory input to InputMem. Use Arg for the
pre-spilled argument values.
Change-Id: I8fe2a03ffbba1022d98bfae2052b376b96d32dda
Reviewed-on: https://go-review.googlesource.com/16536
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
deadstore elimination currently works in a block, fusing before
performing dse eliminates ~1% more stores for make.bash
Change-Id: If5bbddac76bf42616938a8e8e84cb7441fa02f73
Reviewed-on: https://go-review.googlesource.com/16350
Reviewed-by: Keith Randall <khr@golang.org>
Rewite user nil checks as OpIsNonNil so our nil check elimination pass
can take advantage and remove redundant checks.
With make.bash this removes 10% more nilchecks (34110 vs 31088).
Change-Id: Ifb01d1b6d2d759f5e2a5aaa0470e1d5a2a680212
Reviewed-on: https://go-review.googlesource.com/14321
Reviewed-by: Keith Randall <khr@golang.org>
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Add timing/allocation information to each compiler pass for both the
console and html output.
Change-Id: I75833003b806a09b4fb1bbf63983258612cdb7b0
Reviewed-on: https://go-review.googlesource.com/14277
Reviewed-by: Keith Randall <khr@golang.org>
Decompose breaks compound objects up into pieces that can be
operated on by the target architecture. The decompose pass only
does phi ops, the rest is done by the rewrite rules in generic.rules.
Compound objects include strings,slices,interfaces,structs,arrays.
Arrays aren't decomposed because of indexing (we could support
constant indexes, but dynamic indexes can't be handled using SSA).
Structs will come in a subsequent CL.
TODO: after this pass we have lost the association between, e.g.,
a string's pointer and its size. It would be nice if we could keep
that information around for debugging info somehow.
Change-Id: I6379ab962a7beef62297d0f68c421f22aa0a0901
Reviewed-on: https://go-review.googlesource.com/13683
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
This is an initial implementation.
There are many rough edges and TODOs,
which will hopefully be polished out
with use.
Fixes#12071.
Change-Id: I1d6fd5a343063b5200623bceef2c2cfcc885794e
Reviewed-on: https://go-review.googlesource.com/13472
Reviewed-by: Keith Randall <khr@golang.org>
Implement ITAB, selecting the itable field of an interface.
Soften the lowering check to allow lowerings that leave
generic but dead ops behind. (The ITAB lowering does this.)
Change-Id: Icc84961dd4060d143602f001311aa1d8be0d7fc0
Reviewed-on: https://go-review.googlesource.com/13144
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Add label and goto checks and improve test coverage.
Implement OSWITCH and OSELECT.
Implement OBREAK and OCONTINUE.
Allow generation of code in dead blocks.
Change-Id: Ibebb7c98b4b2344f46d38db7c9dce058c56beaac
Reviewed-on: https://go-review.googlesource.com/12445
Reviewed-by: Keith Randall <khr@golang.org>
removePredecessor can change which blocks are live.
However, it cannot remove dead blocks from the function's
slice of blocks because removePredecessor may have been
called from within a function doing a walk of the blocks.
CL 11879 did not handle this correctly and broke the build.
To fix this, mark the block as dead but leave its actual
removal for a deadcode pass. Blocks that are dead must have
no successors, predecessors, values, or control values,
so they will generally be ignored by other passes.
To be safe, we add a deadcode pass after the opt pass,
which is the only other pass that calls removePredecessor.
Two alternatives that I considered and discarded:
(1) Make all call sites aware of the fact that removePrecessor
might make arbitrary changes to the list of blocks. This
will needlessly complicate callers.
(2) Handle the things that can go wrong in practice when
we encounter a dead-but-not-removed block. CL 11930 takes
this approach (and the tests are stolen from that CL).
However, this is just patching over the problem.
Change-Id: Icf0687b0a8148ce5e96b2988b668804411b05bd8
Reviewed-on: https://go-review.googlesource.com/12004
Reviewed-by: Todd Neal <todd@tneal.org>
Reviewed-by: Michael Matloob <michaelmatloob@gmail.com>
The nilcheckelim pass eliminates unnecessary nil checks.
The initial implementation removes redundant nil checks.
See the comments in nilcheck.go for ideas for future
improvements.
The efficacy of the cse pass has a significant impact
on this efficacy of this pass.
There are 886 nil checks in the parts of the standard
library that SSA can currently compile (~20%).
This pass eliminates 75 (~8.5%) of them.
As a data point, with a more aggressive but unsound
cse pass that treats many more types as identical,
this pass eliminates 115 (~13%) of the nil checks.
Change-Id: I13e567a39f5f6909fc33434d55c17a7e3884a704
Reviewed-on: https://go-review.googlesource.com/11430
Reviewed-by: Alan Donovan <adonovan@google.com>
The SSA implementation logs for three purposes:
* debug logging
* fatal errors
* unimplemented features
Separating these three uses lets us attempt an SSA
implementation for all functions, not just
_ssa functions. This turns the entire standard
library into a compilation test, and makes it
easy to figure out things like
"how much coverage does SSA have now" and
"what should we do next to get more coverage?".
Functions called _ssa are still special.
They log profusely by default and
the output of the SSA implementation
is used. For all other functions,
logging is off, and the implementation
is built and discarded, due to lack of
support for the runtime.
While we're here, fix a few minor bugs and
add some extra Unimplementeds to allow
all.bash to pass.
As of now, SSA handles 20.79% of the functions
in the standard library (689 of 3314).
The top missing features are:
10.03% 2597 SSA unimplemented: zero for type error not implemented
7.79% 2016 SSA unimplemented: addr: bad op DOTPTR
7.33% 1898 SSA unimplemented: unhandled expr EQ
6.10% 1579 SSA unimplemented: unhandled expr OROR
4.91% 1271 SSA unimplemented: unhandled expr NE
4.49% 1163 SSA unimplemented: unhandled expr LROT
4.00% 1036 SSA unimplemented: unhandled expr LEN
3.56% 923 SSA unimplemented: unhandled stmt CALLFUNC
2.37% 615 SSA unimplemented: zero for type []byte not implemented
1.90% 492 SSA unimplemented: unhandled stmt CALLMETH
1.74% 450 SSA unimplemented: unhandled expr CALLINTER
1.74% 450 SSA unimplemented: unhandled expr DOT
1.71% 444 SSA unimplemented: unhandled expr ANDAND
1.65% 426 SSA unimplemented: unhandled expr CLOSUREVAR
1.54% 400 SSA unimplemented: unhandled expr CALLMETH
1.51% 390 SSA unimplemented: unhandled stmt SWITCH
1.47% 380 SSA unimplemented: unhandled expr CONV
1.33% 345 SSA unimplemented: addr: bad op *
1.30% 336 SSA unimplemented: unhandled OLITERAL 6
Change-Id: I4ca07951e276714dc13c31de28640aead17a1be7
Reviewed-on: https://go-review.googlesource.com/11160
Reviewed-by: Keith Randall <khr@golang.org>
Eliminate dead stores. Dead stores are those which are
unconditionally followed by another store to the same location, with
no intervening load.
Just a simple intra-block implementation for now.
Change-Id: I2bf54e3a342608fc4e01edbe1b429e83f24764ab
Reviewed-on: https://go-review.googlesource.com/10386
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
Code generation is now done in genssa.
Also remove the asm field in opInfo. It's no longer used.
Change-Id: I65fffac267e138fd424b2ef8aa7ed79f0ebb63d5
Reviewed-on: https://go-review.googlesource.com/10539
Reviewed-by: Keith Randall <khr@golang.org>
Semi-regular merge of tip to dev.ssa.
Complicated a bit by the move of cmd/internal/* to cmd/compile/internal/*.
Change-Id: I1c66d3c29bb95cce4a53c5a3476373aa5245303d