Keeping the start state of each block around costs more than just
recomputing them as necessary, especially because many blocks only have
one predecessor and don't need any merging at all. Stop storing the
start state, and reuse predecessors' end states as much as conveniently
possible.
Change-Id: I549bad9e1a35af76a974e46fe69f74cd4dce873b
Reviewed-on: https://go-review.googlesource.com/92399
Reviewed-by: David Chase <drchase@google.com>
Because getStackOffset is a function pointer, the compiler assumes that
its arguments escape. Pass a value instead to avoid heap allocations.
Change-Id: Ib94e5941847f134cd00e873040a4d7fcf15ced26
Reviewed-on: https://go-review.googlesource.com/92397
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Using bits.TrailingZeroes instead of iterating over each bit is a small
but easy win for the common case of only one or two registers being set.
I copied in the implementation for use with pre-1.9 bootstraps.
Change-Id: Ieaa768554d7d5239a5617fbf34f1ee0b32ce1de5
Reviewed-on: https://go-review.googlesource.com/92395
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
A statement like
foo = bar + qux
might compile to
AX := AX + BX
resulting in a regkill for AX before this instruction.
The buggy behavior is to kill AX "at" this instruction,
before it has executed. (Code generation of no-instruction
values like RegKills applies their effects at the
next actual instruction emitted).
However, bar is still associated with AX until after the
instruction executes, so the effect of the regkill must
occur at the boundary between this instruction and the
next. Similarly, the new value bound to AX is not visible
until this instruction executes (and in the case of values
that require multiple instructions in code generation, until
all of them have executed).
The ranges are adjusted so that a value's start occurs
at the next following instruction after its evaluation,
and the end occurs after (execution of) the first
instruction following the end of the lifetime as a value.
(Notice the asymmetry; the entire value must be finished
before it is visible, but execution of a single instruction
invalidates. However, the value *is* visible before that
next instruction executes).
The test was adjusted to make it insensitive to the result
numbering for variables printed by gdb, since that is not
relevant to the test and makes the differences introduced
by small changes larger than necessary/useful.
The test was also improved to present variable probes
more intuitively, and also to allow explicit indication
of "this variable was optimized out"
Change-Id: I39453eead8399e6bb05ebd957289b112d1100c0e
Reviewed-on: https://go-review.googlesource.com/74090
Run-TryBot: David Chase <drchase@google.com>
Reviewed-by: Cherry Zhang <cherryyz@google.com>
Instead of the hand-written control flow analysis in debug info
generation, use a reverse postorder traversal, which is basically the
same thing. It should be slightly faster.
More importantly, the previous version simply gave up in the case of
non-reducible functions, and produced output that caused a later stage
to crash. It turns out that there's a non-reducible function in
compress/flate, so that wasn't a theoretical issue.
With this change, all blocks will be visited, even for non-reducible
functions.
Change-Id: Id47536764ee93203c6b4105a1a3013fe3265aa12
Reviewed-on: https://go-review.googlesource.com/73110
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
When variables need to be spilled to the stack, they usually get their
own stack slot. Local variables have a slot allocated if they need one,
and arguments start out on the stack. Before this CL, the debug
information made the assumption that this was always the case, and so
didn't bother storing an actual stack offset during SSA analysis.
There's at least one case where this isn't true: variables that alias
arguments. Since the argument is the source of the variable, the
variable will begin its life on the stack in the argument's stack slot,
not its own. Therefore the debug info needs to track the actual stack
slot for each location entry.
No detectable performance change, despite the O(N) loop in getHomeSlot.
Change-Id: I2701adb7eddee17d4524336cb7aa6786e8f32b46
Reviewed-on: https://go-review.googlesource.com/67231
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
The information that's used to generate DWARF location lists is very
ssa.Value centric; it uses Values as start and end coordinates to define
ranges. That mostly works fine, but control flow instructions don't come
from Values, so the ranges couldn't cover them.
Control flow instructions are generated when the SSA representation is
converted to assembly, so that's the best place to extend the ranges
to cover them. (Before that, there's nothing to refer to, and afterward
the boundaries between blocks have been lost.) That requires block
information in the debugInfo type, which then flows down to make
everything else awkward. On the plus side, there's a little less copying
slices around than there used to be, so it should be a little faster.
Previously, the ranges for empty blocks were not very meaningful. That
was fine, because they had no Values to cover, so no debug information
was generated for them. But they do have control flow instructions
(that's why they exist) and so now it's important that the information
be correct. Introduce two sentinel values, BlockStart and BlockEnd, that
denote the boundary of a block, even if the block is empty. BlockEnd
replaces the previous SurvivedBlock flag.
There's one more problem: the last instruction in the function will be a
control flow instruction, so any live ranges need to be extended past
it. But there's no instruction after it to use as the end of the range.
Instead, leave the EndProg field of those ranges as nil and fix it up to
point to past the end of the assembled text at the very last moment.
Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b
Reviewed-on: https://go-review.googlesource.com/63010
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Just to get rid of lots of .Name() stutter in printf calls.
Change-Id: I86cf00b3f7b2172387a1c6a7f189c1897fab6300
Reviewed-on: https://go-review.googlesource.com/56630
Run-TryBot: Keith Randall <khr@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>