[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
// Copyright 2017 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.
2020-08-17 11:28:26 +02:00
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
package ssa
import (
2021-04-27 12:40:43 -04:00
"cmd/compile/internal/abi"
2022-03-29 13:16:35 -04:00
"cmd/compile/internal/abt"
2020-11-17 21:47:56 -05:00
"cmd/compile/internal/ir"
cmd/compile: improve debug locations for partially live in-params
During DWARF debug location generation, as a preamble to the main data
flow analysis, examine the function entry block to look for in-params
arriving in registers that are partially or completely dead, and
insert new OpArg{Int,Float}Reg values for the dead or partially-dead
pieces. In addition, add entries to the f.NamedValues table for
incoming live register-resident params that don't already have
entries. This helps create better/saner DWARF location expressions for
params. Example:
func foo(s string, used int, notused int) int {
return len(s) + used
}
When optimization is complete for this function, the parameter
"notused" is completely dead, meaning that there is no entry for it in
the f.NamedValues table (which then means we don't emit a DWARF
variable location expression for it in the function enty block). In
addition, since only the length field of "s" is used, there is no
DWARF location expression for the other component of "s", leading to
degraded DWARF.
There are still problems/issues with DWARF location generation, but
this does improve things with respect to being able to print the
values of incoming parameters when stopped in the debugger at the
entry point of a function (when optimization is enabled).
Updates #40724.
Change-Id: I5bb5253648942f9fd33b081fe1a5a36208e75785
Reviewed-on: https://go-review.googlesource.com/c/go/+/322631
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-05-10 07:31:57 -04:00
"cmd/compile/internal/types"
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
"cmd/internal/dwarf"
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
"cmd/internal/obj"
cmd/compile: improve debug locations for partially live in-params
During DWARF debug location generation, as a preamble to the main data
flow analysis, examine the function entry block to look for in-params
arriving in registers that are partially or completely dead, and
insert new OpArg{Int,Float}Reg values for the dead or partially-dead
pieces. In addition, add entries to the f.NamedValues table for
incoming live register-resident params that don't already have
entries. This helps create better/saner DWARF location expressions for
params. Example:
func foo(s string, used int, notused int) int {
return len(s) + used
}
When optimization is complete for this function, the parameter
"notused" is completely dead, meaning that there is no entry for it in
the f.NamedValues table (which then means we don't emit a DWARF
variable location expression for it in the function enty block). In
addition, since only the length field of "s" is used, there is no
DWARF location expression for the other component of "s", leading to
degraded DWARF.
There are still problems/issues with DWARF location generation, but
this does improve things with respect to being able to print the
values of incoming parameters when stopped in the debugger at the
entry point of a function (when optimization is enabled).
Updates #40724.
Change-Id: I5bb5253648942f9fd33b081fe1a5a36208e75785
Reviewed-on: https://go-review.googlesource.com/c/go/+/322631
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-05-10 07:31:57 -04:00
"cmd/internal/src"
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
"encoding/hex"
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
"fmt"
cmd/compile: improve debug locations for partially live in-params
During DWARF debug location generation, as a preamble to the main data
flow analysis, examine the function entry block to look for in-params
arriving in registers that are partially or completely dead, and
insert new OpArg{Int,Float}Reg values for the dead or partially-dead
pieces. In addition, add entries to the f.NamedValues table for
incoming live register-resident params that don't already have
entries. This helps create better/saner DWARF location expressions for
params. Example:
func foo(s string, used int, notused int) int {
return len(s) + used
}
When optimization is complete for this function, the parameter
"notused" is completely dead, meaning that there is no entry for it in
the f.NamedValues table (which then means we don't emit a DWARF
variable location expression for it in the function enty block). In
addition, since only the length field of "s" is used, there is no
DWARF location expression for the other component of "s", leading to
degraded DWARF.
There are still problems/issues with DWARF location generation, but
this does improve things with respect to being able to print the
values of incoming parameters when stopped in the debugger at the
entry point of a function (when optimization is enabled).
Updates #40724.
Change-Id: I5bb5253648942f9fd33b081fe1a5a36208e75785
Reviewed-on: https://go-review.googlesource.com/c/go/+/322631
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-05-10 07:31:57 -04:00
"internal/buildcfg"
2018-05-16 11:21:18 +01:00
"math/bits"
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
"sort"
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
"strings"
)
type SlotID int32
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
type VarID int32
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
// A FuncDebug contains all the debug information for the variables in a
2022-03-29 13:16:35 -04:00
// function. Variables are identified by their LocalSlot, which may be
// the result of decomposing a larger variable.
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
type FuncDebug struct {
2017-09-29 14:14:03 -04:00
// Slots is all the slots used in the debug info, indexed by their SlotID.
2018-02-05 17:04:44 -05:00
Slots [ ] LocalSlot
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// The user variables, indexed by VarID.
2020-12-06 18:28:49 -08:00
Vars [ ] * ir . Name
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// The slots that make up each variable, indexed by VarID.
VarSlots [ ] [ ] SlotID
// The location list data, indexed by VarID. Must be processed by PutLocationList.
LocationLists [ ] [ ] byte
2021-11-10 10:44:00 -05:00
// Register-resident output parameters for the function. This is filled in at
// SSA generation time.
RegOutputParams [ ] * ir . Name
2022-09-23 17:31:19 +02:00
// Variable declarations that were removed during optimization
OptDcl [ ] * ir . Name
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// Filled in by the user. Translates Block and Value ID to PC.
GetPC func ( ID , ID ) int64
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
type BlockDebug struct {
2022-03-29 13:16:35 -04:00
// State at the start and end of the block. These are initialized,
// and updated from new information that flows on back edges.
startState , endState abt . T
// Use these to avoid excess work in the merge. If none of the
// predecessors has changed since the last check, the old answer is
// still good.
lastCheckedTime , lastChangedTime int32
2018-02-05 17:04:44 -05:00
// Whether the block had any changes to user variables at all.
relevant bool
2022-03-29 13:16:35 -04:00
// false until the block has been processed at least once. This
// affects how the merge is done; the goal is to maximize sharing
// and avoid allocation.
everProcessed bool
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
cmd/compile: cover control flow insns in location lists
The information that's used to generate DWARF location lists is very
ssa.Value centric; it uses Values as start and end coordinates to define
ranges. That mostly works fine, but control flow instructions don't come
from Values, so the ranges couldn't cover them.
Control flow instructions are generated when the SSA representation is
converted to assembly, so that's the best place to extend the ranges
to cover them. (Before that, there's nothing to refer to, and afterward
the boundaries between blocks have been lost.) That requires block
information in the debugInfo type, which then flows down to make
everything else awkward. On the plus side, there's a little less copying
slices around than there used to be, so it should be a little faster.
Previously, the ranges for empty blocks were not very meaningful. That
was fine, because they had no Values to cover, so no debug information
was generated for them. But they do have control flow instructions
(that's why they exist) and so now it's important that the information
be correct. Introduce two sentinel values, BlockStart and BlockEnd, that
denote the boundary of a block, even if the block is empty. BlockEnd
replaces the previous SurvivedBlock flag.
There's one more problem: the last instruction in the function will be a
control flow instruction, so any live ranges need to be extended past
it. But there's no instruction after it to use as the end of the range.
Instead, leave the EndProg field of those ranges as nil and fix it up to
point to past the end of the assembled text at the very last moment.
Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b
Reviewed-on: https://go-review.googlesource.com/63010
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// A liveSlot is a slot that's live in loc at entry/exit of a block.
type liveSlot struct {
2022-03-29 13:16:35 -04:00
VarLoc
}
2018-01-31 17:56:14 -05:00
2022-03-29 13:16:35 -04:00
func ( ls * liveSlot ) String ( ) string {
return fmt . Sprintf ( "0x%x.%d.%d" , ls . Registers , ls . stackOffsetValue ( ) , int32 ( ls . StackOffset ) & 1 )
2018-01-31 17:56:14 -05:00
}
func ( loc liveSlot ) absent ( ) bool {
return loc . Registers == 0 && ! loc . onStack ( )
}
2022-03-29 13:16:35 -04:00
// StackOffset encodes whether a value is on the stack and if so, where.
// It is a 31-bit integer followed by a presence flag at the low-order
// bit.
2018-01-31 17:56:14 -05:00
type StackOffset int32
func ( s StackOffset ) onStack ( ) bool {
return s != 0
}
func ( s StackOffset ) stackOffsetValue ( ) int32 {
return int32 ( s ) >> 1
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// stateAtPC is the current state of all variables at some point.
type stateAtPC struct {
// The location of each known slot, indexed by SlotID.
slots [ ] VarLoc
// The slots present in each register, indexed by register number.
registers [ ] [ ] SlotID
cmd/compile: cover control flow insns in location lists
The information that's used to generate DWARF location lists is very
ssa.Value centric; it uses Values as start and end coordinates to define
ranges. That mostly works fine, but control flow instructions don't come
from Values, so the ranges couldn't cover them.
Control flow instructions are generated when the SSA representation is
converted to assembly, so that's the best place to extend the ranges
to cover them. (Before that, there's nothing to refer to, and afterward
the boundaries between blocks have been lost.) That requires block
information in the debugInfo type, which then flows down to make
everything else awkward. On the plus side, there's a little less copying
slices around than there used to be, so it should be a little faster.
Previously, the ranges for empty blocks were not very meaningful. That
was fine, because they had no Values to cover, so no debug information
was generated for them. But they do have control flow instructions
(that's why they exist) and so now it's important that the information
be correct. Introduce two sentinel values, BlockStart and BlockEnd, that
denote the boundary of a block, even if the block is empty. BlockEnd
replaces the previous SurvivedBlock flag.
There's one more problem: the last instruction in the function will be a
control flow instruction, so any live ranges need to be extended past
it. But there's no instruction after it to use as the end of the range.
Instead, leave the EndProg field of those ranges as nil and fix it up to
point to past the end of the assembled text at the very last moment.
Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b
Reviewed-on: https://go-review.googlesource.com/63010
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// reset fills state with the live variables from live.
2022-03-29 13:16:35 -04:00
func ( state * stateAtPC ) reset ( live abt . T ) {
2018-01-22 17:06:26 -05:00
slots , registers := state . slots , state . registers
for i := range slots {
slots [ i ] = VarLoc { }
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
2018-01-22 17:06:26 -05:00
for i := range registers {
registers [ i ] = registers [ i ] [ : 0 ]
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
2022-03-29 13:16:35 -04:00
for it := live . Iterator ( ) ; ! it . Done ( ) ; {
k , d := it . Next ( )
live := d . ( * liveSlot )
slots [ k ] = live . VarLoc
if live . VarLoc . Registers == 0 {
2018-01-22 17:06:26 -05:00
continue
}
2018-01-29 16:09:11 -05:00
2022-03-29 13:16:35 -04:00
mask := uint64 ( live . VarLoc . Registers )
2018-01-29 16:09:11 -05:00
for {
if mask == 0 {
break
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
2018-05-16 11:21:18 +01:00
reg := uint8 ( bits . TrailingZeros64 ( mask ) )
2018-01-29 16:09:11 -05:00
mask &^= 1 << reg
2022-03-29 13:16:35 -04:00
registers [ reg ] = append ( registers [ reg ] , SlotID ( k ) )
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
}
2018-01-22 17:06:26 -05:00
state . slots , state . registers = slots , registers
cmd/compile: cover control flow insns in location lists
The information that's used to generate DWARF location lists is very
ssa.Value centric; it uses Values as start and end coordinates to define
ranges. That mostly works fine, but control flow instructions don't come
from Values, so the ranges couldn't cover them.
Control flow instructions are generated when the SSA representation is
converted to assembly, so that's the best place to extend the ranges
to cover them. (Before that, there's nothing to refer to, and afterward
the boundaries between blocks have been lost.) That requires block
information in the debugInfo type, which then flows down to make
everything else awkward. On the plus side, there's a little less copying
slices around than there used to be, so it should be a little faster.
Previously, the ranges for empty blocks were not very meaningful. That
was fine, because they had no Values to cover, so no debug information
was generated for them. But they do have control flow instructions
(that's why they exist) and so now it's important that the information
be correct. Introduce two sentinel values, BlockStart and BlockEnd, that
denote the boundary of a block, even if the block is empty. BlockEnd
replaces the previous SurvivedBlock flag.
There's one more problem: the last instruction in the function will be a
control flow instruction, so any live ranges need to be extended past
it. But there's no instruction after it to use as the end of the range.
Instead, leave the EndProg field of those ranges as nil and fix it up to
point to past the end of the assembled text at the very last moment.
Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b
Reviewed-on: https://go-review.googlesource.com/63010
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
}
2018-01-31 15:08:05 -05:00
func ( s * debugState ) LocString ( loc VarLoc ) string {
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
if loc . absent ( ) {
return "<nil>"
}
cmd/compile: cover control flow insns in location lists
The information that's used to generate DWARF location lists is very
ssa.Value centric; it uses Values as start and end coordinates to define
ranges. That mostly works fine, but control flow instructions don't come
from Values, so the ranges couldn't cover them.
Control flow instructions are generated when the SSA representation is
converted to assembly, so that's the best place to extend the ranges
to cover them. (Before that, there's nothing to refer to, and afterward
the boundaries between blocks have been lost.) That requires block
information in the debugInfo type, which then flows down to make
everything else awkward. On the plus side, there's a little less copying
slices around than there used to be, so it should be a little faster.
Previously, the ranges for empty blocks were not very meaningful. That
was fine, because they had no Values to cover, so no debug information
was generated for them. But they do have control flow instructions
(that's why they exist) and so now it's important that the information
be correct. Introduce two sentinel values, BlockStart and BlockEnd, that
denote the boundary of a block, even if the block is empty. BlockEnd
replaces the previous SurvivedBlock flag.
There's one more problem: the last instruction in the function will be a
control flow instruction, so any live ranges need to be extended past
it. But there's no instruction after it to use as the end of the range.
Instead, leave the EndProg field of those ranges as nil and fix it up to
point to past the end of the assembled text at the very last moment.
Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b
Reviewed-on: https://go-review.googlesource.com/63010
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
var storage [ ] string
2018-01-31 17:56:14 -05:00
if loc . onStack ( ) {
2022-03-29 13:16:35 -04:00
storage = append ( storage , fmt . Sprintf ( "@%+d" , loc . stackOffsetValue ( ) ) )
cmd/compile: cover control flow insns in location lists
The information that's used to generate DWARF location lists is very
ssa.Value centric; it uses Values as start and end coordinates to define
ranges. That mostly works fine, but control flow instructions don't come
from Values, so the ranges couldn't cover them.
Control flow instructions are generated when the SSA representation is
converted to assembly, so that's the best place to extend the ranges
to cover them. (Before that, there's nothing to refer to, and afterward
the boundaries between blocks have been lost.) That requires block
information in the debugInfo type, which then flows down to make
everything else awkward. On the plus side, there's a little less copying
slices around than there used to be, so it should be a little faster.
Previously, the ranges for empty blocks were not very meaningful. That
was fine, because they had no Values to cover, so no debug information
was generated for them. But they do have control flow instructions
(that's why they exist) and so now it's important that the information
be correct. Introduce two sentinel values, BlockStart and BlockEnd, that
denote the boundary of a block, even if the block is empty. BlockEnd
replaces the previous SurvivedBlock flag.
There's one more problem: the last instruction in the function will be a
control flow instruction, so any live ranges need to be extended past
it. But there's no instruction after it to use as the end of the range.
Instead, leave the EndProg field of those ranges as nil and fix it up to
point to past the end of the assembled text at the very last moment.
Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b
Reviewed-on: https://go-review.googlesource.com/63010
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
}
2018-01-29 16:09:11 -05:00
mask := uint64 ( loc . Registers )
for {
if mask == 0 {
break
cmd/compile: cover control flow insns in location lists
The information that's used to generate DWARF location lists is very
ssa.Value centric; it uses Values as start and end coordinates to define
ranges. That mostly works fine, but control flow instructions don't come
from Values, so the ranges couldn't cover them.
Control flow instructions are generated when the SSA representation is
converted to assembly, so that's the best place to extend the ranges
to cover them. (Before that, there's nothing to refer to, and afterward
the boundaries between blocks have been lost.) That requires block
information in the debugInfo type, which then flows down to make
everything else awkward. On the plus side, there's a little less copying
slices around than there used to be, so it should be a little faster.
Previously, the ranges for empty blocks were not very meaningful. That
was fine, because they had no Values to cover, so no debug information
was generated for them. But they do have control flow instructions
(that's why they exist) and so now it's important that the information
be correct. Introduce two sentinel values, BlockStart and BlockEnd, that
denote the boundary of a block, even if the block is empty. BlockEnd
replaces the previous SurvivedBlock flag.
There's one more problem: the last instruction in the function will be a
control flow instruction, so any live ranges need to be extended past
it. But there's no instruction after it to use as the end of the range.
Instead, leave the EndProg field of those ranges as nil and fix it up to
point to past the end of the assembled text at the very last moment.
Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b
Reviewed-on: https://go-review.googlesource.com/63010
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
}
2018-05-16 11:21:18 +01:00
reg := uint8 ( bits . TrailingZeros64 ( mask ) )
2018-01-29 16:09:11 -05:00
mask &^= 1 << reg
2018-01-31 15:08:05 -05:00
storage = append ( storage , s . registers [ reg ] . String ( ) )
cmd/compile: cover control flow insns in location lists
The information that's used to generate DWARF location lists is very
ssa.Value centric; it uses Values as start and end coordinates to define
ranges. That mostly works fine, but control flow instructions don't come
from Values, so the ranges couldn't cover them.
Control flow instructions are generated when the SSA representation is
converted to assembly, so that's the best place to extend the ranges
to cover them. (Before that, there's nothing to refer to, and afterward
the boundaries between blocks have been lost.) That requires block
information in the debugInfo type, which then flows down to make
everything else awkward. On the plus side, there's a little less copying
slices around than there used to be, so it should be a little faster.
Previously, the ranges for empty blocks were not very meaningful. That
was fine, because they had no Values to cover, so no debug information
was generated for them. But they do have control flow instructions
(that's why they exist) and so now it's important that the information
be correct. Introduce two sentinel values, BlockStart and BlockEnd, that
denote the boundary of a block, even if the block is empty. BlockEnd
replaces the previous SurvivedBlock flag.
There's one more problem: the last instruction in the function will be a
control flow instruction, so any live ranges need to be extended past
it. But there's no instruction after it to use as the end of the range.
Instead, leave the EndProg field of those ranges as nil and fix it up to
point to past the end of the assembled text at the very last moment.
Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b
Reviewed-on: https://go-review.googlesource.com/63010
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
return strings . Join ( storage , "," )
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// A VarLoc describes the storage for part of a user variable.
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
type VarLoc struct {
// The registers this variable is available in. There can be more than
// one in various situations, e.g. it's being moved between registers.
Registers RegisterSet
2018-01-31 17:56:14 -05:00
StackOffset
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
2018-01-31 17:56:14 -05:00
func ( loc VarLoc ) absent ( ) bool {
return loc . Registers == 0 && ! loc . onStack ( )
cmd/compile: cover control flow insns in location lists
The information that's used to generate DWARF location lists is very
ssa.Value centric; it uses Values as start and end coordinates to define
ranges. That mostly works fine, but control flow instructions don't come
from Values, so the ranges couldn't cover them.
Control flow instructions are generated when the SSA representation is
converted to assembly, so that's the best place to extend the ranges
to cover them. (Before that, there's nothing to refer to, and afterward
the boundaries between blocks have been lost.) That requires block
information in the debugInfo type, which then flows down to make
everything else awkward. On the plus side, there's a little less copying
slices around than there used to be, so it should be a little faster.
Previously, the ranges for empty blocks were not very meaningful. That
was fine, because they had no Values to cover, so no debug information
was generated for them. But they do have control flow instructions
(that's why they exist) and so now it's important that the information
be correct. Introduce two sentinel values, BlockStart and BlockEnd, that
denote the boundary of a block, even if the block is empty. BlockEnd
replaces the previous SurvivedBlock flag.
There's one more problem: the last instruction in the function will be a
control flow instruction, so any live ranges need to be extended past
it. But there's no instruction after it to use as the end of the range.
Instead, leave the EndProg field of those ranges as nil and fix it up to
point to past the end of the assembled text at the very last moment.
Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b
Reviewed-on: https://go-review.googlesource.com/63010
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
}
2022-03-29 13:16:35 -04:00
func ( loc VarLoc ) intersect ( other VarLoc ) VarLoc {
if ! loc . onStack ( ) || ! other . onStack ( ) || loc . StackOffset != other . StackOffset {
loc . StackOffset = 0
}
loc . Registers &= other . Registers
return loc
}
cmd/compile: cover control flow insns in location lists
The information that's used to generate DWARF location lists is very
ssa.Value centric; it uses Values as start and end coordinates to define
ranges. That mostly works fine, but control flow instructions don't come
from Values, so the ranges couldn't cover them.
Control flow instructions are generated when the SSA representation is
converted to assembly, so that's the best place to extend the ranges
to cover them. (Before that, there's nothing to refer to, and afterward
the boundaries between blocks have been lost.) That requires block
information in the debugInfo type, which then flows down to make
everything else awkward. On the plus side, there's a little less copying
slices around than there used to be, so it should be a little faster.
Previously, the ranges for empty blocks were not very meaningful. That
was fine, because they had no Values to cover, so no debug information
was generated for them. But they do have control flow instructions
(that's why they exist) and so now it's important that the information
be correct. Introduce two sentinel values, BlockStart and BlockEnd, that
denote the boundary of a block, even if the block is empty. BlockEnd
replaces the previous SurvivedBlock flag.
There's one more problem: the last instruction in the function will be a
control flow instruction, so any live ranges need to be extended past
it. But there's no instruction after it to use as the end of the range.
Instead, leave the EndProg field of those ranges as nil and fix it up to
point to past the end of the assembled text at the very last moment.
Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b
Reviewed-on: https://go-review.googlesource.com/63010
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
var BlockStart = & Value {
ID : - 10000 ,
Op : OpInvalid ,
2020-12-07 17:15:44 -08:00
Aux : StringToAux ( "BlockStart" ) ,
cmd/compile: cover control flow insns in location lists
The information that's used to generate DWARF location lists is very
ssa.Value centric; it uses Values as start and end coordinates to define
ranges. That mostly works fine, but control flow instructions don't come
from Values, so the ranges couldn't cover them.
Control flow instructions are generated when the SSA representation is
converted to assembly, so that's the best place to extend the ranges
to cover them. (Before that, there's nothing to refer to, and afterward
the boundaries between blocks have been lost.) That requires block
information in the debugInfo type, which then flows down to make
everything else awkward. On the plus side, there's a little less copying
slices around than there used to be, so it should be a little faster.
Previously, the ranges for empty blocks were not very meaningful. That
was fine, because they had no Values to cover, so no debug information
was generated for them. But they do have control flow instructions
(that's why they exist) and so now it's important that the information
be correct. Introduce two sentinel values, BlockStart and BlockEnd, that
denote the boundary of a block, even if the block is empty. BlockEnd
replaces the previous SurvivedBlock flag.
There's one more problem: the last instruction in the function will be a
control flow instruction, so any live ranges need to be extended past
it. But there's no instruction after it to use as the end of the range.
Instead, leave the EndProg field of those ranges as nil and fix it up to
point to past the end of the assembled text at the very last moment.
Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b
Reviewed-on: https://go-review.googlesource.com/63010
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
cmd/compile: cover control flow insns in location lists
The information that's used to generate DWARF location lists is very
ssa.Value centric; it uses Values as start and end coordinates to define
ranges. That mostly works fine, but control flow instructions don't come
from Values, so the ranges couldn't cover them.
Control flow instructions are generated when the SSA representation is
converted to assembly, so that's the best place to extend the ranges
to cover them. (Before that, there's nothing to refer to, and afterward
the boundaries between blocks have been lost.) That requires block
information in the debugInfo type, which then flows down to make
everything else awkward. On the plus side, there's a little less copying
slices around than there used to be, so it should be a little faster.
Previously, the ranges for empty blocks were not very meaningful. That
was fine, because they had no Values to cover, so no debug information
was generated for them. But they do have control flow instructions
(that's why they exist) and so now it's important that the information
be correct. Introduce two sentinel values, BlockStart and BlockEnd, that
denote the boundary of a block, even if the block is empty. BlockEnd
replaces the previous SurvivedBlock flag.
There's one more problem: the last instruction in the function will be a
control flow instruction, so any live ranges need to be extended past
it. But there's no instruction after it to use as the end of the range.
Instead, leave the EndProg field of those ranges as nil and fix it up to
point to past the end of the assembled text at the very last moment.
Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b
Reviewed-on: https://go-review.googlesource.com/63010
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
var BlockEnd = & Value {
ID : - 20000 ,
Op : OpInvalid ,
2020-12-07 17:15:44 -08:00
Aux : StringToAux ( "BlockEnd" ) ,
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
2021-04-28 18:26:54 -04:00
var FuncEnd = & Value {
ID : - 30000 ,
Op : OpInvalid ,
Aux : StringToAux ( "FuncEnd" ) ,
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
// RegisterSet is a bitmap of registers, indexed by Register.num.
type RegisterSet uint64
2022-03-29 13:16:35 -04:00
// logf prints debug-specific logging to stdout (always stdout) if the
// current function is tagged by GOSSAFUNC (for ssa output directed
// either to stdout or html).
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
func ( s * debugState ) logf ( msg string , args ... interface { } ) {
2018-10-10 16:08:24 -04:00
if s . f . PrintOrHtmlSSA {
fmt . Printf ( msg , args ... )
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
type debugState struct {
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// See FuncDebug.
2018-02-05 17:04:44 -05:00
slots [ ] LocalSlot
2020-12-06 18:28:49 -08:00
vars [ ] * ir . Name
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
varSlots [ ] [ ] SlotID
2018-02-05 16:55:54 -05:00
lists [ ] [ ] byte
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// The user variable that each slot rolls up to, indexed by SlotID.
slotVars [ ] VarID
2022-03-29 13:16:35 -04:00
f * Func
loggingLevel int
convergeCount int // testing; iterate over block debug state this many times
registers [ ] Register
stackOffset func ( LocalSlot ) int32
ctxt * obj . Link
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// The names (slots) associated with each value, indexed by Value ID.
valueNames [ ] [ ] SlotID
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// The current state of whatever analysis is running.
currentState stateAtPC
2018-02-02 16:26:58 -05:00
changedVars * sparseSet
2022-03-29 13:16:35 -04:00
changedSlots * sparseSet
2018-02-05 16:55:54 -05:00
// The pending location list entry for each user variable, indexed by VarID.
pendingEntries [ ] pendingEntry
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
2022-03-29 13:16:35 -04:00
varParts map [ * ir . Name ] [ ] SlotID
blockDebug [ ] BlockDebug
pendingSlotLocs [ ] VarLoc
partsByVarOffset sort . Interface
2018-02-05 17:04:44 -05:00
}
2018-01-22 17:06:26 -05:00
2018-02-05 17:04:44 -05:00
func ( state * debugState ) initializeCache ( f * Func , numVars , numSlots int ) {
2018-01-22 17:06:26 -05:00
// One blockDebug per block. Initialized in allocBlock.
2018-02-05 17:04:44 -05:00
if cap ( state . blockDebug ) < f . NumBlocks ( ) {
state . blockDebug = make ( [ ] BlockDebug , f . NumBlocks ( ) )
} else {
// This local variable, and the ones like it below, enable compiler
// optimizations. Don't inline them.
b := state . blockDebug [ : f . NumBlocks ( ) ]
for i := range b {
b [ i ] = BlockDebug { }
}
2018-01-22 17:06:26 -05:00
}
// A list of slots per Value. Reuse the previous child slices.
2018-02-05 17:04:44 -05:00
if cap ( state . valueNames ) < f . NumValues ( ) {
old := state . valueNames
state . valueNames = make ( [ ] [ ] SlotID , f . NumValues ( ) )
copy ( state . valueNames , old )
2018-01-22 17:06:26 -05:00
}
2018-02-05 17:04:44 -05:00
vn := state . valueNames [ : f . NumValues ( ) ]
2018-01-22 17:06:26 -05:00
for i := range vn {
vn [ i ] = vn [ i ] [ : 0 ]
}
// Slot and register contents for currentState. Cleared by reset().
2018-02-05 17:04:44 -05:00
if cap ( state . currentState . slots ) < numSlots {
state . currentState . slots = make ( [ ] VarLoc , numSlots )
} else {
state . currentState . slots = state . currentState . slots [ : numSlots ]
2018-01-22 17:06:26 -05:00
}
2018-02-05 17:04:44 -05:00
if cap ( state . currentState . registers ) < len ( state . registers ) {
state . currentState . registers = make ( [ ] [ ] SlotID , len ( state . registers ) )
} else {
state . currentState . registers = state . currentState . registers [ : len ( state . registers ) ]
2018-01-22 17:06:26 -05:00
}
// A relatively small slice, but used many times as the return from processValue.
2018-02-05 17:04:44 -05:00
state . changedVars = newSparseSet ( numVars )
2022-03-29 13:16:35 -04:00
state . changedSlots = newSparseSet ( numSlots )
2018-01-22 17:06:26 -05:00
// A pending entry per user variable, with space to track each of its pieces.
2018-02-05 17:04:44 -05:00
numPieces := 0
2018-01-31 16:30:16 -05:00
for i := range state . varSlots {
2018-02-05 17:04:44 -05:00
numPieces += len ( state . varSlots [ i ] )
2018-01-31 16:30:16 -05:00
}
2018-02-05 17:04:44 -05:00
if cap ( state . pendingSlotLocs ) < numPieces {
state . pendingSlotLocs = make ( [ ] VarLoc , numPieces )
} else {
psl := state . pendingSlotLocs [ : numPieces ]
for i := range psl {
psl [ i ] = VarLoc { }
}
2018-01-22 17:06:26 -05:00
}
2018-02-05 17:04:44 -05:00
if cap ( state . pendingEntries ) < numVars {
state . pendingEntries = make ( [ ] pendingEntry , numVars )
2018-01-22 17:06:26 -05:00
}
2018-02-05 17:04:44 -05:00
pe := state . pendingEntries [ : numVars ]
2018-01-31 16:30:16 -05:00
freePieceIdx := 0
for varID , slots := range state . varSlots {
2018-01-22 17:06:26 -05:00
pe [ varID ] = pendingEntry {
2018-02-05 17:04:44 -05:00
pieces : state . pendingSlotLocs [ freePieceIdx : freePieceIdx + len ( slots ) ] ,
2018-01-22 17:06:26 -05:00
}
2018-01-31 16:30:16 -05:00
freePieceIdx += len ( slots )
2018-01-22 17:06:26 -05:00
}
2018-02-05 16:55:54 -05:00
state . pendingEntries = pe
2018-02-05 17:04:44 -05:00
if cap ( state . lists ) < numVars {
state . lists = make ( [ ] [ ] byte , numVars )
} else {
state . lists = state . lists [ : numVars ]
for i := range state . lists {
state . lists [ i ] = nil
}
}
2018-01-22 17:06:26 -05:00
}
func ( state * debugState ) allocBlock ( b * Block ) * BlockDebug {
2018-02-05 17:04:44 -05:00
return & state . blockDebug [ b . ID ]
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
func ( s * debugState ) blockEndStateString ( b * BlockDebug ) string {
2018-01-31 17:56:14 -05:00
endState := stateAtPC { slots : make ( [ ] VarLoc , len ( s . slots ) ) , registers : make ( [ ] [ ] SlotID , len ( s . registers ) ) }
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
endState . reset ( b . endState )
2018-01-31 15:08:05 -05:00
return s . stateString ( endState )
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
2018-01-31 15:08:05 -05:00
func ( s * debugState ) stateString ( state stateAtPC ) string {
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
var strs [ ] string
for slotID , loc := range state . slots {
if ! loc . absent ( ) {
2018-01-31 15:08:05 -05:00
strs = append ( strs , fmt . Sprintf ( "\t%v = %v\n" , s . slots [ slotID ] , s . LocString ( loc ) ) )
2017-09-29 14:14:03 -04:00
}
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
strs = append ( strs , "\n" )
for reg , slots := range state . registers {
if len ( slots ) != 0 {
var slotStrs [ ] string
for _ , slot := range slots {
slotStrs = append ( slotStrs , s . slots [ slot ] . String ( ) )
}
strs = append ( strs , fmt . Sprintf ( "\t%v = %v\n" , & s . registers [ reg ] , slotStrs ) )
}
}
if len ( strs ) == 1 {
return "(no vars)\n"
cmd/compile: cover control flow insns in location lists
The information that's used to generate DWARF location lists is very
ssa.Value centric; it uses Values as start and end coordinates to define
ranges. That mostly works fine, but control flow instructions don't come
from Values, so the ranges couldn't cover them.
Control flow instructions are generated when the SSA representation is
converted to assembly, so that's the best place to extend the ranges
to cover them. (Before that, there's nothing to refer to, and afterward
the boundaries between blocks have been lost.) That requires block
information in the debugInfo type, which then flows down to make
everything else awkward. On the plus side, there's a little less copying
slices around than there used to be, so it should be a little faster.
Previously, the ranges for empty blocks were not very meaningful. That
was fine, because they had no Values to cover, so no debug information
was generated for them. But they do have control flow instructions
(that's why they exist) and so now it's important that the information
be correct. Introduce two sentinel values, BlockStart and BlockEnd, that
denote the boundary of a block, even if the block is empty. BlockEnd
replaces the previous SurvivedBlock flag.
There's one more problem: the last instruction in the function will be a
control flow instruction, so any live ranges need to be extended past
it. But there's no instruction after it to use as the end of the range.
Instead, leave the EndProg field of those ranges as nil and fix it up to
point to past the end of the assembled text at the very last moment.
Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b
Reviewed-on: https://go-review.googlesource.com/63010
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
return strings . Join ( strs , "" )
cmd/compile: cover control flow insns in location lists
The information that's used to generate DWARF location lists is very
ssa.Value centric; it uses Values as start and end coordinates to define
ranges. That mostly works fine, but control flow instructions don't come
from Values, so the ranges couldn't cover them.
Control flow instructions are generated when the SSA representation is
converted to assembly, so that's the best place to extend the ranges
to cover them. (Before that, there's nothing to refer to, and afterward
the boundaries between blocks have been lost.) That requires block
information in the debugInfo type, which then flows down to make
everything else awkward. On the plus side, there's a little less copying
slices around than there used to be, so it should be a little faster.
Previously, the ranges for empty blocks were not very meaningful. That
was fine, because they had no Values to cover, so no debug information
was generated for them. But they do have control flow instructions
(that's why they exist) and so now it's important that the information
be correct. Introduce two sentinel values, BlockStart and BlockEnd, that
denote the boundary of a block, even if the block is empty. BlockEnd
replaces the previous SurvivedBlock flag.
There's one more problem: the last instruction in the function will be a
control flow instruction, so any live ranges need to be extended past
it. But there's no instruction after it to use as the end of the range.
Instead, leave the EndProg field of those ranges as nil and fix it up to
point to past the end of the assembled text at the very last moment.
Change-Id: I81f884020ff36fd6fe8d7888fc57c99412c4245b
Reviewed-on: https://go-review.googlesource.com/63010
Reviewed-by: Alessandro Arzilli <alessandro.arzilli@gmail.com>
Reviewed-by: David Chase <drchase@google.com>
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
2017-09-11 14:28:34 -04:00
}
cmd/compile: improve debug locations for partially live in-params
During DWARF debug location generation, as a preamble to the main data
flow analysis, examine the function entry block to look for in-params
arriving in registers that are partially or completely dead, and
insert new OpArg{Int,Float}Reg values for the dead or partially-dead
pieces. In addition, add entries to the f.NamedValues table for
incoming live register-resident params that don't already have
entries. This helps create better/saner DWARF location expressions for
params. Example:
func foo(s string, used int, notused int) int {
return len(s) + used
}
When optimization is complete for this function, the parameter
"notused" is completely dead, meaning that there is no entry for it in
the f.NamedValues table (which then means we don't emit a DWARF
variable location expression for it in the function enty block). In
addition, since only the length field of "s" is used, there is no
DWARF location expression for the other component of "s", leading to
degraded DWARF.
There are still problems/issues with DWARF location generation, but
this does improve things with respect to being able to print the
values of incoming parameters when stopped in the debugger at the
entry point of a function (when optimization is enabled).
Updates #40724.
Change-Id: I5bb5253648942f9fd33b081fe1a5a36208e75785
Reviewed-on: https://go-review.googlesource.com/c/go/+/322631
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-05-10 07:31:57 -04:00
// slotCanonicalizer is a table used to lookup and canonicalize
// LocalSlot's in a type insensitive way (e.g. taking into account the
// base name, offset, and width of the slot, but ignoring the slot
// type).
type slotCanonicalizer struct {
slmap map [ slotKey ] SlKeyIdx
slkeys [ ] LocalSlot
}
func newSlotCanonicalizer ( ) * slotCanonicalizer {
return & slotCanonicalizer {
slmap : make ( map [ slotKey ] SlKeyIdx ) ,
slkeys : [ ] LocalSlot { LocalSlot { N : nil } } ,
}
}
type SlKeyIdx uint32
const noSlot = SlKeyIdx ( 0 )
// slotKey is a type-insensitive encapsulation of a LocalSlot; it
// is used to key a map within slotCanonicalizer.
type slotKey struct {
name * ir . Name
offset int64
width int64
splitOf SlKeyIdx // idx in slkeys slice in slotCanonicalizer
splitOffset int64
}
// lookup looks up a LocalSlot in the slot canonicalizer "sc", returning
// a canonical index for the slot, and adding it to the table if need
// be. Return value is the canonical slot index, and a boolean indicating
// whether the slot was found in the table already (TRUE => found).
func ( sc * slotCanonicalizer ) lookup ( ls LocalSlot ) ( SlKeyIdx , bool ) {
split := noSlot
if ls . SplitOf != nil {
split , _ = sc . lookup ( * ls . SplitOf )
}
k := slotKey {
2021-08-26 12:11:14 -07:00
name : ls . N , offset : ls . Off , width : ls . Type . Size ( ) ,
cmd/compile: improve debug locations for partially live in-params
During DWARF debug location generation, as a preamble to the main data
flow analysis, examine the function entry block to look for in-params
arriving in registers that are partially or completely dead, and
insert new OpArg{Int,Float}Reg values for the dead or partially-dead
pieces. In addition, add entries to the f.NamedValues table for
incoming live register-resident params that don't already have
entries. This helps create better/saner DWARF location expressions for
params. Example:
func foo(s string, used int, notused int) int {
return len(s) + used
}
When optimization is complete for this function, the parameter
"notused" is completely dead, meaning that there is no entry for it in
the f.NamedValues table (which then means we don't emit a DWARF
variable location expression for it in the function enty block). In
addition, since only the length field of "s" is used, there is no
DWARF location expression for the other component of "s", leading to
degraded DWARF.
There are still problems/issues with DWARF location generation, but
this does improve things with respect to being able to print the
values of incoming parameters when stopped in the debugger at the
entry point of a function (when optimization is enabled).
Updates #40724.
Change-Id: I5bb5253648942f9fd33b081fe1a5a36208e75785
Reviewed-on: https://go-review.googlesource.com/c/go/+/322631
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-05-10 07:31:57 -04:00
splitOf : split , splitOffset : ls . SplitOffset ,
}
if idx , ok := sc . slmap [ k ] ; ok {
return idx , true
}
rv := SlKeyIdx ( len ( sc . slkeys ) )
sc . slkeys = append ( sc . slkeys , ls )
sc . slmap [ k ] = rv
return rv , false
}
func ( sc * slotCanonicalizer ) canonSlot ( idx SlKeyIdx ) LocalSlot {
return sc . slkeys [ idx ]
}
// PopulateABIInRegArgOps examines the entry block of the function
// and looks for incoming parameters that have missing or partial
// OpArg{Int,Float}Reg values, inserting additional values in
// cases where they are missing. Example:
//
2022-02-03 14:12:08 -05:00
// func foo(s string, used int, notused int) int {
// return len(s) + used
// }
cmd/compile: improve debug locations for partially live in-params
During DWARF debug location generation, as a preamble to the main data
flow analysis, examine the function entry block to look for in-params
arriving in registers that are partially or completely dead, and
insert new OpArg{Int,Float}Reg values for the dead or partially-dead
pieces. In addition, add entries to the f.NamedValues table for
incoming live register-resident params that don't already have
entries. This helps create better/saner DWARF location expressions for
params. Example:
func foo(s string, used int, notused int) int {
return len(s) + used
}
When optimization is complete for this function, the parameter
"notused" is completely dead, meaning that there is no entry for it in
the f.NamedValues table (which then means we don't emit a DWARF
variable location expression for it in the function enty block). In
addition, since only the length field of "s" is used, there is no
DWARF location expression for the other component of "s", leading to
degraded DWARF.
There are still problems/issues with DWARF location generation, but
this does improve things with respect to being able to print the
values of incoming parameters when stopped in the debugger at the
entry point of a function (when optimization is enabled).
Updates #40724.
Change-Id: I5bb5253648942f9fd33b081fe1a5a36208e75785
Reviewed-on: https://go-review.googlesource.com/c/go/+/322631
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-05-10 07:31:57 -04:00
//
// In the function above, the incoming parameter "used" is fully live,
// "notused" is not live, and "s" is partially live (only the length
// field of the string is used). At the point where debug value
// analysis runs, we might expect to see an entry block with:
//
2022-02-03 14:12:08 -05:00
// b1:
// v4 = ArgIntReg <uintptr> {s+8} [0] : BX
// v5 = ArgIntReg <int> {used} [0] : CX
cmd/compile: improve debug locations for partially live in-params
During DWARF debug location generation, as a preamble to the main data
flow analysis, examine the function entry block to look for in-params
arriving in registers that are partially or completely dead, and
insert new OpArg{Int,Float}Reg values for the dead or partially-dead
pieces. In addition, add entries to the f.NamedValues table for
incoming live register-resident params that don't already have
entries. This helps create better/saner DWARF location expressions for
params. Example:
func foo(s string, used int, notused int) int {
return len(s) + used
}
When optimization is complete for this function, the parameter
"notused" is completely dead, meaning that there is no entry for it in
the f.NamedValues table (which then means we don't emit a DWARF
variable location expression for it in the function enty block). In
addition, since only the length field of "s" is used, there is no
DWARF location expression for the other component of "s", leading to
degraded DWARF.
There are still problems/issues with DWARF location generation, but
this does improve things with respect to being able to print the
values of incoming parameters when stopped in the debugger at the
entry point of a function (when optimization is enabled).
Updates #40724.
Change-Id: I5bb5253648942f9fd33b081fe1a5a36208e75785
Reviewed-on: https://go-review.googlesource.com/c/go/+/322631
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-05-10 07:31:57 -04:00
//
// While this is an accurate picture of the live incoming params,
// we also want to have debug locations for non-live params (or
// their non-live pieces), e.g. something like
//
2022-02-03 14:12:08 -05:00
// b1:
// v9 = ArgIntReg <*uint8> {s+0} [0] : AX
// v4 = ArgIntReg <uintptr> {s+8} [0] : BX
// v5 = ArgIntReg <int> {used} [0] : CX
// v10 = ArgIntReg <int> {unused} [0] : DI
cmd/compile: improve debug locations for partially live in-params
During DWARF debug location generation, as a preamble to the main data
flow analysis, examine the function entry block to look for in-params
arriving in registers that are partially or completely dead, and
insert new OpArg{Int,Float}Reg values for the dead or partially-dead
pieces. In addition, add entries to the f.NamedValues table for
incoming live register-resident params that don't already have
entries. This helps create better/saner DWARF location expressions for
params. Example:
func foo(s string, used int, notused int) int {
return len(s) + used
}
When optimization is complete for this function, the parameter
"notused" is completely dead, meaning that there is no entry for it in
the f.NamedValues table (which then means we don't emit a DWARF
variable location expression for it in the function enty block). In
addition, since only the length field of "s" is used, there is no
DWARF location expression for the other component of "s", leading to
degraded DWARF.
There are still problems/issues with DWARF location generation, but
this does improve things with respect to being able to print the
values of incoming parameters when stopped in the debugger at the
entry point of a function (when optimization is enabled).
Updates #40724.
Change-Id: I5bb5253648942f9fd33b081fe1a5a36208e75785
Reviewed-on: https://go-review.googlesource.com/c/go/+/322631
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-05-10 07:31:57 -04:00
//
// This function examines the live OpArg{Int,Float}Reg values and
// synthesizes new (dead) values for the non-live params or the
// non-live pieces of partially live params.
func PopulateABIInRegArgOps ( f * Func ) {
pri := f . ABISelf . ABIAnalyzeFuncType ( f . Type . FuncType ( ) )
// When manufacturing new slots that correspond to splits of
// composite parameters, we want to avoid creating a new sub-slot
// that differs from some existing sub-slot only by type, since
// the debug location analysis will treat that slot as a separate
// entity. To achieve this, create a lookup table of existing
// slots that is type-insenstitive.
sc := newSlotCanonicalizer ( )
for _ , sl := range f . Names {
sc . lookup ( * sl )
}
// Add slot -> value entry to f.NamedValues if not already present.
addToNV := func ( v * Value , sl LocalSlot ) {
values , ok := f . NamedValues [ sl ]
if ! ok {
// Haven't seen this slot yet.
sla := f . localSlotAddr ( sl )
f . Names = append ( f . Names , sla )
} else {
for _ , ev := range values {
if v == ev {
return
}
}
}
values = append ( values , v )
f . NamedValues [ sl ] = values
}
newValues := [ ] * Value { }
abiRegIndexToRegister := func ( reg abi . RegIndex ) int8 {
i := f . ABISelf . FloatIndexFor ( reg )
if i >= 0 { // float PR
return f . Config . floatParamRegs [ i ]
} else {
return f . Config . intParamRegs [ reg ]
}
}
// Helper to construct a new OpArg{Float,Int}Reg op value.
var pos src . XPos
if len ( f . Entry . Values ) != 0 {
pos = f . Entry . Values [ 0 ] . Pos
}
synthesizeOpIntFloatArg := func ( n * ir . Name , t * types . Type , reg abi . RegIndex , sl LocalSlot ) * Value {
aux := & AuxNameOffset { n , sl . Off }
op , auxInt := ArgOpAndRegisterFor ( reg , f . ABISelf )
v := f . newValueNoBlock ( op , t , pos )
v . AuxInt = auxInt
v . Aux = aux
v . Args = nil
v . Block = f . Entry
newValues = append ( newValues , v )
addToNV ( v , sl )
f . setHome ( v , & f . Config . registers [ abiRegIndexToRegister ( reg ) ] )
return v
}
// Make a pass through the entry block looking for
// OpArg{Int,Float}Reg ops. Record the slots they use in a table
// ("sc"). We use a type-insensitive lookup for the slot table,
// since the type we get from the ABI analyzer won't always match
// what the compiler uses when creating OpArg{Int,Float}Reg ops.
for _ , v := range f . Entry . Values {
if v . Op == OpArgIntReg || v . Op == OpArgFloatReg {
aux := v . Aux . ( * AuxNameOffset )
sl := LocalSlot { N : aux . Name , Type : v . Type , Off : aux . Offset }
// install slot in lookup table
idx , _ := sc . lookup ( sl )
// add to f.NamedValues if not already present
addToNV ( v , sc . canonSlot ( idx ) )
} else if v . Op . IsCall ( ) {
// if we hit a call, we've gone too far.
break
}
}
// Now make a pass through the ABI in-params, looking for params
// or pieces of params that we didn't encounter in the loop above.
for _ , inp := range pri . InParams ( ) {
if ! isNamedRegParam ( inp ) {
continue
}
n := inp . Name . ( * ir . Name )
// Param is spread across one or more registers. Walk through
// each piece to see whether we've seen an arg reg op for it.
types , offsets := inp . RegisterTypesAndOffsets ( )
for k , t := range types {
// Note: this recipe for creating a LocalSlot is designed
// to be compatible with the one used in expand_calls.go
// as opposed to decompose.go. The expand calls code just
// takes the base name and creates an offset into it,
// without using the SplitOf/SplitOffset fields. The code
// in decompose.go does the opposite -- it creates a
// LocalSlot object with "Off" set to zero, but with
// SplitOf pointing to a parent slot, and SplitOffset
// holding the offset into the parent object.
pieceSlot := LocalSlot { N : n , Type : t , Off : offsets [ k ] }
// Look up this piece to see if we've seen a reg op
// for it. If not, create one.
_ , found := sc . lookup ( pieceSlot )
if ! found {
// This slot doesn't appear in the map, meaning it
// corresponds to an in-param that is not live, or
// a portion of an in-param that is not live/used.
// Add a new dummy OpArg{Int,Float}Reg for it.
synthesizeOpIntFloatArg ( n , t , inp . Registers [ k ] ,
pieceSlot )
}
}
}
// Insert the new values into the head of the block.
f . Entry . Values = append ( newValues , f . Entry . Values ... )
}
2022-03-29 13:16:35 -04:00
// BuildFuncDebug debug information for f, placing the results
// in "rval". f must be fully processed, so that each Value is where it
// will be when machine code is emitted.
func BuildFuncDebug ( ctxt * obj . Link , f * Func , loggingLevel int , stackOffset func ( LocalSlot ) int32 , rval * FuncDebug ) {
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
if f . RegAlloc == nil {
f . Fatalf ( "BuildFuncDebug on func %v that has not been fully processed" , f )
}
2018-02-05 17:04:44 -05:00
state := & f . Cache . debugState
2022-03-29 13:16:35 -04:00
state . loggingLevel = loggingLevel % 1000
// A specific number demands exactly that many iterations. Under
// particular circumstances it make require more than the total of
// 2 passes implied by a single run through liveness and a single
// run through location list generation.
state . convergeCount = loggingLevel / 1000
2018-02-05 17:04:44 -05:00
state . f = f
state . registers = f . Config . registers
state . stackOffset = stackOffset
state . ctxt = ctxt
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
cmd/compile: improve debug locations for partially live in-params
During DWARF debug location generation, as a preamble to the main data
flow analysis, examine the function entry block to look for in-params
arriving in registers that are partially or completely dead, and
insert new OpArg{Int,Float}Reg values for the dead or partially-dead
pieces. In addition, add entries to the f.NamedValues table for
incoming live register-resident params that don't already have
entries. This helps create better/saner DWARF location expressions for
params. Example:
func foo(s string, used int, notused int) int {
return len(s) + used
}
When optimization is complete for this function, the parameter
"notused" is completely dead, meaning that there is no entry for it in
the f.NamedValues table (which then means we don't emit a DWARF
variable location expression for it in the function enty block). In
addition, since only the length field of "s" is used, there is no
DWARF location expression for the other component of "s", leading to
degraded DWARF.
There are still problems/issues with DWARF location generation, but
this does improve things with respect to being able to print the
values of incoming parameters when stopped in the debugger at the
entry point of a function (when optimization is enabled).
Updates #40724.
Change-Id: I5bb5253648942f9fd33b081fe1a5a36208e75785
Reviewed-on: https://go-review.googlesource.com/c/go/+/322631
Trust: Than McIntosh <thanm@google.com>
Run-TryBot: Than McIntosh <thanm@google.com>
TryBot-Result: Go Bot <gobot@golang.org>
Reviewed-by: Cherry Mui <cherryyz@google.com>
2021-05-10 07:31:57 -04:00
if buildcfg . Experiment . RegabiArgs {
PopulateABIInRegArgOps ( f )
}
2022-03-29 13:16:35 -04:00
if state . loggingLevel > 0 {
2018-05-22 18:00:39 -04:00
state . logf ( "Generating location lists for function %q\n" , f . Name )
}
2018-02-05 17:04:44 -05:00
if state . varParts == nil {
2020-12-06 18:28:49 -08:00
state . varParts = make ( map [ * ir . Name ] [ ] SlotID )
2018-02-05 17:04:44 -05:00
} else {
for n := range state . varParts {
delete ( state . varParts , n )
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
2018-02-05 17:04:44 -05:00
// Recompose any decomposed variables, and establish the canonical
// IDs for each var and slot by filling out state.vars and state.slots.
state . slots = state . slots [ : 0 ]
state . vars = state . vars [ : 0 ]
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
for i , slot := range f . Names {
2021-04-21 10:55:42 -04:00
state . slots = append ( state . slots , * slot )
2020-11-17 21:47:56 -05:00
if ir . IsSynthetic ( slot . N ) {
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
continue
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
2021-04-21 10:55:42 -04:00
topSlot := slot
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
for topSlot . SplitOf != nil {
topSlot = topSlot . SplitOf
}
2018-02-05 17:04:44 -05:00
if _ , ok := state . varParts [ topSlot . N ] ; ! ok {
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
state . vars = append ( state . vars , topSlot . N )
}
2018-02-05 17:04:44 -05:00
state . varParts [ topSlot . N ] = append ( state . varParts [ topSlot . N ] , SlotID ( i ) )
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
2018-03-13 16:14:52 -04:00
// Recreate the LocalSlot for each stack-only variable.
// This would probably be better as an output from stackframe.
for _ , b := range f . Blocks {
for _ , v := range b . Values {
2022-07-22 15:08:07 -07:00
if v . Op == OpVarDef {
2020-12-06 12:02:22 -08:00
n := v . Aux . ( * ir . Name )
2020-11-17 21:47:56 -05:00
if ir . IsSynthetic ( n ) {
2018-03-13 16:14:52 -04:00
continue
}
if _ , ok := state . varParts [ n ] ; ! ok {
slot := LocalSlot { N : n , Type : v . Type , Off : 0 }
state . slots = append ( state . slots , slot )
state . varParts [ n ] = [ ] SlotID { SlotID ( len ( state . slots ) - 1 ) }
state . vars = append ( state . vars , n )
}
}
}
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// Fill in the var<->slot mappings.
2018-02-05 17:04:44 -05:00
if cap ( state . varSlots ) < len ( state . vars ) {
state . varSlots = make ( [ ] [ ] SlotID , len ( state . vars ) )
} else {
state . varSlots = state . varSlots [ : len ( state . vars ) ]
for i := range state . varSlots {
state . varSlots [ i ] = state . varSlots [ i ] [ : 0 ]
}
}
if cap ( state . slotVars ) < len ( state . slots ) {
state . slotVars = make ( [ ] VarID , len ( state . slots ) )
} else {
state . slotVars = state . slotVars [ : len ( state . slots ) ]
}
2018-02-05 16:55:54 -05:00
2018-02-05 17:04:44 -05:00
if state . partsByVarOffset == nil {
state . partsByVarOffset = & partsByVarOffset { }
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
for varID , n := range state . vars {
2018-02-05 17:04:44 -05:00
parts := state . varParts [ n ]
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
state . varSlots [ varID ] = parts
for _ , slotID := range parts {
state . slotVars [ slotID ] = VarID ( varID )
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
2018-02-05 17:04:44 -05:00
* state . partsByVarOffset . ( * partsByVarOffset ) = partsByVarOffset { parts , state . slots }
sort . Sort ( state . partsByVarOffset )
2018-01-22 17:06:26 -05:00
}
2018-02-05 17:04:44 -05:00
state . initializeCache ( f , len ( state . varParts ) , len ( state . slots ) )
2018-01-22 17:06:26 -05:00
for i , slot := range f . Names {
2020-11-17 21:47:56 -05:00
if ir . IsSynthetic ( slot . N ) {
2018-01-22 17:06:26 -05:00
continue
}
2021-04-21 10:55:42 -04:00
for _ , value := range f . NamedValues [ * slot ] {
2018-01-22 17:06:26 -05:00
state . valueNames [ value . ID ] = append ( state . valueNames [ value . ID ] , SlotID ( i ) )
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
blockLocs := state . liveness ( )
2018-02-05 16:55:54 -05:00
state . buildLocationLists ( blockLocs )
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
2021-11-10 10:44:00 -05:00
// Populate "rval" with what we've computed.
rval . Slots = state . slots
rval . VarSlots = state . varSlots
rval . Vars = state . vars
rval . LocationLists = state . lists
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
// liveness walks the function in control flow order, calculating the start
// and end state of each block.
func ( state * debugState ) liveness ( ) [ ] * BlockDebug {
blockLocs := make ( [ ] * BlockDebug , state . f . NumBlocks ( ) )
2022-03-29 13:16:35 -04:00
counterTime := int32 ( 1 )
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
2017-10-19 18:59:41 -04:00
// Reverse postorder: visit a block after as many as possible of its
// predecessors have been visited.
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
po := state . f . Postorder ( )
2022-03-29 13:16:35 -04:00
converged := false
// The iteration rule is that by default, run until converged, but
// if a particular iteration count is specified, run that many
// iterations, no more, no less. A count is specified as the
// thousands digit of the location lists debug flag,
// e.g. -d=locationlists=4000
keepGoing := func ( k int ) bool {
if state . convergeCount == 0 {
return ! converged
}
return k < state . convergeCount
}
for k := 0 ; keepGoing ( k ) ; k ++ {
if state . loggingLevel > 0 {
state . logf ( "Liveness pass %d\n" , k )
}
converged = true
for i := len ( po ) - 1 ; i >= 0 ; i -- {
b := po [ i ]
locs := blockLocs [ b . ID ]
if locs == nil {
locs = state . allocBlock ( b )
blockLocs [ b . ID ] = locs
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
2022-03-29 13:16:35 -04:00
// Build the starting state for the block from the final
// state of its predecessors.
startState , blockChanged := state . mergePredecessors ( b , blockLocs , nil , false )
locs . lastCheckedTime = counterTime
counterTime ++
if state . loggingLevel > 1 {
state . logf ( "Processing %v, block changed %v, initial state:\n%v" , b , blockChanged , state . stateString ( state . currentState ) )
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
2022-03-29 13:16:35 -04:00
if blockChanged {
// If the start did not change, then the old endState is good
converged = false
changed := false
state . changedSlots . clear ( )
// Update locs/registers with the effects of each Value.
for _ , v := range b . Values {
slots := state . valueNames [ v . ID ]
// Loads and stores inherit the names of their sources.
var source * Value
switch v . Op {
case OpStoreReg :
source = v . Args [ 0 ]
case OpLoadReg :
switch a := v . Args [ 0 ] ; a . Op {
case OpArg , OpPhi :
source = a
case OpStoreReg :
source = a . Args [ 0 ]
default :
if state . loggingLevel > 1 {
state . logf ( "at %v: load with unexpected source op: %v (%v)\n" , v , a . Op , a )
}
}
}
// Update valueNames with the source so that later steps
// don't need special handling.
if source != nil && k == 0 {
// limit to k == 0 otherwise there are duplicates.
slots = append ( slots , state . valueNames [ source . ID ] ... )
state . valueNames [ v . ID ] = slots
2018-05-22 18:00:39 -04:00
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
2022-03-29 13:16:35 -04:00
reg , _ := state . f . getHome ( v . ID ) . ( * Register )
c := state . processValue ( v , slots , reg )
changed = changed || c
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
2022-03-29 13:16:35 -04:00
if state . loggingLevel > 1 {
state . logf ( "Block %v done, locs:\n%v" , b , state . stateString ( state . currentState ) )
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
2022-03-29 13:16:35 -04:00
locs . relevant = locs . relevant || changed
if ! changed {
locs . endState = startState
} else {
for _ , id := range state . changedSlots . contents ( ) {
slotID := SlotID ( id )
slotLoc := state . currentState . slots [ slotID ]
if slotLoc . absent ( ) {
startState . Delete ( int32 ( slotID ) )
continue
}
old := startState . Find ( int32 ( slotID ) ) // do NOT replace existing values
if oldLS , ok := old . ( * liveSlot ) ; ! ok || oldLS . VarLoc != slotLoc {
startState . Insert ( int32 ( slotID ) ,
& liveSlot { VarLoc : slotLoc } )
}
}
locs . endState = startState
2018-01-22 17:06:26 -05:00
}
2022-03-29 13:16:35 -04:00
locs . lastChangedTime = counterTime
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
2022-03-29 13:16:35 -04:00
counterTime ++
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
return blockLocs
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
// mergePredecessors takes the end state of each of b's predecessors and
2022-03-29 13:16:35 -04:00
// intersects them to form the starting state for b. It puts that state
// in blockLocs[b.ID].startState, and fills state.currentState with it.
// It returns the start state and whether this is changed from the
// previously approximated value of startState for this block. After
// the first call, subsequent calls can only shrink startState.
//
// Passing forLocationLists=true enables additional side-effects that
// are necessary for building location lists but superflous while still
// iterating to an answer.
//
// If previousBlock is non-nil, it registers changes vs. that block's
// end state in state.changedVars. Note that previousBlock will often
// not be a predecessor.
//
// Note that mergePredecessors behaves slightly differently between
// first and subsequent calls for a block. For the first call, the
// starting state is approximated by taking the state from the
// predecessor whose state is smallest, and removing any elements not
// in all the other predecessors; this makes the smallest number of
// changes and shares the most state. On subsequent calls the old
// value of startState is adjusted with new information; this is judged
// to do the least amount of extra work.
//
// To improve performance, each block's state information is marked with
// lastChanged and lastChecked "times" so unchanged predecessors can be
// skipped on after-the-first iterations. Doing this allows extra
// iterations by the caller to be almost free.
//
// It is important to know that the set representation used for
// startState, endState, and merges can share data for two sets where
// one is a small delta from the other. Doing this does require a
// little care in how sets are updated, both in mergePredecessors, and
// using its result.
func ( state * debugState ) mergePredecessors ( b * Block , blockLocs [ ] * BlockDebug , previousBlock * Block , forLocationLists bool ) ( abt . T , bool ) {
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
// Filter out back branches.
2018-01-31 15:08:05 -05:00
var predsBuf [ 10 ] * Block
2022-03-29 13:16:35 -04:00
2018-01-31 15:08:05 -05:00
preds := predsBuf [ : 0 ]
2022-03-29 13:16:35 -04:00
locs := blockLocs [ b . ID ]
blockChanged := ! locs . everProcessed // the first time it always changes.
updating := locs . everProcessed
// For the first merge, exclude predecessors that have not been seen yet.
// I.e., backedges.
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
for _ , pred := range b . Preds {
2022-03-29 13:16:35 -04:00
if bl := blockLocs [ pred . b . ID ] ; bl != nil && bl . everProcessed {
// crucially, a self-edge has bl != nil, but bl.everProcessed is false the first time.
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
preds = append ( preds , pred . b )
}
}
2022-03-29 13:16:35 -04:00
locs . everProcessed = true
if state . loggingLevel > 1 {
2018-01-31 15:08:05 -05:00
// The logf below would cause preds to be heap-allocated if
// it were passed directly.
preds2 := make ( [ ] * Block , len ( preds ) )
copy ( preds2 , preds )
2022-03-29 13:16:35 -04:00
state . logf ( "Merging %v into %v (changed=%d, checked=%d)\n" , preds2 , b , locs . lastChangedTime , locs . lastCheckedTime )
}
state . changedVars . clear ( )
markChangedVars := func ( slots , merged abt . T ) {
if ! forLocationLists {
return
}
// Fill changedVars with those that differ between the previous
// block (in the emit order, not necessarily a flow predecessor)
// and the start state for this block.
for it := slots . Iterator ( ) ; ! it . Done ( ) ; {
k , v := it . Next ( )
m := merged . Find ( k )
if m == nil || v . ( * liveSlot ) . VarLoc != m . ( * liveSlot ) . VarLoc {
state . changedVars . add ( ID ( state . slotVars [ k ] ) )
}
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
2022-03-29 13:16:35 -04:00
reset := func ( ourStartState abt . T ) {
if ! ( forLocationLists || blockChanged ) {
// there is no change and this is not for location lists, do
// not bother to reset currentState because it will not be
// examined.
return
2018-11-01 15:26:02 -04:00
}
2022-03-29 13:16:35 -04:00
state . currentState . reset ( ourStartState )
2018-11-01 15:26:02 -04:00
}
2022-03-29 13:16:35 -04:00
// Zero predecessors
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
if len ( preds ) == 0 {
2018-11-01 15:26:02 -04:00
if previousBlock != nil {
2022-03-29 13:16:35 -04:00
state . f . Fatalf ( "Function %v, block %s with no predecessors is not first block, has previous %s" , state . f , b . String ( ) , previousBlock . String ( ) )
2018-11-01 15:26:02 -04:00
}
2022-03-29 13:16:35 -04:00
// startState is empty
reset ( abt . T { } )
return abt . T { } , blockChanged
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
2022-03-29 13:16:35 -04:00
// One predecessor
l0 := blockLocs [ preds [ 0 ] . ID ]
p0 := l0 . endState
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
if len ( preds ) == 1 {
2018-11-01 15:26:02 -04:00
if previousBlock != nil && preds [ 0 ] . ID != previousBlock . ID {
2022-03-29 13:16:35 -04:00
// Change from previous block is its endState minus the predecessor's endState
markChangedVars ( blockLocs [ previousBlock . ID ] . endState , p0 )
}
locs . startState = p0
blockChanged = blockChanged || l0 . lastChangedTime > locs . lastCheckedTime
reset ( p0 )
return p0 , blockChanged
}
// More than one predecessor
if updating {
// After the first approximation, i.e., when updating, results
// can only get smaller, because initially backedge
// predecessors do not participate in the intersection. This
// means that for the update, given the prior approximation of
// startState, there is no need to re-intersect with unchanged
// blocks. Therefore remove unchanged blocks from the
// predecessor list.
for i := len ( preds ) - 1 ; i >= 0 ; i -- {
pred := preds [ i ]
if blockLocs [ pred . ID ] . lastChangedTime > locs . lastCheckedTime {
continue // keep this predecessor
}
preds [ i ] = preds [ len ( preds ) - 1 ]
preds = preds [ : len ( preds ) - 1 ]
if state . loggingLevel > 2 {
state . logf ( "Pruned b%d, lastChanged was %d but b%d lastChecked is %d\n" , pred . ID , blockLocs [ pred . ID ] . lastChangedTime , b . ID , locs . lastCheckedTime )
}
}
// Check for an early out; this should always hit for the update
// if there are no cycles.
if len ( preds ) == 0 {
blockChanged = false
reset ( locs . startState )
if state . loggingLevel > 2 {
state . logf ( "Early out, no predecessors changed since last check\n" )
}
if previousBlock != nil {
markChangedVars ( blockLocs [ previousBlock . ID ] . endState , locs . startState )
}
return locs . startState , blockChanged
2018-11-01 15:26:02 -04:00
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
2018-11-01 15:26:02 -04:00
baseID := preds [ 0 ] . ID
baseState := p0
2022-03-29 13:16:35 -04:00
// Choose the predecessor with the smallest endState for intersection work
for _ , pred := range preds [ 1 : ] {
if blockLocs [ pred . ID ] . endState . Size ( ) < baseState . Size ( ) {
baseState = blockLocs [ pred . ID ] . endState
baseID = pred . ID
2018-11-01 15:26:02 -04:00
}
}
2022-03-29 13:16:35 -04:00
if state . loggingLevel > 2 {
2018-11-01 15:26:02 -04:00
state . logf ( "Starting %v with state from b%v:\n%v" , b , baseID , state . blockEndStateString ( blockLocs [ baseID ] ) )
2022-03-29 13:16:35 -04:00
for _ , pred := range preds {
if pred . ID == baseID {
continue
2017-09-29 14:14:03 -04:00
}
2022-03-29 13:16:35 -04:00
state . logf ( "Merging in state from %v:\n%v" , pred , state . blockEndStateString ( blockLocs [ pred . ID ] ) )
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
}
2022-03-29 13:16:35 -04:00
state . currentState . reset ( abt . T { } )
// The normal logic of "reset" is incuded in the intersection loop below.
2018-01-22 17:06:26 -05:00
2022-03-29 13:16:35 -04:00
slotLocs := state . currentState . slots
// If this is the first call, do updates on the "baseState"; if this
// is a subsequent call, tweak the startState instead. Note that
// these "set" values are values; there are no side effects to
// other values as these are modified.
newState := baseState
if updating {
newState = blockLocs [ b . ID ] . startState
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
2022-03-29 13:16:35 -04:00
for it := newState . Iterator ( ) ; ! it . Done ( ) ; {
k , d := it . Next ( )
thisSlot := d . ( * liveSlot )
x := thisSlot . VarLoc
x0 := x // initial value in newState
// Intersect this slot with the slot in all the predecessors
for _ , other := range preds {
if ! updating && other . ID == baseID {
continue
}
otherSlot := blockLocs [ other . ID ] . endState . Find ( k )
if otherSlot == nil {
x = VarLoc { }
break
}
y := otherSlot . ( * liveSlot ) . VarLoc
x = x . intersect ( y )
if x . absent ( ) {
x = VarLoc { }
break
}
}
2018-02-02 16:26:58 -05:00
2022-03-29 13:16:35 -04:00
// Delete if necessary, but not otherwise (in order to maximize sharing).
if x . absent ( ) {
if ! x0 . absent ( ) {
blockChanged = true
newState . Delete ( k )
}
slotLocs [ k ] = VarLoc { }
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
continue
}
2022-03-29 13:16:35 -04:00
if x != x0 {
blockChanged = true
newState . Insert ( k , & liveSlot { VarLoc : x } )
}
2018-02-02 16:26:58 -05:00
2022-03-29 13:16:35 -04:00
slotLocs [ k ] = x
mask := uint64 ( x . Registers )
2018-01-29 16:09:11 -05:00
for {
if mask == 0 {
break
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
2018-05-16 11:21:18 +01:00
reg := uint8 ( bits . TrailingZeros64 ( mask ) )
2018-01-29 16:09:11 -05:00
mask &^= 1 << reg
2022-03-29 13:16:35 -04:00
state . currentState . registers [ reg ] = append ( state . currentState . registers [ reg ] , SlotID ( k ) )
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
}
2018-11-01 15:26:02 -04:00
2022-03-29 13:16:35 -04:00
if previousBlock != nil {
markChangedVars ( blockLocs [ previousBlock . ID ] . endState , newState )
2018-11-01 15:26:02 -04:00
}
2022-03-29 13:16:35 -04:00
locs . startState = newState
return newState , blockChanged
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
2022-03-29 13:16:35 -04:00
// processValue updates locs and state.registerContents to reflect v, a
// value with the names in vSlots and homed in vReg. "v" becomes
// visible after execution of the instructions evaluating it. It
// returns which VarIDs were modified by the Value's execution.
2018-01-22 17:06:26 -05:00
func ( state * debugState ) processValue ( v * Value , vSlots [ ] SlotID , vReg * Register ) bool {
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
locs := state . currentState
2018-01-22 17:06:26 -05:00
changed := false
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
setSlot := func ( slot SlotID , loc VarLoc ) {
2018-01-22 17:06:26 -05:00
changed = true
2018-02-02 16:26:58 -05:00
state . changedVars . add ( ID ( state . slotVars [ slot ] ) )
2022-03-29 13:16:35 -04:00
state . changedSlots . add ( ID ( slot ) )
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
state . currentState . slots [ slot ] = loc
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// Handle any register clobbering. Call operations, for example,
// clobber all registers even though they don't explicitly write to
// them.
2018-01-29 16:09:11 -05:00
clobbers := uint64 ( opcodeTable [ v . Op ] . reg . clobbers )
for {
if clobbers == 0 {
break
}
2018-05-16 11:21:18 +01:00
reg := uint8 ( bits . TrailingZeros64 ( clobbers ) )
2018-01-29 16:09:11 -05:00
clobbers &^= 1 << reg
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
2018-01-29 16:09:11 -05:00
for _ , slot := range locs . registers [ reg ] {
2022-03-29 13:16:35 -04:00
if state . loggingLevel > 1 {
2018-05-22 18:00:39 -04:00
state . logf ( "at %v: %v clobbered out of %v\n" , v , state . slots [ slot ] , & state . registers [ reg ] )
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
2018-01-29 16:09:11 -05:00
last := locs . slots [ slot ]
if last . absent ( ) {
state . f . Fatalf ( "at %v: slot %v in register %v with no location entry" , v , state . slots [ slot ] , & state . registers [ reg ] )
continue
}
2018-02-25 20:08:22 +09:00
regs := last . Registers &^ ( 1 << reg )
2018-01-31 17:56:14 -05:00
setSlot ( slot , VarLoc { regs , last . StackOffset } )
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
2018-01-29 16:09:11 -05:00
locs . registers [ reg ] = locs . registers [ reg ] [ : 0 ]
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
switch {
2022-07-22 15:08:07 -07:00
case v . Op == OpVarDef :
2020-12-06 18:13:43 -08:00
n := v . Aux . ( * ir . Name )
2020-11-17 21:47:56 -05:00
if ir . IsSynthetic ( n ) {
2018-03-13 16:14:52 -04:00
break
}
slotID := state . varParts [ n ] [ 0 ]
var stackOffset StackOffset
if v . Op == OpVarDef {
stackOffset = StackOffset ( state . stackOffset ( state . slots [ slotID ] ) << 1 | 1 )
}
setSlot ( slotID , VarLoc { 0 , stackOffset } )
2022-03-29 13:16:35 -04:00
if state . loggingLevel > 1 {
2018-03-13 16:14:52 -04:00
if v . Op == OpVarDef {
2018-05-22 18:00:39 -04:00
state . logf ( "at %v: stack-only var %v now live\n" , v , state . slots [ slotID ] )
2018-03-13 16:14:52 -04:00
} else {
2018-05-22 18:00:39 -04:00
state . logf ( "at %v: stack-only var %v now dead\n" , v , state . slots [ slotID ] )
2018-03-13 16:14:52 -04:00
}
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
case v . Op == OpArg :
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
home := state . f . getHome ( v . ID ) . ( LocalSlot )
2018-01-31 17:56:14 -05:00
stackOffset := state . stackOffset ( home ) << 1 | 1
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
for _ , slot := range vSlots {
2022-03-29 13:16:35 -04:00
if state . loggingLevel > 1 {
2018-05-22 18:00:39 -04:00
state . logf ( "at %v: arg %v now on stack in location %v\n" , v , state . slots [ slot ] , home )
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
if last := locs . slots [ slot ] ; ! last . absent ( ) {
2018-05-22 18:00:39 -04:00
state . logf ( "at %v: unexpected arg op on already-live slot %v\n" , v , state . slots [ slot ] )
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
2017-09-29 14:14:03 -04:00
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
2018-01-31 17:56:14 -05:00
setSlot ( slot , VarLoc { 0 , StackOffset ( stackOffset ) } )
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
case v . Op == OpStoreReg :
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
home := state . f . getHome ( v . ID ) . ( LocalSlot )
2018-01-31 17:56:14 -05:00
stackOffset := state . stackOffset ( home ) << 1 | 1
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
for _ , slot := range vSlots {
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
last := locs . slots [ slot ]
if last . absent ( ) {
2022-03-29 13:16:35 -04:00
if state . loggingLevel > 1 {
2018-05-22 18:00:39 -04:00
state . logf ( "at %v: unexpected spill of unnamed register %s\n" , v , vReg )
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
break
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
2018-01-31 17:56:14 -05:00
setSlot ( slot , VarLoc { last . Registers , StackOffset ( stackOffset ) } )
2022-03-29 13:16:35 -04:00
if state . loggingLevel > 1 {
state . logf ( "at %v: %v spilled to stack location %v@%d\n" , v , state . slots [ slot ] , home , state . stackOffset ( home ) )
2017-09-29 14:14:03 -04:00
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
case vReg != nil :
2022-03-29 13:16:35 -04:00
if state . loggingLevel > 1 {
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
newSlots := make ( [ ] bool , len ( state . slots ) )
for _ , slot := range vSlots {
newSlots [ slot ] = true
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
for _ , slot := range locs . registers [ vReg . num ] {
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
if ! newSlots [ slot ] {
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
state . logf ( "at %v: overwrote %v in register %v\n" , v , state . slots [ slot ] , vReg )
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
}
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
for _ , slot := range locs . registers [ vReg . num ] {
last := locs . slots [ slot ]
2018-01-31 17:56:14 -05:00
setSlot ( slot , VarLoc { last . Registers &^ ( 1 << uint8 ( vReg . num ) ) , last . StackOffset } )
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
locs . registers [ vReg . num ] = locs . registers [ vReg . num ] [ : 0 ]
locs . registers [ vReg . num ] = append ( locs . registers [ vReg . num ] , vSlots ... )
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
for _ , slot := range vSlots {
2022-03-29 13:16:35 -04:00
if state . loggingLevel > 1 {
2018-05-22 18:00:39 -04:00
state . logf ( "at %v: %v now in %s\n" , v , state . slots [ slot ] , vReg )
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
2018-02-05 17:04:44 -05:00
last := locs . slots [ slot ]
setSlot ( slot , VarLoc { 1 << uint8 ( vReg . num ) | last . Registers , last . StackOffset } )
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
}
2018-01-22 17:06:26 -05:00
return changed
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
// varOffset returns the offset of slot within the user variable it was
// decomposed from. This has nothing to do with its stack offset.
2018-02-05 17:04:44 -05:00
func varOffset ( slot LocalSlot ) int64 {
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
offset := slot . Off
2018-02-05 17:04:44 -05:00
s := & slot
for ; s . SplitOf != nil ; s = s . SplitOf {
offset += s . SplitOffset
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
return offset
}
2018-01-22 17:06:26 -05:00
type partsByVarOffset struct {
slotIDs [ ] SlotID
2018-02-05 17:04:44 -05:00
slots [ ] LocalSlot
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
2018-01-22 17:06:26 -05:00
func ( a partsByVarOffset ) Len ( ) int { return len ( a . slotIDs ) }
func ( a partsByVarOffset ) Less ( i , j int ) bool {
2018-07-09 23:08:22 +03:00
return varOffset ( a . slots [ a . slotIDs [ i ] ] ) < varOffset ( a . slots [ a . slotIDs [ j ] ] )
2018-01-22 17:06:26 -05:00
}
func ( a partsByVarOffset ) Swap ( i , j int ) { a . slotIDs [ i ] , a . slotIDs [ j ] = a . slotIDs [ j ] , a . slotIDs [ i ] }
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// A pendingEntry represents the beginning of a location list entry, missing
// only its end coordinate.
type pendingEntry struct {
present bool
startBlock , startValue ID
2018-01-31 16:30:16 -05:00
// The location of each piece of the variable, in the same order as the
// SlotIDs in varParts.
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
pieces [ ] VarLoc
}
func ( e * pendingEntry ) clear ( ) {
e . present = false
e . startBlock = 0
e . startValue = 0
for i := range e . pieces {
e . pieces [ i ] = VarLoc { }
}
}
2022-03-29 13:16:35 -04:00
// canMerge reports whether a new location description is a superset
// of the (non-empty) pending location description, if so, the two
// can be merged (i.e., pending is still a valid and useful location
// description).
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
func canMerge ( pending , new VarLoc ) bool {
if pending . absent ( ) && new . absent ( ) {
return true
}
if pending . absent ( ) || new . absent ( ) {
return false
}
2022-03-29 13:16:35 -04:00
// pending is not absent, therefore it has either a stack mapping,
// or registers, or both.
if pending . onStack ( ) && pending . StackOffset != new . StackOffset {
// if pending has a stack offset, then new must also, and it
// must be the same (StackOffset encodes onStack).
return false
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
2022-03-29 13:16:35 -04:00
if pending . Registers & new . Registers != pending . Registers {
// There is at least one register in pending not mentioned in new.
return false
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
2022-03-29 13:16:35 -04:00
return true
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
// firstReg returns the first register in set that is present.
func firstReg ( set RegisterSet ) uint8 {
2018-01-29 16:09:11 -05:00
if set == 0 {
// This is wrong, but there seem to be some situations where we
// produce locations with no storage.
return 0
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
2018-05-16 11:21:18 +01:00
return uint8 ( bits . TrailingZeros64 ( uint64 ( set ) ) )
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
2022-03-29 13:16:35 -04:00
// buildLocationLists builds location lists for all the user variables
// in state.f, using the information about block state in blockLocs.
// The returned location lists are not fully complete. They are in
// terms of SSA values rather than PCs, and have no base address/end
// entries. They will be finished by PutLocationList.
2018-02-05 16:55:54 -05:00
func ( state * debugState ) buildLocationLists ( blockLocs [ ] * BlockDebug ) {
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// Run through the function in program text order, building up location
// lists as we go. The heavy lifting has mostly already been done.
2018-10-26 12:00:07 -04:00
2018-11-01 15:26:02 -04:00
var prevBlock * Block
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
for _ , b := range state . f . Blocks {
2022-03-29 13:16:35 -04:00
state . mergePredecessors ( b , blockLocs , prevBlock , true )
// Handle any differences among predecessor blocks and previous block (perhaps not a predecessor)
for _ , varID := range state . changedVars . contents ( ) {
state . updateVar ( VarID ( varID ) , b , BlockStart )
}
state . changedVars . clear ( )
2018-11-01 15:26:02 -04:00
2018-02-05 17:04:44 -05:00
if ! blockLocs [ b . ID ] . relevant {
continue
}
2021-07-01 11:22:02 -04:00
mustBeFirst := func ( v * Value ) bool {
return v . Op == OpPhi || v . Op . isLoweredGetClosurePtr ( ) ||
v . Op == OpArgIntReg || v . Op == OpArgFloatReg
}
2021-11-08 13:53:55 -05:00
blockPrologComplete := func ( v * Value ) bool {
if b . ID != state . f . Entry . ID {
return ! opcodeTable [ v . Op ] . zeroWidth
} else {
return v . Op == OpInitMem
}
}
// Examine the prolog portion of the block to process special
// zero-width ops such as Arg, Phi, LoweredGetClosurePtr (etc)
// whose lifetimes begin at the block starting point. In an
// entry block, allow for the possibility that we may see Arg
// ops that appear _after_ other non-zero-width operations.
// Example:
//
// v33 = ArgIntReg <uintptr> {foo+0} [0] : AX (foo)
// v34 = ArgIntReg <uintptr> {bar+0} [0] : BX (bar)
// ...
// v77 = StoreReg <unsafe.Pointer> v67 : ctx+8[unsafe.Pointer]
// v78 = StoreReg <unsafe.Pointer> v68 : ctx[unsafe.Pointer]
// v79 = Arg <*uint8> {args} : args[*uint8] (args[*uint8])
// v80 = Arg <int> {args} [8] : args+8[int] (args+8[int])
// ...
// v1 = InitMem <mem>
//
// We can stop scanning the initial portion of the block when
// we either see the InitMem op (for entry blocks) or the
// first non-zero-width op (for other blocks).
for idx := 0 ; idx < len ( b . Values ) ; idx ++ {
v := b . Values [ idx ]
if blockPrologComplete ( v ) {
break
}
// Consider only "lifetime begins at block start" ops.
if ! mustBeFirst ( v ) && v . Op != OpArg {
continue
}
slots := state . valueNames [ v . ID ]
reg , _ := state . f . getHome ( v . ID ) . ( * Register )
changed := state . processValue ( v , slots , reg ) // changed == added to state.changedVars
if changed {
for _ , varID := range state . changedVars . contents ( ) {
state . updateVar ( VarID ( varID ) , v . Block , BlockStart )
}
state . changedVars . clear ( )
}
}
// Now examine the block again, handling things other than the
// "begins at block start" lifetimes.
2018-02-28 17:53:31 -05:00
zeroWidthPending := false
2021-11-08 13:53:55 -05:00
prologComplete := false
2018-10-26 12:00:07 -04:00
// expect to see values in pattern (apc)* (zerowidth|real)*
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
for _ , v := range b . Values {
2021-11-08 13:53:55 -05:00
if blockPrologComplete ( v ) {
prologComplete = true
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
slots := state . valueNames [ v . ID ]
reg , _ := state . f . getHome ( v . ID ) . ( * Register )
2018-10-26 12:00:07 -04:00
changed := state . processValue ( v , slots , reg ) // changed == added to state.changedVars
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
2018-02-28 17:53:31 -05:00
if opcodeTable [ v . Op ] . zeroWidth {
2021-11-08 13:53:55 -05:00
if prologComplete && mustBeFirst ( v ) {
panic ( fmt . Errorf ( "Unexpected placement of op '%s' appearing after non-pseudo-op at beginning of block %s in %s\n%s" , v . LongString ( ) , b , b . Func . Name , b . Func ) )
}
2018-01-22 17:06:26 -05:00
if changed {
2021-07-01 11:22:02 -04:00
if mustBeFirst ( v ) || v . Op == OpArg {
2021-11-08 13:53:55 -05:00
// already taken care of above
2018-10-26 12:00:07 -04:00
continue
}
2021-11-08 13:53:55 -05:00
zeroWidthPending = true
2018-01-22 17:06:26 -05:00
}
continue
}
2018-02-28 17:53:31 -05:00
if ! changed && ! zeroWidthPending {
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
continue
}
2021-11-08 13:53:55 -05:00
// Not zero-width; i.e., a "real" instruction.
2018-02-28 17:53:31 -05:00
zeroWidthPending = false
2021-11-08 13:53:55 -05:00
for _ , varID := range state . changedVars . contents ( ) {
state . updateVar ( VarID ( varID ) , v . Block , v )
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
2018-02-02 16:26:58 -05:00
state . changedVars . clear ( )
2018-10-26 12:00:07 -04:00
}
2021-11-08 13:53:55 -05:00
for _ , varID := range state . changedVars . contents ( ) {
state . updateVar ( VarID ( varID ) , b , BlockEnd )
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
2018-11-01 15:26:02 -04:00
prevBlock = b
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
2022-03-29 13:16:35 -04:00
if state . loggingLevel > 0 {
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
state . logf ( "location lists:\n" )
}
// Flush any leftover entries live at the end of the last block.
2018-02-05 16:55:54 -05:00
for varID := range state . lists {
2021-04-28 18:26:54 -04:00
state . writePendingEntry ( VarID ( varID ) , state . f . Blocks [ len ( state . f . Blocks ) - 1 ] . ID , FuncEnd . ID )
2018-02-05 16:55:54 -05:00
list := state . lists [ varID ]
2022-03-29 13:16:35 -04:00
if state . loggingLevel > 0 {
2018-05-22 18:00:39 -04:00
if len ( list ) == 0 {
state . logf ( "\t%v : empty list\n" , state . vars [ varID ] )
} else {
state . logf ( "\t%v : %q\n" , state . vars [ varID ] , hex . EncodeToString ( state . lists [ varID ] ) )
}
2018-02-05 16:55:54 -05:00
}
}
}
// updateVar updates the pending location list entry for varID to
2018-10-26 12:00:07 -04:00
// reflect the new locations in curLoc, beginning at v in block b.
// v may be one of the special values indicating block start or end.
func ( state * debugState ) updateVar ( varID VarID , b * Block , v * Value ) {
curLoc := state . currentState . slots
2018-02-05 16:55:54 -05:00
// Assemble the location list entry with whatever's live.
empty := true
for _ , slotID := range state . varSlots [ varID ] {
if ! curLoc [ slotID ] . absent ( ) {
empty = false
break
}
}
pending := & state . pendingEntries [ varID ]
if empty {
2018-10-26 12:00:07 -04:00
state . writePendingEntry ( varID , b . ID , v . ID )
2018-02-05 16:55:54 -05:00
pending . clear ( )
return
}
// Extend the previous entry if possible.
if pending . present {
merge := true
for i , slotID := range state . varSlots [ varID ] {
if ! canMerge ( pending . pieces [ i ] , curLoc [ slotID ] ) {
merge = false
break
}
}
if merge {
return
}
}
2018-10-26 12:00:07 -04:00
state . writePendingEntry ( varID , b . ID , v . ID )
2018-02-05 16:55:54 -05:00
pending . present = true
2018-10-26 12:00:07 -04:00
pending . startBlock = b . ID
2018-02-05 16:55:54 -05:00
pending . startValue = v . ID
for i , slot := range state . varSlots [ varID ] {
pending . pieces [ i ] = curLoc [ slot ]
}
}
// writePendingEntry writes out the pending entry for varID, if any,
// terminated at endBlock/Value.
func ( state * debugState ) writePendingEntry ( varID VarID , endBlock , endValue ID ) {
pending := state . pendingEntries [ varID ]
if ! pending . present {
return
}
// Pack the start/end coordinates into the start/end addresses
// of the entry, for decoding by PutLocationList.
start , startOK := encodeValue ( state . ctxt , pending . startBlock , pending . startValue )
end , endOK := encodeValue ( state . ctxt , endBlock , endValue )
if ! startOK || ! endOK {
// If someone writes a function that uses >65K values,
// they get incomplete debug info on 32-bit platforms.
return
}
2018-11-15 14:11:19 -05:00
if start == end {
2022-03-29 13:16:35 -04:00
if state . loggingLevel > 1 {
2018-11-15 14:11:19 -05:00
// Printf not logf so not gated by GOSSAFUNC; this should fire very rarely.
2022-03-29 13:16:35 -04:00
// TODO this fires a lot, need to figure out why.
state . logf ( "Skipping empty location list for %v in %s\n" , state . vars [ varID ] , state . f . Name )
2018-11-15 14:11:19 -05:00
}
return
}
2018-02-05 16:55:54 -05:00
list := state . lists [ varID ]
list = appendPtr ( state . ctxt , list , start )
list = appendPtr ( state . ctxt , list , end )
// Where to write the length of the location description once
// we know how big it is.
sizeIdx := len ( list )
list = list [ : len ( list ) + 2 ]
2022-03-29 13:16:35 -04:00
if state . loggingLevel > 1 {
2018-02-05 16:55:54 -05:00
var partStrs [ ] string
for i , slot := range state . varSlots [ varID ] {
partStrs = append ( partStrs , fmt . Sprintf ( "%v@%v" , state . slots [ slot ] , state . LocString ( pending . pieces [ i ] ) ) )
}
state . logf ( "Add entry for %v: \tb%vv%v-b%vv%v = \t%v\n" , state . vars [ varID ] , pending . startBlock , pending . startValue , endBlock , endValue , strings . Join ( partStrs , " " ) )
}
for i , slotID := range state . varSlots [ varID ] {
loc := pending . pieces [ i ]
slot := state . slots [ slotID ]
if ! loc . absent ( ) {
if loc . onStack ( ) {
if loc . stackOffsetValue ( ) == 0 {
list = append ( list , dwarf . DW_OP_call_frame_cfa )
} else {
list = append ( list , dwarf . DW_OP_fbreg )
list = dwarf . AppendSleb128 ( list , int64 ( loc . stackOffsetValue ( ) ) )
}
} else {
regnum := state . ctxt . Arch . DWARFRegisters [ state . registers [ firstReg ( loc . Registers ) ] . ObjNum ( ) ]
if regnum < 32 {
list = append ( list , dwarf . DW_OP_reg0 + byte ( regnum ) )
} else {
list = append ( list , dwarf . DW_OP_regx )
list = dwarf . AppendUleb128 ( list , uint64 ( regnum ) )
}
}
}
if len ( state . varSlots [ varID ] ) > 1 {
list = append ( list , dwarf . DW_OP_piece )
list = dwarf . AppendUleb128 ( list , uint64 ( slot . Type . Size ( ) ) )
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
}
2018-02-05 16:55:54 -05:00
state . ctxt . Arch . ByteOrder . PutUint16 ( list [ sizeIdx : ] , uint16 ( len ( list ) - sizeIdx - 2 ) )
state . lists [ varID ] = list
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
// PutLocationList adds list (a location list in its intermediate representation) to listSym.
func ( debugInfo * FuncDebug ) PutLocationList ( list [ ] byte , ctxt * obj . Link , listSym , startPC * obj . LSym ) {
getPC := debugInfo . GetPC
2019-04-02 17:26:49 -04:00
if ctxt . UseBASEntries {
listSym . WriteInt ( ctxt , listSym . Size , ctxt . Arch . PtrSize , ^ 0 )
listSym . WriteAddr ( ctxt , listSym . Size , ctxt . Arch . PtrSize , startPC , 0 )
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
// Re-read list, translating its address from block/value ID to PC.
for i := 0 ; i < len ( list ) ; {
2018-02-28 16:32:11 -05:00
begin := getPC ( decodeValue ( ctxt , readPtr ( ctxt , list [ i : ] ) ) )
end := getPC ( decodeValue ( ctxt , readPtr ( ctxt , list [ i + ctxt . Arch . PtrSize : ] ) ) )
// Horrible hack. If a range contains only zero-width
// instructions, e.g. an Arg, and it's at the beginning of the
// function, this would be indistinguishable from an
// end entry. Fudge it.
if begin == 0 && end == 0 {
end = 1
}
2019-04-02 17:26:49 -04:00
if ctxt . UseBASEntries {
listSym . WriteInt ( ctxt , listSym . Size , ctxt . Arch . PtrSize , int64 ( begin ) )
listSym . WriteInt ( ctxt , listSym . Size , ctxt . Arch . PtrSize , int64 ( end ) )
} else {
listSym . WriteCURelativeAddr ( ctxt , listSym . Size , startPC , int64 ( begin ) )
listSym . WriteCURelativeAddr ( ctxt , listSym . Size , startPC , int64 ( end ) )
}
2018-02-28 16:32:11 -05:00
i += 2 * ctxt . Arch . PtrSize
2019-04-02 17:26:49 -04:00
datalen := 2 + int ( ctxt . Arch . ByteOrder . Uint16 ( list [ i : ] ) )
listSym . WriteBytes ( ctxt , listSym . Size , list [ i : i + datalen ] ) // copy datalen and location encoding
i += datalen
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
// Location list contents, now with real PCs.
// End entry.
listSym . WriteInt ( ctxt , listSym . Size , ctxt . Arch . PtrSize , 0 )
listSym . WriteInt ( ctxt , listSym . Size , ctxt . Arch . PtrSize , 0 )
}
2022-03-29 13:16:35 -04:00
// Pack a value and block ID into an address-sized uint, returning
// encoded value and boolean indicating whether the encoding succeeded.
// For 32-bit architectures the process may fail for very large
// procedures(the theory being that it's ok to have degraded debug
// quality in this case).
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
func encodeValue ( ctxt * obj . Link , b , v ID ) ( uint64 , bool ) {
if ctxt . Arch . PtrSize == 8 {
result := uint64 ( b ) << 32 | uint64 ( uint32 ( v ) )
//ctxt.Logf("b %#x (%d) v %#x (%d) -> %#x\n", b, b, v, v, result)
return result , true
}
if ctxt . Arch . PtrSize != 4 {
panic ( "unexpected pointer size" )
}
if ID ( int16 ( b ) ) != b || ID ( int16 ( v ) ) != v {
return 0 , false
}
return uint64 ( b ) << 16 | uint64 ( uint16 ( v ) ) , true
}
// Unpack a value and block ID encoded by encodeValue.
func decodeValue ( ctxt * obj . Link , word uint64 ) ( ID , ID ) {
if ctxt . Arch . PtrSize == 8 {
b , v := ID ( word >> 32 ) , ID ( word )
//ctxt.Logf("%#x -> b %#x (%d) v %#x (%d)\n", word, b, b, v, v)
return b , v
}
if ctxt . Arch . PtrSize != 4 {
panic ( "unexpected pointer size" )
}
2018-03-07 16:21:47 -05:00
return ID ( word >> 16 ) , ID ( int16 ( word ) )
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
// Append a pointer-sized uint to buf.
func appendPtr ( ctxt * obj . Link , buf [ ] byte , word uint64 ) [ ] byte {
2018-01-31 17:56:14 -05:00
if cap ( buf ) < len ( buf ) + 20 {
b := make ( [ ] byte , len ( buf ) , 20 + cap ( buf ) * 2 )
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
copy ( b , buf )
buf = b
}
writeAt := len ( buf )
buf = buf [ 0 : len ( buf ) + ctxt . Arch . PtrSize ]
writePtr ( ctxt , buf [ writeAt : ] , word )
return buf
}
// Write a pointer-sized uint to the beginning of buf.
func writePtr ( ctxt * obj . Link , buf [ ] byte , word uint64 ) {
switch ctxt . Arch . PtrSize {
case 4 :
ctxt . Arch . ByteOrder . PutUint32 ( buf , uint32 ( word ) )
case 8 :
ctxt . Arch . ByteOrder . PutUint64 ( buf , word )
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
default :
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
panic ( "unexpected pointer size" )
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
cmd/compile: reimplement location list generation
Completely redesign and reimplement location list generation to be more
efficient, and hopefully not too hard to understand.
RegKills are gone. Instead of using the regalloc's liveness
calculations, redo them using the Ops' clobber information. Besides
saving a lot of Values, this avoids adding RegKills to blocks that would
be empty otherwise, which was messing up optimizations. This does mean
that it's much harder to tell whether the generation process is buggy
(there's nothing to cross-check it with), and there may be disagreements
with GC liveness. But the performance gain is significant, and it's nice
not to be messing with earlier compiler phases.
The intermediate representations are gone. Instead of producing
ssa.BlockDebugs, then dwarf.LocationLists, and then finally real
location lists, go directly from the SSA to a (mostly) real location
list. Because the SSA analysis happens before assembly, it stores
encoded block/value IDs where PCs would normally go. It would be easier
to do the SSA analysis after assembly, but I didn't want to retain the
SSA just for that.
Generation proceeds in two phases: first, it traverses the function in
CFG order, storing the state of the block at the beginning and end. End
states are used to produce the start states of the successor blocks. In
the second phase, it traverses in program text order and produces the
location lists. The processing in the second phase is redundant, but
much cheaper than storing the intermediate representation. It might be
possible to combine the two phases somewhat to take advantage of cases
where the CFG matches the block layout, but I haven't tried.
Location lists are finalized by adding a base address selection entry,
translating each encoded block/value ID to a real PC, and adding the
terminating zero entry. This probably won't work on OSX, where dsymutil
will choke on the base address selection. I tried emitting CU-relative
relocations for each address, and it was *very* bad for performance --
it uses more memory storing all the relocations than it does for the
actual location list bytes. I think I'm going to end up synthesizing the
relocations in the linker only on OSX, but TBD.
TestNexting needs updating: with more optimizations working, the
debugger doesn't stop on the continue (line 88) any more, and the test's
duplicate suppression kicks in. Also, dx and dy live a little longer
now, but they have the correct values.
Change-Id: Ie772dfe23a4e389ca573624fac4d05401ae32307
Reviewed-on: https://go-review.googlesource.com/89356
Run-TryBot: Heschi Kreinick <heschi@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Chase <drchase@google.com>
2017-10-26 15:40:17 -04:00
}
// Read a pointer-sized uint from the beginning of buf.
func readPtr ( ctxt * obj . Link , buf [ ] byte ) uint64 {
switch ctxt . Arch . PtrSize {
case 4 :
return uint64 ( ctxt . Arch . ByteOrder . Uint32 ( buf ) )
case 8 :
return ctxt . Arch . ByteOrder . Uint64 ( buf )
default :
panic ( "unexpected pointer size" )
}
[dev.debug] cmd/compile: better DWARF with optimizations on
Debuggers use DWARF information to find local variables on the
stack and in registers. Prior to this CL, the DWARF information for
functions claimed that all variables were on the stack at all times.
That's incorrect when optimizations are enabled, and results in
debuggers showing data that is out of date or complete gibberish.
After this CL, the compiler is capable of representing variable
locations more accurately, and attempts to do so. Due to limitations of
the SSA backend, it's not possible to be completely correct.
There are a number of problems in the current design. One of the easier
to understand is that variable names currently must be attached to an
SSA value, but not all assignments in the source code actually result
in machine code. For example:
type myint int
var a int
b := myint(int)
and
b := (*uint64)(unsafe.Pointer(a))
don't generate machine code because the underlying representation is the
same, so the correct value of b will not be set when the user would
expect.
Generating the more precise debug information is behind a flag,
dwarflocationlists. Because of the issues described above, setting the
flag may not make the debugging experience much better, and may actually
make it worse in cases where the variable actually is on the stack and
the more complicated analysis doesn't realize it.
A number of changes are included:
- Add a new pseudo-instruction, RegKill, which indicates that the value
in the register has been clobbered.
- Adjust regalloc to emit RegKills in the right places. Significantly,
this means that phis are mixed with StoreReg and RegKills after
regalloc.
- Track variable decomposition in ssa.LocalSlots.
- After the SSA backend is done, analyze the result and build location
lists for each LocalSlot.
- After assembly is done, update the location lists with the assembled
PC offsets, recompose variables, and build DWARF location lists. Emit the
list as a new linker symbol, one per function.
- In the linker, aggregate the location lists into a .debug_loc section.
TODO:
- currently disabled for non-X86/AMD64 because there are no data tables.
go build -toolexec 'toolstash -cmp' -a std succeeds.
With -dwarflocationlists false:
before: f02812195637909ff675782c0b46836a8ff01976
after: 06f61e8112a42ac34fb80e0c818b3cdb84a5e7ec
benchstat -geomean /tmp/220352263 /tmp/621364410
completed 15 of 15, estimated time remaining 0s (eta 3:52PM)
name old time/op new time/op delta
Template 199ms ± 3% 198ms ± 2% ~ (p=0.400 n=15+14)
Unicode 96.6ms ± 5% 96.4ms ± 5% ~ (p=0.838 n=15+15)
GoTypes 653ms ± 2% 647ms ± 2% ~ (p=0.102 n=15+14)
Flate 133ms ± 6% 129ms ± 3% -2.62% (p=0.041 n=15+15)
GoParser 164ms ± 5% 159ms ± 3% -3.05% (p=0.000 n=15+15)
Reflect 428ms ± 4% 422ms ± 3% ~ (p=0.156 n=15+13)
Tar 123ms ±10% 124ms ± 8% ~ (p=0.461 n=15+15)
XML 228ms ± 3% 224ms ± 3% -1.57% (p=0.045 n=15+15)
[Geo mean] 206ms 377ms +82.86%
name old user-time/op new user-time/op delta
Template 292ms ±10% 301ms ±12% ~ (p=0.189 n=15+15)
Unicode 166ms ±37% 158ms ±14% ~ (p=0.418 n=15+14)
GoTypes 962ms ± 6% 963ms ± 7% ~ (p=0.976 n=15+15)
Flate 207ms ±19% 200ms ±14% ~ (p=0.345 n=14+15)
GoParser 246ms ±22% 240ms ±15% ~ (p=0.587 n=15+15)
Reflect 611ms ±13% 587ms ±14% ~ (p=0.085 n=15+13)
Tar 211ms ±12% 217ms ±14% ~ (p=0.355 n=14+15)
XML 335ms ±15% 320ms ±18% ~ (p=0.169 n=15+15)
[Geo mean] 317ms 583ms +83.72%
name old alloc/op new alloc/op delta
Template 40.2MB ± 0% 40.2MB ± 0% -0.15% (p=0.000 n=14+15)
Unicode 29.2MB ± 0% 29.3MB ± 0% ~ (p=0.624 n=15+15)
GoTypes 114MB ± 0% 114MB ± 0% -0.15% (p=0.000 n=15+14)
Flate 25.7MB ± 0% 25.6MB ± 0% -0.18% (p=0.000 n=13+15)
GoParser 32.2MB ± 0% 32.2MB ± 0% -0.14% (p=0.003 n=15+15)
Reflect 77.8MB ± 0% 77.9MB ± 0% ~ (p=0.061 n=15+15)
Tar 27.1MB ± 0% 27.0MB ± 0% -0.11% (p=0.029 n=15+15)
XML 42.7MB ± 0% 42.5MB ± 0% -0.29% (p=0.000 n=15+15)
[Geo mean] 42.1MB 75.0MB +78.05%
name old allocs/op new allocs/op delta
Template 402k ± 1% 398k ± 0% -0.91% (p=0.000 n=15+15)
Unicode 344k ± 1% 344k ± 0% ~ (p=0.715 n=15+14)
GoTypes 1.18M ± 0% 1.17M ± 0% -0.91% (p=0.000 n=15+14)
Flate 243k ± 0% 240k ± 1% -1.05% (p=0.000 n=13+15)
GoParser 327k ± 1% 324k ± 1% -0.96% (p=0.000 n=15+15)
Reflect 984k ± 1% 982k ± 0% ~ (p=0.050 n=15+15)
Tar 261k ± 1% 259k ± 1% -0.77% (p=0.000 n=15+15)
XML 411k ± 0% 404k ± 1% -1.55% (p=0.000 n=15+15)
[Geo mean] 439k 755k +72.01%
name old text-bytes new text-bytes delta
HelloSize 694kB ± 0% 694kB ± 0% -0.00% (p=0.000 n=15+15)
name old data-bytes new data-bytes delta
HelloSize 5.55kB ± 0% 5.55kB ± 0% ~ (all equal)
name old bss-bytes new bss-bytes delta
HelloSize 133kB ± 0% 133kB ± 0% ~ (all equal)
name old exe-bytes new exe-bytes delta
HelloSize 1.04MB ± 0% 1.04MB ± 0% ~ (all equal)
Change-Id: I991fc553ef175db46bb23b2128317bbd48de70d8
Reviewed-on: https://go-review.googlesource.com/41770
Reviewed-by: Josh Bleecher Snyder <josharian@gmail.com>
2017-07-21 18:30:19 -04:00
}
2021-04-27 12:40:43 -04:00
// setupLocList creates the initial portion of a location list for a
// user variable. It emits the encoded start/end of the range and a
// placeholder for the size. Return value is the new list plus the
// slot in the list holding the size (to be updated later).
func setupLocList ( ctxt * obj . Link , f * Func , list [ ] byte , st , en ID ) ( [ ] byte , int ) {
start , startOK := encodeValue ( ctxt , f . Entry . ID , st )
end , endOK := encodeValue ( ctxt , f . Entry . ID , en )
if ! startOK || ! endOK {
// This could happen if someone writes a function that uses
// >65K values on a 32-bit platform. Hopefully a degraded debugging
// experience is ok in that case.
return nil , 0
}
list = appendPtr ( ctxt , list , start )
list = appendPtr ( ctxt , list , end )
// Where to write the length of the location description once
// we know how big it is.
sizeIdx := len ( list )
list = list [ : len ( list ) + 2 ]
return list , sizeIdx
}
// locatePrologEnd walks the entry block of a function with incoming
// register arguments and locates the last instruction in the prolog
// that spills a register arg. It returns the ID of that instruction
// Example:
//
2022-02-03 14:12:08 -05:00
// b1:
// v3 = ArgIntReg <int> {p1+0} [0] : AX
// ... more arg regs ..
// v4 = ArgFloatReg <float32> {f1+0} [0] : X0
// v52 = MOVQstore <mem> {p1} v2 v3 v1
// ... more stores ...
// v68 = MOVSSstore <mem> {f4} v2 v67 v66
// v38 = MOVQstoreconst <mem> {blob} [val=0,off=0] v2 v32
2021-04-27 12:40:43 -04:00
//
// Important: locatePrologEnd is expected to work properly only with
// optimization turned off (e.g. "-N"). If optimization is enabled
// we can't be assured of finding all input arguments spilled in the
// entry block prolog.
func locatePrologEnd ( f * Func ) ID {
// returns true if this instruction looks like it moves an ABI
// register to the stack, along with the value being stored.
isRegMoveLike := func ( v * Value ) ( bool , ID ) {
n , ok := v . Aux . ( * ir . Name )
var r ID
if ! ok || n . Class != ir . PPARAM {
return false , r
}
regInputs , memInputs , spInputs := 0 , 0 , 0
for _ , a := range v . Args {
if a . Op == OpArgIntReg || a . Op == OpArgFloatReg {
regInputs ++
r = a . ID
} else if a . Type . IsMemory ( ) {
memInputs ++
} else if a . Op == OpSP {
spInputs ++
} else {
return false , r
}
}
return v . Type . IsMemory ( ) && memInputs == 1 &&
regInputs == 1 && spInputs == 1 , r
}
// OpArg*Reg values we've seen so far on our forward walk,
// for which we have not yet seen a corresponding spill.
regArgs := make ( [ ] ID , 0 , 32 )
// removeReg tries to remove a value from regArgs, returning true
// if found and removed, or false otherwise.
removeReg := func ( r ID ) bool {
for i := 0 ; i < len ( regArgs ) ; i ++ {
if regArgs [ i ] == r {
regArgs = append ( regArgs [ : i ] , regArgs [ i + 1 : ] ... )
return true
}
}
return false
}
// Walk forwards through the block. When we see OpArg*Reg, record
// the value it produces in the regArgs list. When see a store that uses
// the value, remove the entry. When we hit the last store (use)
// then we've arrived at the end of the prolog.
for k , v := range f . Entry . Values {
if v . Op == OpArgIntReg || v . Op == OpArgFloatReg {
regArgs = append ( regArgs , v . ID )
continue
}
if ok , r := isRegMoveLike ( v ) ; ok {
if removed := removeReg ( r ) ; removed {
if len ( regArgs ) == 0 {
// Found our last spill; return the value after
// it. Note that it is possible that this spill is
// the last instruction in the block. If so, then
// return the "end of block" sentinel.
if k < len ( f . Entry . Values ) - 1 {
return f . Entry . Values [ k + 1 ] . ID
}
return BlockEnd . ID
}
}
}
if v . Op . IsCall ( ) {
// if we hit a call, we've gone too far.
return v . ID
}
}
// nothing found
return ID ( - 1 )
}
// isNamedRegParam returns true if the param corresponding to "p"
// is a named, non-blank input parameter assigned to one or more
// registers.
func isNamedRegParam ( p abi . ABIParamAssignment ) bool {
if p . Name == nil {
return false
}
n := p . Name . ( * ir . Name )
if n . Sym ( ) == nil || n . Sym ( ) . IsBlank ( ) {
return false
}
if len ( p . Registers ) == 0 {
return false
}
return true
}
2021-11-10 10:44:00 -05:00
// BuildFuncDebugNoOptimized populates a FuncDebug object "rval" with
2021-04-27 12:40:43 -04:00
// entries corresponding to the register-resident input parameters for
// the function "f"; it is used when we are compiling without
// optimization but the register ABI is enabled. For each reg param,
// it constructs a 2-element location list: the first element holds
// the input register, and the second element holds the stack location
// of the param (the assumption being that when optimization is off,
// each input param reg will be spilled in the prolog.
2021-11-10 10:44:00 -05:00
func BuildFuncDebugNoOptimized ( ctxt * obj . Link , f * Func , loggingEnabled bool , stackOffset func ( LocalSlot ) int32 , rval * FuncDebug ) {
2021-04-27 12:40:43 -04:00
pri := f . ABISelf . ABIAnalyzeFuncType ( f . Type . FuncType ( ) )
// Look to see if we have any named register-promoted parameters.
// If there are none, bail early and let the caller sort things
// out for the remainder of the params/locals.
numRegParams := 0
for _ , inp := range pri . InParams ( ) {
if isNamedRegParam ( inp ) {
numRegParams ++
}
}
if numRegParams == 0 {
2021-11-10 10:44:00 -05:00
return
2021-04-27 12:40:43 -04:00
}
2021-05-04 20:38:24 -04:00
state := debugState { f : f }
if loggingEnabled {
state . logf ( "generating -N reg param loc lists for func %q\n" , f . Name )
}
2021-04-27 12:40:43 -04:00
// Allocate location lists.
2021-11-10 10:44:00 -05:00
rval . LocationLists = make ( [ ] [ ] byte , numRegParams )
2021-04-27 12:40:43 -04:00
// Locate the value corresponding to the last spill of
// an input register.
afterPrologVal := locatePrologEnd ( f )
// Walk the input params again and process the register-resident elements.
pidx := 0
for _ , inp := range pri . InParams ( ) {
if ! isNamedRegParam ( inp ) {
// will be sorted out elsewhere
continue
}
n := inp . Name . ( * ir . Name )
sl := LocalSlot { N : n , Type : inp . Type , Off : 0 }
2021-11-10 10:44:00 -05:00
rval . Vars = append ( rval . Vars , n )
rval . Slots = append ( rval . Slots , sl )
slid := len ( rval . VarSlots )
rval . VarSlots = append ( rval . VarSlots , [ ] SlotID { SlotID ( slid ) } )
2021-04-27 12:40:43 -04:00
2021-05-04 16:18:56 -04:00
if afterPrologVal == ID ( - 1 ) {
// This can happen for degenerate functions with infinite
// loops such as that in issue 45948. In such cases, leave
// the var/slot set up for the param, but don't try to
// emit a location list.
2021-05-04 20:38:24 -04:00
if loggingEnabled {
state . logf ( "locatePrologEnd failed, skipping %v\n" , n )
}
2021-05-04 16:18:56 -04:00
pidx ++
continue
}
2021-04-27 12:40:43 -04:00
// Param is arriving in one or more registers. We need a 2-element
// location expression for it. First entry in location list
// will correspond to lifetime in input registers.
2021-11-10 10:44:00 -05:00
list , sizeIdx := setupLocList ( ctxt , f , rval . LocationLists [ pidx ] ,
2021-04-27 12:40:43 -04:00
BlockStart . ID , afterPrologVal )
if list == nil {
pidx ++
continue
}
2021-05-04 20:38:24 -04:00
if loggingEnabled {
state . logf ( "param %v:\n [<entry>, %d]:\n" , n , afterPrologVal )
}
2021-04-27 12:40:43 -04:00
rtypes , _ := inp . RegisterTypesAndOffsets ( )
2021-04-29 11:47:18 -04:00
padding := make ( [ ] uint64 , 0 , 32 )
padding = inp . ComputePadding ( padding )
2021-04-27 12:40:43 -04:00
for k , r := range inp . Registers {
reg := ObjRegForAbiReg ( r , f . Config )
dwreg := ctxt . Arch . DWARFRegisters [ reg ]
if dwreg < 32 {
list = append ( list , dwarf . DW_OP_reg0 + byte ( dwreg ) )
} else {
list = append ( list , dwarf . DW_OP_regx )
list = dwarf . AppendUleb128 ( list , uint64 ( dwreg ) )
}
2021-05-04 20:38:24 -04:00
if loggingEnabled {
state . logf ( " piece %d -> dwreg %d" , k , dwreg )
}
2021-04-27 12:40:43 -04:00
if len ( inp . Registers ) > 1 {
list = append ( list , dwarf . DW_OP_piece )
2021-08-26 12:11:14 -07:00
ts := rtypes [ k ] . Size ( )
2021-04-27 12:40:43 -04:00
list = dwarf . AppendUleb128 ( list , uint64 ( ts ) )
2021-04-29 11:47:18 -04:00
if padding [ k ] > 0 {
2021-05-04 20:38:24 -04:00
if loggingEnabled {
state . logf ( " [pad %d bytes]" , padding [ k ] )
}
2021-04-29 11:47:18 -04:00
list = append ( list , dwarf . DW_OP_piece )
list = dwarf . AppendUleb128 ( list , padding [ k ] )
}
2021-04-27 12:40:43 -04:00
}
2021-05-04 20:38:24 -04:00
if loggingEnabled {
state . logf ( "\n" )
}
2021-04-27 12:40:43 -04:00
}
// fill in length of location expression element
ctxt . Arch . ByteOrder . PutUint16 ( list [ sizeIdx : ] , uint16 ( len ( list ) - sizeIdx - 2 ) )
// Second entry in the location list will be the stack home
// of the param, once it has been spilled. Emit that now.
list , sizeIdx = setupLocList ( ctxt , f , list ,
afterPrologVal , FuncEnd . ID )
if list == nil {
pidx ++
continue
}
soff := stackOffset ( sl )
if soff == 0 {
list = append ( list , dwarf . DW_OP_call_frame_cfa )
} else {
list = append ( list , dwarf . DW_OP_fbreg )
list = dwarf . AppendSleb128 ( list , int64 ( soff ) )
}
2021-05-04 20:38:24 -04:00
if loggingEnabled {
state . logf ( " [%d, <end>): stackOffset=%d\n" , afterPrologVal , soff )
}
2021-04-27 12:40:43 -04:00
// fill in size
ctxt . Arch . ByteOrder . PutUint16 ( list [ sizeIdx : ] , uint16 ( len ( list ) - sizeIdx - 2 ) )
2021-11-10 10:44:00 -05:00
rval . LocationLists [ pidx ] = list
2021-04-27 12:40:43 -04:00
pidx ++
}
}